Larry Franks and Brian Swan on Open Source and Device Development in the Cloud
HDInsight aims to provide the best of both worlds in how it manages its data.
Azure Storage Vault (ASV) and the Hadoop Distributed File System (HDFS) implemented by HDInsight on Azure are distinct file systems that are optimized, respectively, for the storage of data and for computations on that data.
HDInsight clusters are deployed in Azure on compute nodes to execute MapReduce (M/R) tasks and are dropped once these tasks have been completed. Keeping the data in the HDFS clusters after computations have been completed would be an expensive way to store this data. ASV provides a full-featured HDFS file system over Azure Blob storage (ABS). ABS is a robust, general-purpose Azure storage solution, so storing data in ABS enables the clusters used for computation to be safely deleted without losing user data. ASV is not only low cost; it has also been designed as an HDFS extension to provide a seamless experience to customers by enabling the full set of components in the Hadoop ecosystem to operate directly on the data it manages.
To use this feature in the current release, you will need HDInsight and Windows Azure Blob Storage accounts. To access your storage account from HDInsight, go to the Cluster and click on the Manage Cluster tile.
Click on the Set up ASV button.
Enter the credentials (Name and Passkey) for your Windows Azure Blob Storage account.
Now, to run the Hadoop wordcount job on data in an ASV container named hadoop, use:
hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar wordcount asv://hadoop/ outputfile
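If you launch the job from a script rather than typing it by hand, the command can be assembled as an argument list. The following is only a minimal sketch: the helper name build_wordcount_command is hypothetical, and it assumes the hadoop executable and the examples jar named above are available where the script runs. It builds the arguments but does not execute them.

```python
def build_wordcount_command(container, output,
                            jar="hadoop-examples-1.1.0-SNAPSHOT.jar"):
    """Return the argv list for the wordcount job shown in the post.

    container: name of the ASV container holding the input data
    output:    output location passed as the last argument
    jar:       examples jar; the default is the one used in the post
    """
    # A container name is a single URI component, so reject empty
    # names and names containing a slash.
    if not container or "/" in container:
        raise ValueError("invalid container name: %r" % (container,))
    input_uri = "asv://%s/" % container
    return ["hadoop", "jar", jar, "wordcount", input_uri, output]


if __name__ == "__main__":
    # Print the command as it would be typed at a prompt.
    print(" ".join(build_wordcount_command("hadoop", "outputfile")))
```

The resulting list could be handed to subprocess.run on a machine where the hadoop command is on the path.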
The scheme for accessing data in ASV is asv://container/path
To see the data in ASV, use:
#cat asv://hadoop2/data
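To make the asv://container/path layout concrete, here is a small sketch that splits such a URI into its container and path parts using Python's standard urllib.parse module. The function name parse_asv_uri is hypothetical and is not part of HDInsight or Hadoop; it only illustrates how the two components line up.

```python
from urllib.parse import urlsplit


def parse_asv_uri(uri):
    """Split an asv://container/path URI into (container, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "asv":
        raise ValueError("not an asv:// URI: %r" % (uri,))
    if not parts.netloc:
        raise ValueError("missing container name in %r" % (uri,))
    # The container sits where a host name would be in a URL;
    # the rest is the path inside that container.
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    print(parse_asv_uri("asv://hadoop2/data"))
    print(parse_asv_uri("asv://hadoop/"))
```

For asv://hadoop2/data the container is hadoop2 and the path is data; for asv://hadoop/ the path is empty, meaning the root of the container.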