Larry Franks and Brian Swan on Open Source and Device Development in the Cloud
UPDATE: Windows Azure HDInsight has shipped. Azure Storage Vault (ASV) is now just blob storage. Note the following changes in the syntax used to access data in blob storage:
Also, the old console in the original post has been replaced. For the latest documentation on HDInsight, see http://www.windowsazure.com/en-us/documentation/services/hdinsight/
HDInsight aims to provide the best of both worlds in how it manages its data.
Azure Storage Vault (ASV) and the Hadoop Distributed File System (HDFS) implemented by HDInsight on Azure are distinct file systems that are optimized, respectively, for the storage of data and for computations on that data.
NOTE: The ASV term used currently by HDInsight is being deprecated in favor of WASB (Windows Azure Storage - BLOB service). Both are backed by blob storage, so this change will not impact existing data stored in ASV; only the syntax used changes. Where you currently use asv:// to access data, going forward you will use wasb://. Both currently work, but at some point in the future, the asv:// syntax will be removed.
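If you have scripts or job definitions that still contain asv:// paths, the scheme change described in the note can be sketched as a simple string rewrite. This is a minimal illustration only: the helper name asv_to_wasb is hypothetical, and it assumes that swapping the scheme prefix is all that is needed, as the note describes. Check the current HDInsight documentation for the exact wasb:// form your cluster expects.

```python
def asv_to_wasb(uri: str) -> str:
    """Swap a leading asv:// scheme for wasb://, leaving the rest untouched.

    Hypothetical helper: assumes only the scheme prefix changes.
    """
    prefix = "asv://"
    # URI schemes are case-insensitive, so ASV:// is matched as well.
    if uri.lower().startswith(prefix):
        return "wasb://" + uri[len(prefix):]
    # Anything that is not an asv:// URI is returned unchanged.
    return uri


print(asv_to_wasb("asv://hadoop/"))  # prints wasb://hadoop/
```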
HDInsight clusters are deployed in Azure on compute nodes to execute M/R tasks and are dropped once these tasks have been completed. Keeping the data in the HDFS clusters after computations have been completed would be an expensive way to store this data. ASV provides a full-featured HDFS file system over Azure Blob storage (ABS). ABS is a robust, general-purpose Azure storage solution, so storing data in ABS enables the clusters used for computation to be safely deleted without losing user data. ASV is not only low cost; it has also been designed as an HDFS extension to provide a seamless experience to customers by enabling the full set of components in the Hadoop ecosystem to operate directly on the data it manages.
NOTE: The asv:// syntax is being deprecated in favor of wasb:// (WASB = Windows Azure Storage – BLOB service).
To use this feature in the current release, you will need HDInsight and Windows Azure Blob Storage accounts. To access your storage account from HDInsight, go to the Cluster and click on the Manage Cluster tile.
Click on the Set up ASV button.
Enter the credentials (Name and Passkey) for your Windows Azure Blob Storage account.
Now, to run the Hadoop wordcount job with data in an ASV container named hadoop, use:
hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar wordcount asv://hadoop/ outputfile
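If you launch the job from a script rather than typing it by hand, the same command line can be assembled as an argument list. The sketch below only builds and prints the command; the function name wordcount_command is hypothetical, and the jar name, input URI, and output name are simply the ones used in this post.

```python
import shlex


def wordcount_command(input_uri, output_path,
                      jar="hadoop-examples-1.1.0-SNAPSHOT.jar"):
    """Return the wordcount invocation as an argument list (hypothetical helper)."""
    return ["hadoop", "jar", jar, "wordcount", input_uri, output_path]


cmd = wordcount_command("asv://hadoop/", "outputfile")
# Show the command as it would be typed at a prompt.
print(shlex.join(cmd))
```

On a machine where the hadoop executable is on the PATH (an assumption here), the list could be passed to subprocess.run(cmd, check=True) to start the job.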
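The scheme for accessing data in ASV is asv://container/path, where container is the name of the blob storage container and path is the location of the data within it.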
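To make the asv://container/path layout concrete, here is a small sketch that splits such a URI into its container and path parts using Python's standard urllib.parse module. The function name split_asv_uri is hypothetical and is only meant to illustrate how the pieces of the URI map onto blob storage.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_asv_uri(uri: str) -> Tuple[str, str]:
    """Split asv://container/path into (container, path). Hypothetical helper."""
    parts = urlsplit(uri)
    # urlsplit lowercases the scheme, so ASV:// is accepted too.
    if parts.scheme != "asv":
        raise ValueError("not an asv:// URI: " + uri)
    # The container sits where a host name normally would; the rest is the path.
    return parts.netloc, parts.path.lstrip("/")


print(split_asv_uri("asv://hadoop2/data"))  # prints ('hadoop2', 'data')
```

For a container root such as asv://hadoop/, the path part comes back as an empty string.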
To see the data in ASV:
#cat asv://hadoop2/data