I did some tests with the HDInsight CTP implementation on Windows. I used the one on my own server machine, not HDInsight for Azure.
I installed it and ran the example that comes with the CTP. OK, it worked, but I wanted to upload some of my own data and create a Hadoop table on my ‘big’ CSV data.
I inspected the scripts from the example and found out that HDInsight is basically just a structure of data files, so uploading the files is necessary.
The right tool for any action in HDInsight is the Hadoop console; the link is on the desktop.
To upload data, the first step is to create a directory for it. This can simply be done with the hadoop command line tool in the console:
hadoop dfs -mkdir /test/data
To upload data:
hadoop dfs -copyFromLocal C:\temp\testdata.csv /test/data/testdata.csv
To get rid of the data again:
hadoop dfs -rmr -skipTrash /test
Be careful: -rmr means recursive, so it also kills all subdirectories and files in there. And… -skipTrash means that it skips the trash, so the data is gone immediately, with no way to get it back.
For better use of the commands, I have changed some of the CTP example PowerShell scripts to be more generic. PowerShell must be called in the Hadoop command window. The general PowerShell syntax there is:
powershell -ExecutionPolicy unrestricted -F <SCRIPTNAME>.ps1 <ARGUMENTS>…
I have created three PowerShell scripts: create_Directory.ps1, import_Data.ps1 and remove_Directory.ps1.
powershell -ExecutionPolicy unrestricted -F create_Directory.ps1 <DIRECTORY>
<DIRECTORY> can be /test, /test/data, /test/data/whatever… and so on.
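The attached create_Directory.ps1 is not reproduced here, but a minimal sketch of such a wrapper could look like the following (the parameter name and the plain pass-through to hadoop are my assumptions, not necessarily the attached script):
# create_Directory.ps1 -- minimal sketch: create a directory in HDFS
param(
    [Parameter(Mandatory=$true)]
    [string]$Directory   # e.g. /test/data
)
# call the hadoop command line tool to create the directory
& hadoop dfs -mkdir $Directory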
powershell -ExecutionPolicy unrestricted -F import_Data.ps1 <SourceDirectory> <SourceFile> <targetDirectory> <targetFile>
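Again, only a sketch of what import_Data.ps1 can look like; the four parameters mirror the call syntax above, everything else is assumed:
# import_Data.ps1 -- minimal sketch: copy a local file into an HDFS directory
param(
    [Parameter(Mandatory=$true)][string]$SourceDirectory,   # e.g. C:\temp
    [Parameter(Mandatory=$true)][string]$SourceFile,        # e.g. testdata.csv
    [Parameter(Mandatory=$true)][string]$TargetDirectory,   # e.g. /test/data
    [Parameter(Mandatory=$true)][string]$TargetFile         # e.g. testdata.csv
)
# build the local path and the HDFS path, then copy the file into HDFS
$source = Join-Path $SourceDirectory $SourceFile
$target = "$TargetDirectory/$TargetFile"
& hadoop dfs -copyFromLocal $source $target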
powershell -ExecutionPolicy unrestricted -F remove_Directory.ps1 <DIRECTORY>
Be careful: the calls inside are recursive and use -skipTrash.
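A minimal sketch of remove_Directory.ps1, assuming it simply wraps the -rmr -skipTrash call shown earlier (the actual attached script may differ):
# remove_Directory.ps1 -- minimal sketch: remove an HDFS directory recursively, skipping the trash
param(
    [Parameter(Mandatory=$true)]
    [string]$Directory   # e.g. /test
)
# -rmr deletes recursively, -skipTrash bypasses the trash -- the data is gone for good
& hadoop dfs -rmr -skipTrash $Directory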
The PowerShell scripts are attached. Have fun :-)
Perhaps I'll do another blog post on how to create a simple table out of the imported data…