How to Install the PowerShell Cmdlets for Apache™ Hadoop™-based Services for Windows
Small Bites of Big Data
Cindy Gross, SQLCAT PM
UPDATED JUNE 2013 - The very early version of PowerShell cmdlets I discussed below have been replaced - see Managing Your HDInsight Cluster with PowerShell http://blogs.msdn.com/b/carlnol/archive/2013/06/07/managing-your-hdinsight-cluster-with-powershell.aspx
We have a cool new addition to Microsoft’s Apache™ Hadoop™-based Services for Windows – new PowerShell cmdlets! The initial readme.txt is not very detailed, so I went through the installation and initial use step by step to help get you started. Enjoy this new way to remotely administer your Hadoop cluster!
1) Log in to your Windows client (the location where you plan to do remote administration of your Hadoop cluster) with a local administrator account.
2) Download the Hadoop PowerShell Cmdlets zip file from your Apache™ Hadoop™-based Services for Windows cluster: https://YourHadoopCluster.cloudapp.net/Home/Downloads (substitute your actual Hadoop cluster name for “YourHadoopCluster”).
3) Unzip the files to C:\Windows\Microsoft.NET\Framework64\v4.0.30319 (assuming an x64 OS and an installation on C:).
4) Change PowerShell to use .NET 4.0 (by default it loads an older version of the CLR that does not work with these cmdlets).
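One common way to do this (a sketch of the usual technique – the readme.txt may describe a different method) is to create a powershell.exe.config file next to powershell.exe that tells it to prefer the .NET 4.0 runtime, then restart PowerShell:

<?xml version="1.0"?>
<!-- Save as C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe.config -->
<configuration>
  <startup useLegacyV2RuntimeActivationPolicy="true">
    <supportedRuntime version="v4.0.30319" />
    <supportedRuntime version="v2.0.50727" />
  </startup>
</configuration>

Listing v2.0.50727 as a fallback keeps PowerShell working on machines where .NET 4.0 is not installed.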
5) Install the SDK cmdlets
6) Open a new Windows PowerShell window (make sure you choose the x64 version on an x64 OS) and type the installation command. Verify it completes successfully.
7) Type a set of commands in the PowerShell window to verify the cmdlets work. For example, set which cluster the other commands apply to (change to your actual names/password), check that the settings are correct, then get a file listing of the user directories:
Set-ClusterSettings -ClusterDns <cluster name> -UserName <username> -Password <password>
Get-ClusterSettings
Get-FileListFromHdfs -HdfsPath hdfs:///user/
8) Once the cmdlets are installed and you have verified they are working, you can add the snap-in to your PowerShell profile so you don’t have to add it every time you open PowerShell. Details on PowerShell profiles can be found here; a summary is below.
a. Open PowerShell with “Run as administrator” (required if you are changing the execution policy or creating a new profile)
b. Allow scripts you’ve created locally to run without being signed (or you can choose to sign the .ps1 file you create later):
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
c. See if a Profile already exists: test-path $profile
d. If no Profile exists, make a new one that’s available to all users (this assumes x64):
new-item -path $env:windir\system32\WindowsPowerShell\v1.0\profile.ps1 -itemtype file -force
e. Edit the file you just created: open Notepad with “Run as administrator” and paste in the command that adds the snap-in.
f. Optionally add the default Hadoop cluster settings (use your actual cluster name, username, and password):
Set-ClusterSettings -ClusterDns <cluster name> -UserName <username> -Password <password>
g. Save as C:\Windows\system32\WindowsPowerShell\v1.0\profile.ps1 (choose “all files” so it does not append “.txt”).
h. The next time you open PowerShell you will not have to add the snap-in or set the default cluster you want to manage.
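Putting steps e and f together, the profile ends up looking something like this (the snap-in name below is a placeholder – use the actual name reported by Get-PSSnapin -Registered after the install):

# C:\Windows\system32\WindowsPowerShell\v1.0\profile.ps1
# Snap-in name is a placeholder; check Get-PSSnapin -Registered for the real one.
Add-PSSnapin <HadoopSnapInName>
# Optional: point the cmdlets at your default cluster.
Set-ClusterSettings -ClusterDns <cluster name> -UserName <username> -Password <password>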
The full list of cmdlets is in the readme.txt. As of this time the list includes:
Set-ClusterSettings -ClusterDns <hadoopcluster.cloudapp.net or hadoopcluster> -UserName <username> -Password <password>
New-HiveJob -Type query -Execute "SELECT * from HiveSampleTable limit 10;" [-Define <# delimited key=value pairs>]
New-MapReduceJarJob -Jar c:\apps\dist\hadoop-examples-1.1.0-SNAPSHOT.jar -Arguments "pi 16 100" [-Define <# delimited key=value pairs>]
New-MapReduceStreamingJob -Input "/example/data/gutenberg/davinci.txt" -Output "/example/data/streamingoutput/wc.txt" -Mapper cat.exe -Reducer wc.exe -File "hdfs:///example/apps/wc.exe,hdfs:///example/apps/cat.exe" [-Define <# delimited key=value pairs>]
Get-JobStatus -JobId <isotope-jobId>
Get-FileListFromHdfs -HdfsPath hdfs:///user/
Copy-File -LocalPath <file> -HdfsPath hdfs:///<path>
Get-JobHistoryList -NumberOfItems 2
Get-JobHistory -JobId <isotope-jobId>
Set-AsvCredentials -AccountName <accountname> -AccountKey <accountkey>
Set-S3Credentials -AccessKeyId <accesskeyid> -AccessKey <accesskey> -IsS3N <0|1>
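Putting a few of these together, a typical remote session might look like the sketch below (parameter values are examples only, and the output format may differ in the CTP):

Set-ClusterSettings -ClusterDns YourHadoopCluster -UserName Admin -Password <password>
# Copy a local file into HDFS and confirm it arrived.
Copy-File -LocalPath C:\data\input.txt -HdfsPath hdfs:///user/Admin/input.txt
Get-FileListFromHdfs -HdfsPath hdfs:///user/Admin/
# Submit a Hive query, then poll its status with the JobId it returns.
New-HiveJob -Type query -Execute "SELECT * FROM HiveSampleTable LIMIT 10;"
Get-JobStatus -JobId <isotope-jobId>
# Review the last couple of jobs that ran on the cluster.
Get-JobHistoryList -NumberOfItems 2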
I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.
Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.