How to Install the PowerShell Cmdlets for Apache™ Hadoop™-based Services for Windows

Small Bites of Big Data

Cindy Gross, SQLCAT PM

 

UPDATED JUNE 2013 - The very early version of the PowerShell cmdlets discussed below has been replaced. See Managing Your HDInsight Cluster with PowerShell: http://blogs.msdn.com/b/carlnol/archive/2013/06/07/managing-your-hdinsight-cluster-with-powershell.aspx

We have a cool new addition to Microsoft’s Apache™ Hadoop™-based Services for Windows – new PowerShell cmdlets! The initial readme.txt is not very detailed, so I went through the installation and initial use step by step to help get you started. Enjoy this new way to remotely administer your Hadoop cluster!

1)      Log in to your Windows client (the location where you plan to do remote administration of your Hadoop cluster) with a local administrator account.

2)      Download the Hadoop PowerShell Cmdlets zip file from your Apache™ Hadoop™-based Services for Windows cluster: https://YourHadoopCluster.cloudapp.net/Home/Downloads (substitute your actual Hadoop cluster name for “YourHadoopCluster”).

3)      Unzip the files to C:\Windows\Microsoft.NET\Framework64\v4.0.30319 (assuming an x64 OS and an installation on C:).

4)      Change PowerShell to use .NET 4.0 (by default it uses an older version that does not work with these cmdlets)

  • Open “Windows PowerShell” (choose the x64 version – the one that does not show (x86) after the name).
  • Type $pshome and note the result, such as “C:\Windows\System32\WindowsPowerShell\v1.0”. This directory is for an x64 installation of PowerShell; the SysWOW64 directory is for x86/32-bit.
  • Close PowerShell.
  • Open a command prompt or Windows Explorer with “Run as administrator” and navigate to the directory you found above (e.g. C:\Windows\system32\WindowsPowerShell\v1.0).
  • If no powershell.exe.config file exists, create an empty file with that name in that directory. Edit powershell.exe.config to add the following section:
    <?xml version="1.0"?>
    <configuration>
      <startup useLegacyV2RuntimeActivationPolicy="true">
        <supportedRuntime version="v4.0.30319"/>
        <supportedRuntime version="v2.0.50727"/>
      </startup>
    </configuration>
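
One way to confirm that the config change in step 4 took effect is to ask a newly opened PowerShell window which CLR it loaded. This is a minimal sketch and not part of the readme. It uses only the built-in .NET property [System.Environment]::Version, and it assumes that a major version of 4 or higher means the v4.0.30319 runtime entry was picked up.

```powershell
# Report the CLR version hosting this PowerShell session.
$clr = [System.Environment]::Version
"CLR version: $clr"

# The cmdlets need .NET 4.0; a major version of 2 suggests the config file was not picked up.
if ($clr.Major -ge 4) {
    "OK: PowerShell is running on .NET 4.0 or later."
} else {
    "PowerShell is still on an older CLR; recheck powershell.exe.config in $pshome"
}
```

If the older CLR is still reported, check that the file was saved as powershell.exe.config (not powershell.exe.config.txt) in the x64 directory and that every PowerShell window was closed before reopening.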

5)      Install the SDK cmdlets

  • Open a “Command Prompt” with the “Run as administrator” option. Go to your .NET 4.0 directory which will be something like C:\Windows\Microsoft.NET\Framework64\v4.0.30319.
  • Run:
    installutil.exe Isotope.Sdk.Cmdlets.dll
  • Review the output to verify the install succeeded: it should report a commit rather than a rollback, and it should show no errors.

6)      Open a new Windows PowerShell window (make sure you choose the x64 version on an x64 OS) and type:

            Add-PSSnapin IsotopeSdkPSSnapIn

Verify it completes successfully.
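
Beyond watching for errors, the built-in Get-PSSnapin and Get-Command cmdlets can show what was registered and loaded. The sketch below assumes the snap-in name IsotopeSdkPSSnapIn from step 6 and is only a suggested check, not something the readme prescribes.

```powershell
# List snap-ins that installutil.exe registered on this machine (registered, not necessarily loaded).
Get-PSSnapin -Registered | Select-Object Name, PSVersion

# List the cmdlets that the loaded snap-in contributes to the current session.
Get-Command -Module IsotopeSdkPSSnapIn | Select-Object Name
```

If the first command does not list IsotopeSdkPSSnapIn, the installutil.exe step likely rolled back or was run from the 32-bit Framework directory rather than Framework64.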

7)      Type a set of commands in the PowerShell window to verify the cmdlets work. For example, set which cluster the other commands apply to (change to your actual names/password), check that the settings are correct, then get a file listing of the user directories:

Set-ClusterSettings -ClusterDns <cluster name> -UserName <username> -Password <password>
Get-ClusterSettings
Get-FileListFromHdfs -HdfsPath hdfs:///user/

 

8)      Once the cmdlets are installed and you have verified they are working, you can add the snap-in to the profile so you don’t have to add it every time you open PowerShell. Details on PowerShell profiles are in the about_Profiles help topic; a summary is below.

a.       Open PowerShell with “Run as administrator” (required if you are changing the execution policy or creating a new profile)

b.      Allow scripts you’ve self-created to be run without being signed (or you can choose to sign the ps1 you create later):

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned

c.       See if a Profile already exists: test-path $profile

d.      If no Profile exists, make a new one that’s available to all users (this assumes x64):

new-item -path $env:windir\system32\WindowsPowerShell\v1.0\profile.ps1 -itemtype file -force

e.      Edit the file you just created. Open Notepad with “Run as administrator”, paste in:

Add-PSSnapin IsotopeSdkPSSnapIn

f.        Optionally add the default Hadoop cluster (use your actual cluster name, username, password)

Set-ClusterSettings -ClusterDns <cluster name> -UserName <username> -Password <password>

g.       Save as C:\Windows\system32\WindowsPowerShell\v1.0\profile.ps1 (choose “all files” so it does not append “.txt”).

h.      The next time you open PowerShell you will not have to add the snap-in or set the default cluster you want to manage.
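
Two details in the profile steps can be made more robust. First, $profile on its own points at the current user's profile for the current host, while step d creates the all-users file, which PowerShell exposes as $profile.AllUsersAllHosts. Second, Add-PSSnapin reports an error if the snap-in is already loaded in the session. The following is a minimal sketch under those assumptions, not the author's original profile:

```powershell
# Path of the all-users, all-hosts profile; on x64 this is profile.ps1 under $pshome.
$allUsersProfile = $profile.AllUsersAllHosts
"All-users profile: $allUsersProfile (exists: $(Test-Path $allUsersProfile))"

# Suggested content for profile.ps1: add the snap-in only if this session has not loaded it yet.
if (-not (Get-PSSnapin -Name IsotopeSdkPSSnapIn -ErrorAction SilentlyContinue)) {
    Add-PSSnapin IsotopeSdkPSSnapIn
}
```

Keep in mind that a Set-ClusterSettings line in an all-users profile stores the cluster password in plain text in a file every user on the machine can read.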

 

The full list of cmdlets is in the readme.txt. As of this writing, the list includes:

Set-ClusterSettings -ClusterDns <hadoopcluster.cloudapp.net or hadoopcluster> -UserName <username> -Password <password>

Get-ClusterSettings

New-HiveJob -Type query -Execute "SELECT * from HiveSampleTable limit 10;" [-Define <# delimited key=value pairs>]

New-MapReduceJarJob -Jar c:\apps\dist\hadoop-examples-1.1.0-SNAPSHOT.jar -Arguments "pi 16 100" [-Define <# delimited key=value pairs>]

New-MapReduceStreamingJob -Input "/example/data/gutenberg/davinci.txt" -Output "/example/data/streamingoutput/wc.txt" -Mapper cat.exe -Reducer wc.exe -File "hdfs:///example/apps/wc.exe,hdfs:///example/apps/cat.exe" [-Define <# delimited key=value pairs>]

Get-JobStatus -JobId <isotope-jobId>

Get-FileListFromHdfs -HdfsPath hdfs:///user/

Copy-File -LocalPath <file> -HdfsPath hdfs:///<path>

Get-JobHistoryList -NumberOfItems 2

Get-JobHistory -JobId <isotope-jobId>

Get-JobHistoryCount

Set-AsvCredentials -AccountName <accountname> -AccountKey <accountkey>

Get-AsvCredentials

Remove-AsvCredentials

Get-S3Credentials

Set-S3Credentials -AccessKeyId <accesskeyid> -AccessKey <accesskey> -IsS3N <0|1>

Remove-S3Credentials

 

I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.