Cindy Gross: Small Bites of Big Data, Small Data, All Data

Small Bites of Big Data, Small Data, All Data for Hadoop, SQL Server, Hive, Distributed Systems, Scale Out....

Sample PowerShell Script: HDInsight Custom Create


This is a working script I use to create various HDInsight clusters. For a truly reproducible, automated environment you would want to put this into a .ps1 script that accepts parameters (see here for an example). However, you may find the method below good for learning and experimenting. Replace all the "YOURxyz" sections with your actual information. Beware of oddities introduced by cut/paste, such as spaces being replaced by line breaks or straight quotes being replaced by smart quotes. The # marks a comment. Some commands that you rarely run are commented out; remove the # to run them if you need them.
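
The cut/paste oddities mentioned above can be cleaned up before running anything. The following is only a sketch: the function name Repair-PastedScriptText is made up for illustration, and it handles just curly quotes and long dashes, not stray line breaks.

```powershell
# Hypothetical helper (not part of any Azure module): normalizes characters
# that blogs and word processors often substitute during cut/paste.
function Repair-PastedScriptText {
    param([Parameter(Mandatory = $true)][string]$Text)

    # Curly double quotes (U+201C, U+201D) become straight double quotes.
    $Text = $Text -replace '[\u201C\u201D]', '"'
    # Curly single quotes (U+2018, U+2019) become straight single quotes.
    $Text = $Text -replace '[\u2018\u2019]', "'"
    # En and em dashes (U+2013, U+2014) become plain hyphens.
    $Text = $Text -replace '[\u2013\u2014]', '-'
    return $Text
}

# Example: clean a saved copy of the pasted script (the file name is an assumption).
#Repair-PastedScriptText -Text (Get-Content -Raw .\pasted.ps1)
```

Note that this also rewrites long dashes inside comments, which is harmless for a script but worth knowing if you run it over ordinary prose.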

# This PowerShell script is meant to be a cut/paste of specific parts; it is NOT designed to be run as a whole.

# Do once after you install the cmdlets
#Get-AzurePublishSettingsFile
#Import-AzurePublishSettingsFile C:\Users\YOURDirectory\Downloads\YOURName-credentials.publishsettings

# Use if you admin more than one subscription
#Get-AzureAccount # This may be needed to log in to Azure
Select-AzureSubscription -SubscriptionName YOURSubscription
Get-AzureSubscription -Current

# Many things are easier in the ISE
ise

###############################################
            ### create clusters ###          
###############################################

# Add your specific information here
# Previous failures may make a name unavailable for a while – check to see if previous cluster was partially created
$ClusterName = "YOURNewHDInsightClusterName" #the name you will give to your cluster
$Location = "YOURDataCenter" #cluster data center must be East US, West US, or North Europe (as of December 2013)
$NumOfNodes = 1 #start small
$StorageAcct1 = "YOURExistingStorageAccountName" #currently must be in same data center as the cluster
$DefaultContainer = "YOURExistingContainerName" #already exists on the storage account

# These variables are automatically set for you
$FullStorage1 = "${StorageAcct1}.blob.core.windows.net"
$Key1 = Get-AzureStorageKey $StorageAcct1 | %{ $_.Primary }
$SubID = Get-AzureSubscription -Current | %{ $_.SubscriptionId }
$SubName = Get-AzureSubscription -Current | %{ $_.SubscriptionName }
$Cert = Get-AzureSubscription -Current | %{ $_.Certificate }
$Creds = Get-Credential -Message "New admin account to be created for your HDInsight cluster" #this prompts you

###############################################
# Sample quick create
###############################################
# Equivalent of quick create
# The ` specifies that the cmd continues on the next line; beware of artificial line breaks added during cut/paste from the blog
New-AzureHDInsightCluster -Name $ClusterName -ClusterSizeInNodes $NumOfNodes -Subscription $SubID -Location "$Location" `
-DefaultStorageAccountName $FullStorage1 -DefaultStorageAccountKey $Key1 -DefaultStorageContainerName $DefaultContainer -Credential $Creds

###############################################
# Sample custom create
###############################################
#https://hadoopsdk.codeplex.com/wikipage?title=PowerShell%20Cmdlets%20for%20Cluster%20Management
# Most params are the same as quick create, use a new cluster name
# Pass in a 2nd storage account, a SQLAzure db for the metastore (assume same db for Oozie and Hive), add Avro library, some config values
# Execute all the variable settings from above

# This value is set for you, don't change!
$configvalues = new-object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightHiveConfiguration'

# Add your specific information here
$ClusterName = "YOURNewHDInsightClusterName"
$StorageAcct2 = "YOURExistingStorageAccountName2"
$MetastoreAzureSQLDBName = "YOURExistingSQLAzureDBName"
$MetastoreAzureServerName = "YOURExistingSQLAzureServer.database.windows.net" #gives a DNS error if you don't use the full name
$configvalues.Configuration = @{ "hive.exec.compress.output"="true" }  #this is an example of a config value you may pass in

# These variables are automatically set for you
$FullStorage2 = "${StorageAcct2}.blob.core.windows.net"
$Key2 = Get-AzureStorageKey $StorageAcct2 | %{ $_.Primary }
$MetastoreCreds = Get-Credential -Message "existing id/password for your SQL Azure DB (metastore)" #This prompts for the existing id and password of your existing SQL Azure DB

# Add a config file value
# Add AVRO SerDe libraries for Hive (on storage 1)
$configvalues.AdditionalLibraries = new-object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightDefaultStorageAccount'
$configvalues.AdditionalLibraries.StorageAccountName = $FullStorage1
$configvalues.AdditionalLibraries.StorageAccountKey = $Key1
$configvalues.AdditionalLibraries.StorageContainerName = "hivelibs" #container called hivelibs must exist on specified storage account
# Create custom cluster
New-AzureHDInsightClusterConfig -ClusterSizeInNodes $NumOfNodes `
    | Set-AzureHDInsightDefaultStorage -StorageAccountName $FullStorage1 -StorageAccountKey $Key1 -StorageContainerName $DefaultContainer `
    | Add-AzureHDInsightStorage -StorageAccountName $FullStorage2 -StorageAccountKey $Key2 `
    | Add-AzureHDInsightMetastore -SqlAzureServerName $MetastoreAzureServerName -DatabaseName $MetastoreAzureSQLDBName -Credential $MetastoreCreds -MetastoreType OozieMetastore `
    | Add-AzureHDInsightMetastore -SqlAzureServerName $MetastoreAzureServerName -DatabaseName $MetastoreAzureSQLDBName -Credential $MetastoreCreds -MetastoreType HiveMetastore `
    | Add-AzureHDInsightConfigValues -Hive $configvalues `
    | New-AzureHDInsightCluster -Subscription $SubID -Location "$Location" -Name $ClusterName -Credential $Creds

###############################################
# get status, properties, etc.
###############################################
#$SubName = Get-AzureSubscription -Current | %{ $_.SubscriptionName }
Get-AzureHDInsightProperties -Subscription $SubName
Get-AzureHDInsightCluster -Subscription $SubName
Get-AzureHDInsightCluster -Subscription $SubName -name YOURClusterName

###############################################
# remove cluster
###############################################
#Remove-AzureHDInsightCluster -Name $ClusterName -Subscription $SubName
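
The introduction suggests a parameterized script for repeatable runs. As a rough sketch of that idea for the quick create, the function below gathers the arguments into one hashtable that can be inspected and then splatted onto New-AzureHDInsightCluster. The function name New-QuickCreateArguments and its parameter names are made up for illustration; the hashtable keys are the quick create parameters used above.

```powershell
# Hypothetical helper: collects the quick create arguments into one hashtable.
function New-QuickCreateArguments {
    param(
        [Parameter(Mandatory = $true)][string]$ClusterName,
        [Parameter(Mandatory = $true)][string]$Location,
        [Parameter(Mandatory = $true)][string]$StorageAcct1,
        [Parameter(Mandatory = $true)][string]$StorageKey,
        [Parameter(Mandatory = $true)][string]$DefaultContainer,
        [int]$NumOfNodes = 1   # start small, as in the script above
    )

    # Keys match the New-AzureHDInsightCluster parameter names from the quick create.
    return @{
        Name                        = $ClusterName
        ClusterSizeInNodes          = $NumOfNodes
        Location                    = $Location
        DefaultStorageAccountName   = "${StorageAcct1}.blob.core.windows.net"
        DefaultStorageAccountKey    = $StorageKey
        DefaultStorageContainerName = $DefaultContainer
    }
}

# Usage with the variables set earlier (commented out so nothing is created by accident):
#$QuickArgs = New-QuickCreateArguments -ClusterName $ClusterName -Location $Location `
#    -StorageAcct1 $StorageAcct1 -StorageKey $Key1 -DefaultContainer $DefaultContainer
#New-AzureHDInsightCluster @QuickArgs -Subscription $SubID -Credential $Creds
```

Building the hashtable first means you can print it and check every value before the create call is made.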

  • "Set-AzureSubscription –SubscriptionName YOURSubscription" should be "Select-AzureSubscription –SubscriptionName YOURSubscription"

  • Thanks, I fixed the set vs. select issue.

  • Cindy, I tried this method of creating a cluster. It works fine. But it creates a cluster of Version 2.1. Can you tell me how to create a version 3.0 cluster?

  • Jigyashu - type help New-AzureHDInsightCluster and you will see that one of the parameters is -Version. In the parameters section add $Version = "3.0". Then in the create statement add -Version $Version. For example:

    New-AzureHDInsightCluster -Name $ClusterName -Version $Version -ClusterSizeInNodes $NumOfNodes -Location "$Location" `
    -DefaultStorageAccountName $FullStorage1 -DefaultStorageAccountKey $Key1 -DefaultStorageContainerName $DefaultContainer -Credential $Creds

  • It seems that the section where you build up $configvalues to enable Avro from the hivelibs container doesn't work when you build a version 3.0 cluster.

    The script runs fine until you add -Version "3.0" to the New-AzureHDInsightCluster command. If you remove the Add-AzureHDInsightConfigValues -Hive $configvalues step, the complex build with version 3.0 works fine.

    Any ideas what's broken here?  The error messages are not helpful at all.

  • When running the sample quick create script above, has anyone ever received the error below? If so, how did you resolve it?

    New-AzureHDInsightCluster : Unable to complete the cluster create operation. Operation failed with code '400'. Cluster left behind state: 'Error'. Message: 'ClusterUserNameInvalid'

    Thanks.

  • The sample script doesn't check the validity of its input. It is quite likely that your name is already in use by another subscriber (it must be unique within azurehdinsight.net). Also make sure your password meets the requirements. Go into the Azure portal and see whether the failed cluster is still there and whether there are associated error messages. Also try using the exact same values in the GUI to see if it gives you any more information.
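
Since the sample script does no input checking, a small pre-flight check can catch the simplest mistakes before the create call. This is only a sketch under stated assumptions: the function name Get-SettingProblems is made up, and it checks only for empty values and leftover "YOUR" placeholders. It does not know the service's actual naming or password rules.

```powershell
# Hypothetical helper: reports settings that are empty or still hold a "YOURxyz" placeholder.
function Get-SettingProblems {
    param([Parameter(Mandatory = $true)][hashtable]$Settings)

    $problems = @()
    foreach ($name in ($Settings.Keys | Sort-Object)) {
        $value = [string]$Settings[$name]
        if ([string]::IsNullOrWhiteSpace($value)) {
            $problems += "$name is empty"
        }
        elseif ($value -clike 'YOUR*') {
            # Case-sensitive match on the placeholder prefix used in the script.
            $problems += "$name still holds the placeholder '$value'"
        }
    }
    # The leading comma keeps the result an array even when it has zero or one entries.
    return ,$problems
}

# Usage with the variables from the script (commented out):
#Get-SettingProblems -Settings @{ ClusterName = $ClusterName; Location = $Location; StorageAcct1 = $StorageAcct1 }
```

An empty result only means these two checks passed; the portal remains the place to look for the real error details.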

  • I am trying to provision an HBase cluster with a version later than 3.0 programmatically, because my Hive query runs properly against HBase version 3.1.2.438.

    If I use the command below to set up an HDInsight cluster of type HBase, it throws an error, yet manual provisioning through the Azure Portal lets me select the right version. Am I missing something in the command below?

    New-AzureHDInsightCluster -Name $HBaseClusterName -Location $Location -ClusterType HBase -ClusterSizeInNodes $HBaseClusterSize -Version 3.1 -Credential $PsCredential -DefaultStorageAccountName "$StorageAccountName.blob.core.windows.net" -DefaultStorageAccountKey $storageAccountKey -DefaultStorageContainerName $ContainerName -ErrorAction Stop -Verbose | Out-Null

    Error Message : Cannot create a HBase cluster with version '3.1'. HBase cluster only supported after version 3.0.

    If I pass the version parameter as 3.1.2.438:

    Error Message: Cannot create a cluster with version '3.1.2.438'. Available Versions for your subscription are: 1.6,2.1,3.0,3.1

  • Version is a string; pass it in quotes as "3.1".
