Cindy Gross: Small bites of Big Data, Small Data, All Data

Small bites of Big Data, Small Data, All Data for Hadoop, SQL Server, Hive, Distributed Systems, Scale Out....

Sample PowerShell Script: HDInsight Custom Create

This is a working script I use to create various HDInsight clusters. For a truly reproducible, automated environment, you would want to put this into a .ps1 script that accepts parameters (see here for an example). However, you may find the method below good for learning and experimenting. Replace all the "YOURxyz" placeholders with your actual information. Beware of oddities introduced by cut/paste, such as spaces being replaced by line breaks or straight quotes being replaced by smart quotes. The # marks a comment; some commands that you rarely run are commented out, so remove the # to run them when you need them.
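As a rough illustration of the parameterized approach, here is a minimal sketch that wraps the quick create call from this post in a function. The function name New-SampleHDInsightCluster and its parameter names are hypothetical, chosen only for this example; the Azure cmdlets and their parameters are the same ones used in the script in this post. It assumes the Azure cmdlets are installed and a subscription is already selected.

```powershell
# Hypothetical wrapper; the function and parameter names are illustrative only.
function New-SampleHDInsightCluster {
    param(
        [Parameter(Mandatory = $true)] [string] $ClusterName,
        [Parameter(Mandatory = $true)] [string] $Location,
        [Parameter(Mandatory = $true)] [string] $StorageAccount,   # existing storage account
        [Parameter(Mandatory = $true)] [string] $Container,        # existing container on that account
        [Parameter(Mandatory = $true)] [System.Management.Automation.PSCredential] $Credential,
        [int] $NumOfNodes = 1                                       # start small
    )

    # Derive the same values the interactive script sets automatically
    $fullStorage = "${StorageAccount}.blob.core.windows.net"
    $key = Get-AzureStorageKey $StorageAccount | ForEach-Object { $_.Primary }
    $subId = Get-AzureSubscription -Current | ForEach-Object { $_.SubscriptionId }

    # Same call as the quick create section, driven by parameters
    New-AzureHDInsightCluster -Name $ClusterName -ClusterSizeInNodes $NumOfNodes `
        -Subscription $subId -Location $Location `
        -DefaultStorageAccountName $fullStorage -DefaultStorageAccountKey $key `
        -DefaultStorageContainerName $Container -Credential $Credential
}

# Example call (commented out so that loading the file does not create a cluster):
# New-SampleHDInsightCluster -ClusterName "YOURNewHDInsightClusterName" -Location "YOURDataCenter" `
#     -StorageAccount "YOURExistingStorageAccountName" -Container "YOURExistingContainerName" `
#     -Credential (Get-Credential)
```

Passing the credential as a parameter, rather than prompting inside the function, lets the same code run both interactively and from an automated job.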

# This PowerShell script is meant to be cut/pasted in specific parts; it is NOT designed to be run as a whole.

# Do once after you install the cmdlets
#Get-AzurePublishSettingsFile
#Import-AzurePublishSettingsFile C:\Users\YOURDirectory\Downloads\YOURName-credentials.publishsettings

# Use if you admin more than one subscription
#Get-AzureAccount # This may be needed to log in to Azure
Select-AzureSubscription -SubscriptionName YOURSubscription
Get-AzureSubscription -Current

# Many things are easier in the ISE
ise

###############################################
            ### create clusters ###          
###############################################

# Add your specific information here
# Previous failures may make a name unavailable for a while – check to see if previous cluster was partially created
$ClusterName = "YOURNewHDInsightClusterName" #the name you will give to your cluster
$Location = "YOURDataCenter" #cluster data center must be East US, West US, or North Europe (as of December 2013)
$NumOfNodes = 1 #start small
$StorageAcct1 = "YOURExistingStorageAccountName" #currently must be in same data center as the cluster
$DefaultContainer = "YOURExistingContainerName" #already exists on the storage account

# These variables are automatically set for you
$FullStorage1 = "${StorageAcct1}.blob.core.windows.net"
$Key1 = Get-AzureStorageKey $StorageAcct1 | %{ $_.Primary }
$SubID = Get-AzureSubscription -Current | %{ $_.SubscriptionId }
$SubName = Get-AzureSubscription -Current | %{ $_.SubscriptionName }
$Cert = Get-AzureSubscription -Current | %{ $_.Certificate }
$Creds = Get-Credential -Message "New admin account to be created for your HDInsight cluster" #this prompts you

###############################################
# Sample quick create
###############################################
# Equivalent of quick create
# The ` specifies that the cmd continues on the next line; beware of artificial line breaks added during cut/paste from the blog
New-AzureHDInsightCluster -Name $ClusterName -ClusterSizeInNodes $NumOfNodes -Subscription $SubID -Location "$Location" `
-DefaultStorageAccountName $FullStorage1 -DefaultStorageAccountKey $Key1 -DefaultStorageContainerName $DefaultContainer -Credential $Creds

###############################################
# Sample custom create
###############################################
#https://hadoopsdk.codeplex.com/wikipage?title=PowerShell%20Cmdlets%20for%20Cluster%20Management
# Most params are the same as quick create, use a new cluster name
# Pass in a 2nd storage account, a SQLAzure db for the metastore (assume same db for Oozie and Hive), add Avro library, some config values
# Execute all the variable settings from above

# This value is set for you, don't change!
$configvalues = new-object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightHiveConfiguration'

# Add your specific information here
$ClusterName = "YOURNewHDInsightClusterName"
$StorageAcct2 = "YOURExistingStorageAccountName2"
$MetastoreAzureSQLDBName = "YOURExistingSQLAzureDBName"
$MetastoreAzureServerName = "YOURExistingSQLAzureServer.database.windows.net" #gives a DNS error if you don't use the full name
$configvalues.Configuration = @{ "hive.exec.compress.output"="true" }  #this is an example of a config value you may pass in

# These variables are automatically set for you
$FullStorage2 = "${StorageAcct2}.blob.core.windows.net"
$Key2 = Get-AzureStorageKey $StorageAcct2 | %{ $_.Primary }
$MetastoreCreds = Get-Credential -Message "existing id/password for your SQL Azure DB (metastore)" #This prompts for the existing id and password of your existing SQL Azure DB

# Add a config file value
# Add AVRO SerDe libraries for Hive (on storage 1)
$configvalues.AdditionalLibraries = new-object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightDefaultStorageAccount'
$configvalues.AdditionalLibraries.StorageAccountName = $FullStorage1
$configvalues.AdditionalLibraries.StorageAccountKey = $Key1
$configvalues.AdditionalLibraries.StorageContainerName = "hivelibs" #container called hivelibs must exist on specified storage account
# Create custom cluster
New-AzureHDInsightClusterConfig -ClusterSizeInNodes $NumOfNodes `
    | Set-AzureHDInsightDefaultStorage -StorageAccountName $FullStorage1 -StorageAccountKey $Key1 -StorageContainerName $DefaultContainer `
    | Add-AzureHDInsightStorage -StorageAccountName $FullStorage2 -StorageAccountKey $Key2 `
    | Add-AzureHDInsightMetastore -SqlAzureServerName $MetastoreAzureServerName -DatabaseName $MetastoreAzureSQLDBName -Credential $MetastoreCreds -MetastoreType OozieMetastore `
    | Add-AzureHDInsightMetastore -SqlAzureServerName $MetastoreAzureServerName -DatabaseName $MetastoreAzureSQLDBName -Credential $MetastoreCreds -MetastoreType HiveMetastore `
    | Add-AzureHDInsightConfigValues -Hive $configvalues `
    | New-AzureHDInsightCluster -Subscription $SubID -Location "$Location" -Name $ClusterName -Credential $Creds

###############################################
# get status, properties, etc.
###############################################
#$SubName = Get-AzureSubscription -Current | %{ $_.SubscriptionName }
Get-AzureHDInsightProperties -Subscription $SubName
Get-AzureHDInsightCluster -Subscription $SubName
Get-AzureHDInsightCluster -Subscription $SubName -name YOURClusterName

###############################################
# remove cluster
###############################################
#Remove-AzureHDInsightCluster -Name $ClusterName -Subscription $SubName

  • "Set-AzureSubscription –SubscriptionName YOURSubscription" should be "Select-AzureSubscription –SubscriptionName YOURSubscription"

  • Thanks, I fixed the set vs. select issue.

  • Cindy, I tried this method of creating a cluster. It works fine, but it creates a version 2.1 cluster. Can you tell me how to create a version 3.0 cluster?

  • Jigyashu - type help New-AzureHDInsightCluster and you will see that one of the parameters is -Version. In the parameters section, add $Version = "3.0". Then, in the create statement, add -Version $Version. For example:

    New-AzureHDInsightCluster -Name $ClusterName -Version $Version -ClusterSizeInNodes $NumOfNodes -Location "$Location" `
    -DefaultStorageAccountName $FullStorage1 -DefaultStorageAccountKey $Key1 -DefaultStorageContainerName $DefaultContainer -Credential $Creds

  • It seems that the section where you build up $configvalues to enable Avro from the hivelibs container doesn't work when you build a version 3.0 cluster.

    The script runs fine until you add -Version "3.0" to the New-AzureHDInsightCluster command. If you remove the Add-AzureHDInsightConfigValues -Hive $configvalues step, the complex build with version 3 works fine.

    Any ideas what's broken here?  The error messages are not helpful at all.
