Creating a Cluster for HadoopOnAzure CTP

Small Bites of Big Data

Cindy Gross, SQLCAT PM

UPDATED Jun 2013: HadoopOnAzure CTP has been replaced with HDInsight Preview, which has a different interface. See Getting Started with Windows Azure HDInsight Service: http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/

Are you ready to dive into this “Big Data” thing you keep hearing about? A good way to get started without having to understand and install all the HDFS, MapReduce, and other pieces yourself is to join the Hadoop CTP. It’s a very popular program, so once you sign up you may have a short wait before you are given an account. Any cluster you create is time-bombed: it expires automatically to free up unused resources for other eager CTP participants. This post assumes you’ve been granted a CTP account to use on https://www.hadooponazure.com/.

1)      Sign in to your Hadoop on Azure account. Go to https://www.hadooponazure.com/ and click on the “Sign in” button.

 

2)      If you don’t currently have a cluster allocated, you will be taken to a screen where you can request one. Note that the saying at the top of the screen changes randomly each time you navigate to a new screen.

  • Choose a unique name for your cluster; the site appends .cloudapp.net to it. My cluster is called cgross.cloudapp.net.
  • If you are just playing around, be respectful of the fact that this is a very popular CTP and choose the “Small” cluster size (currently 4 nodes, 2 TB disk space).
  • Choose a cluster login and password. My login, cgross1, is different from my cluster name.
  • You can choose to use SQL Azure for your Hive Metastore but I’m going to skip that for the sake of simplicity.
  • Choose “Request cluster”
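
The naming rule in the first bullet can be sketched in a few lines of Python. This is only an illustration: the function name `cluster_hostname` is hypothetical, and the validation uses the general DNS label rule (letters, digits, and hyphens, 1 to 63 characters, no leading or trailing hyphen) as an assumption, since the CTP's actual naming restrictions are not stated in this post.

```python
import re

# General DNS label rule. This is an assumption; the CTP site may
# enforce stricter limits on cluster names.
_DNS_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def cluster_hostname(name: str) -> str:
    """Return the full host name for a cluster name (hypothetical helper)."""
    if not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid DNS label: {name!r}")
    # The site appends .cloudapp.net to the name you choose.
    return f"{name}.cloudapp.net"


print(cluster_hostname("cgross"))  # cgross.cloudapp.net
```

The name still has to be unique across the service, which only the site itself can check when you request the cluster.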

 

3)      It will take several minutes for your cluster VMs to be created and allocated.

4)      After a few minutes you will see a message that your cluster is ready for use. In the upper left you can see how long you have before the cluster expires. When you reach about 6 hours left, you can choose the “Extend now” button to keep your cluster longer. You can also choose to “Release cluster” when you are finished to free up those resources for other CTP participants.
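
The expiration countdown amounts to simple date arithmetic. Here is a minimal sketch, assuming the extension window is exactly 6 hours (the post only says "about 6 hours"). The function names, the threshold constant, and the sample dates are all hypothetical; the site exposes this only through its web page, not through code.

```python
from datetime import datetime, timedelta

# "About 6 hours" comes from the post; the exact threshold is an assumption.
EXTEND_WINDOW = timedelta(hours=6)


def time_remaining(expires_at: datetime, now: datetime) -> timedelta:
    """Time left before the cluster expires, never negative."""
    return max(expires_at - now, timedelta(0))


def can_extend(expires_at: datetime, now: datetime) -> bool:
    """True when the cluster is still alive and inside the extension window."""
    remaining = time_remaining(expires_at, now)
    return timedelta(0) < remaining <= EXTEND_WINDOW


# Sample values chosen purely for illustration.
now = datetime(2012, 4, 1, 12, 0)
expires = now + timedelta(hours=5, minutes=30)
print(time_remaining(expires, now))  # 5:30:00
print(can_extend(expires, now))      # True
```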

   

Explore the various tiles and what you can do in each. You may want to start with the “Downloads” tile and go through the “How-To and FAQ”. If you are idle for a while you will be logged out of the site, but as long as the cluster has not expired it will all still be there once you log back in. The “Job History” stays around across cluster builds; it is not cleared out when a cluster is released or expires.

I hope you’ve enjoyed this small bite of big data! Look for more blog posts soon on the samples and other activities.

Note: the CTP and TAP programs are available for a limited time. Details of the usage and the availability of the CTP may change rapidly.