Microsoft Distribution of Apache Hadoop comes with Hive Support along with an Interactive Hive shell where users can run their Hive queries immediately without adding specific configuration. The Apache distribution running on Windows Azure has built in support to Hive.
What is Hive?
What is NOT Hive?
To learn more about Apache Hive please click here.
When you login to Windows Azure Hadoop Portal and have you cluster configured, you can launch Interactive Hive Just by select “Interactive JavaScript/Hive” tile as below:
Next you can just select “Interactive Hive” to open the web based shell and start writing Hive Queries instantly as below:
The Interactive Hive Shell will look like as below:
Alternatively you can remote login to your Hadoop custer Head node and launch the Hadoop Command Shell to launch Hive Queries directly as below:
Here are a few examples of Hive Queries:
Show tables;
hivesampletable page_view page_view_asc
Hive history file=C:\Apps\dist\logs\history/hive_job_log_avkash_201201100621_493568624.txt OK Time taken: 3.265 seconds
DESCRIBE page_view;
viewtime int userid bigint page_url string referrer_url string ip string IP Address of the User dt string country string
Hive history file=C:\Apps\dist\logs\history/hive_job_log_avkash_201201100555_19122541.txt OK Time taken: 3.64 seconds
DESCRIBE EXTENDED page_view;
viewtime int userid bigint page_url string referrer_url string ip string IP Address of the User dt string country string Detailed Table Information Table(tableName:page_view, dbName:default, owner:avkash, createTime:1326174755, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:viewtime, type:int, comment:null), FieldSchema(name:userid, type:bigint, comment:null), FieldSchema(name:page_url, type:string, comment:null), FieldSchema(name:referrer_url, type:string, comment:null), FieldSchema(name:ip, type:string, comment:IP Address of the User), FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:country, type:string, comment:null)], location:hdfs://10.28.202.165:9000/hive/warehouse/page_view, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:country, type:string, comment:null)], parameters:{transient_lastDdlTime=1326174755, comment=This is the page view table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Resources:
Nice writeup. Unfortunately the wiki link at the end is broken :(
thanks,
Can I create a job from hive script too?
Can you please write an example?