Big Data Support
This is the team blog for the Big Data Support team at Microsoft. We support HDInsight, which is Hadoop running on Windows Azure in the cloud, as well as other big data features.
Tagged Content List
How to use parameter substitution with Pig Latin and PowerShell
When running Pig in a production environment, you'll likely have one or more Pig Latin scripts that run on a recurring basis (daily, weekly, monthly, etc.) that need to locate their input data based on when or where they are run. For example, you may have a Pig job that performs daily log ingestion by...
12 Aug 2014
HDInsight: - Creating, Deploying and Executing Pig UDF
As a developer, I always look for ways to customize (write my own processing) when functionality is not available in a programming language. That thought was triggered again when I was working on Apache Pig in HDInsight. So I started researching it and thought it would be good...
7 Jul 2014
How to use a Custom JSON Serde with Microsoft Azure HDInsight
I had a recent need to parse JSON files using Hive. There were a couple of options that I could use. One is to use a native Hive JSON function such as get_json_object, and the other is to use a JSON Serde to parse JSON objects containing nested elements with less code. I decided to go with the second approach...
18 Jun 2014
Some Frequently Asked Questions on Microsoft Azure HDInsight
We have seen some common questions on HDInsight when interacting with customers and partners. In this blog post, we are going to help answer some of those common questions. 1. What is Microsoft Azure HDInsight? HDInsight is a Hadoop-based service from Microsoft that brings a 100 percent Apache...
22 May 2014
HDInsight News - New Videos to watch - HDInsight Provisioning demonstrations
Jason H (HDInsight)
Check out these two recent video demos regarding HDInsight provisioning. These videos complement the product documentation outlined at http://azure.microsoft.com/en-us/documentation/articles/hdinsight-get-started/#provision HDInsight is the name given to the Microsoft Azure service (in the Microsoft...
9 May 2014
HDInsight: - backup and restore hive table
Introduction My name is Sudhir Rawat and I work on the Microsoft HDInsight support team. In this blog I am going to explain the options for backing up and restoring a Hive table on HDInsight. The general recommendation is to store Hive metadata on SQL Azure when provisioning the cluster. Sometimes...
1 May 2014
Start using flume with HDInsight by installing HDP 2.0 on Windows Azure Virtual Machine
After reading Greg's article Using Apache Flume with HDInsight I wanted to start to learn more about Flume, but my Linux skills are nonexistent and currently Flume is not included in HDInsight. For more information on HDInsight see Windows Azure HDInsight. For more information on Apache Flume see...
24 Apr 2014
Sliding Window Data Partitioning on Microsoft Azure HDInsight
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools like Pig, MapReduce, Hive, and Oozie to read and write data. HCatalog's table abstraction presents these tools and users with a relational view of data in the cluster. HCatalog Integration...
23 Apr 2014
Querying HDInsight Job Status with WebHCat via Native PowerShell or Node.js
One of the great things about HDInsight is that under the covers, it has the same capabilities as other Hadoop installations. This means that you can use regular Hadoop endpoints like Ambari and WebHCat (formerly known as Templeton) to interact with an HDInsight Cluster. In this blog post, I’ll...
22 Apr 2014
Customizing HDInsight Cluster provisioning
In my last blog, I discussed how we can specify Hadoop configurations for a job on an HDInsight cluster. At the end of that blog, I also discussed the alternative approach where you may want to change certain Hadoop configurations from default values and would like to preserve the changes throughout...
15 Apr 2014
Using Apache Flume with HDInsight
Gregory Suarez - MSFT
Gregory Suarez – 03/18/2014 (This blog posting assumes some basic knowledge of Apache Flume) Overview When asked if Apache Flume can be used with HDInsight, the response is typically no. We do not currently include Flume in our HDInsight service offering or in the HDInsight Server...
18 Mar 2014
How to pass Hadoop configuration values for a job on HDInsight
I came across the question a few times recently from several customers: "how do we pass Hadoop configurations at runtime for a MapReduce job or Hive query via HDInsight PowerShell or .NET SDK?" I thought of sharing the answer here with others who may run into the same question. It is pretty common...
13 Feb 2014
Structured vs Semi-structured Data
My name is Bill Carroll and I am a member of the Microsoft HDInsight support team. The majority of my working career has been spent on SQL Server, a relational database. Little did I think about it all these years, but relational databases are structured data. When we create a table we define the structure...
21 Jan 2014
How to add custom Hive UDFs to HDInsight
I recently had a need to add a UDF to Hive on HDInsight. I thought that it would be good to share that experience on a blog post. Hive provides a library of built-in functions to achieve the most common needs. The cool thing is that it also provides the framework to create your own UDF. I had a recent...
14 Jan 2014
Mount Azure Blob Storage as Local Drive
Gregory Suarez - MSFT
Gregory Suarez – 01/09/2014 I was recently working with a colleague of mine who submitted a MapReduce job via an HDInsight PowerShell script and he needed a quick way to visually inspect the last several lines of the output after it had completed. He was looking for an easy and flexible way to...
9 Jan 2014
Getting started with Sqoop in HDInsight
My name is Farooq and I am with the HDInsight support team here at Microsoft. In this blog I will give a brief overview of Sqoop in HDInsight and then use an example of importing data from a Windows Azure SQL Database table to an HDInsight cluster to demonstrate how you can get started with Sqoop in...
7 Jan 2014
Getting started with the HDInsight PowerShell tools and SDK
Hi, my name is Azim and I work on the Big Data Support Team at Microsoft. If you have had a chance to read an earlier post by Dharshana, you may have seen how we can submit a Hive query using the HDInsight PowerShell tools. In this blog, we will cover some basics of the HDInsight PowerShell tools and SDK...
21 Nov 2013
Get Started with Hive on HDInsight
Hi, my name is Dharshana and I work on the Big Data Support Team at Microsoft. As covered in the earlier post by Dan from our team, HDInsight provides a very easy to use interface to provision a Hadoop cluster with a few clicks and interact with the cluster programmatically. In this blog post, we will...
11 Nov 2013
The HDInsight Support Team is Open for Business
Hi, my name is Dan and I work on the HDInsight Support Team at Microsoft. This week the Azure HDInsight Service reached the General Availability (GA) milestone and the HDInsight support team is officially open for business! Azure HDInsight is a 100% Apache compatible Hadoop distribution available...
1 Nov 2013
We’ll be at SQL Saturday BI Edition #237 in Charlotte, NC
We have several engineers attending and presenting at SQL Saturday BI Edition #237 this Saturday 10/19/2013 in Matthews, NC (just down the road from Charlotte for those folks attending the PASS Summit). Rick Hallihan and Bill Carroll will be presenting an overview of the Windows Azure HDInsight service...
18 Oct 2013
© 2014 Microsoft Corporation.