Last week at the Hadoop Summit we were able to connect with lots of fellow Big Data enthusiasts. Since many weren't able to attend the summit in person, Lara Rubbelke, Community Principal Program Manager for Microsoft, shared her experience with us. Read the post below for her recap, and if you are on Twitter, make sure you follow Lara at @SQLGal.

What a great week at the Hadoop Summit! Hortonworks and Yahoo! partnered to deliver the 5th annual community event, with over 2,200 attendees in San Jose, CA. During the two-day event there were over 70 sessions focused on Apache Hadoop, with topics ranging from implementation, reference architecture, and integration with Business Intelligence to the future of Hadoop and the Hadoop ecosystem.

The event kicked off with a keynote from Hortonworks featuring Rob Bearden (Hortonworks CEO), Shaun Connolly (Hortonworks VP of Corporate Strategy) and Eric Baldeschwieler (Hortonworks CTO), who discussed the importance of the Hadoop Summit's community approach and the huge opportunities and challenges around Hadoop and Big Data. We are very proud and inspired to be part of a community that is "shaping the world around us," as shared in the Hadoop Summit video.

We were honored to present two sessions at the Hadoop Summit. 

  • Our first session highlighted the progress we are making with key investments and contributions to support Hadoop. Using the Common Crawl web corpus of 5 billion web pages, we demonstrated how to write a search engine in under 5 minutes using Hadoop on the Azure platform. Throughout the demo we highlighted the integration with System Center 2012 and Microsoft Business Intelligence, including Power View. Of particular interest to the audience was our preview of a prototype LINQ to Hive provider that uses C# to query data stored in Hadoop.
  • Microsoft's Hadoop Summit reference architecture session brought together Dave Mariani, VP Engineering at Klout, and Denny Lee to show how Hadoop and BI are better together. In the session, we discussed how Klout uses Big Data to unify the social web, from processing 15 social networks every day to maintaining more than 54 billion rows of data in the Klout Data Warehouse. Capturing and storing every user and visitor event was not enough on its own; Klout also needed to support interactive queries over that data. To show how, we walked through Klout's end-to-end architecture for processing and querying all of this data, with a focus on Hive queries, Analysis Services and MDX queries, Scala/JavaScript API calls, and Excel.

Our booth was buzzing with activity throughout the two-day event. Many attendees were interested in Hadoop on Azure, .NET integration, management through System Center, and exploring data through Microsoft Business Intelligence.

While the Hadoop Summit officially started on Wednesday, June 13, the unofficial kickoff was on Tuesday night with the BigDataCamp led by Dave Nielsen. If you have never attended a BigDataCamp, it is an unconference focused on Hadoop and Big Data where participants exchange ideas through lightning talks and breakout sessions. One of the best parts of the BigDataCamp is the unpanel, where the moderator gathers both the questions and the panelists from the attendees. These sessions always produce lively debates, and last week was no exception! Visit the BigDataCamp website and watch for a BigDataCamp near you!