Browse by Tags

Tagged Content List
  • Blog Post: Pushing Hadoop Cluster Configuration Changes using PowerShell

    In my previous post I talked about implementing and deploying Rack Awareness using PowerShell. However, PowerShell is a great tool not only for managing things like Rack Awareness but also for installing and managing the Hadoop cluster, especially for managing configuration changes, the focus of this post...
  • Blog Post: Deploying Hadoop Rack Awareness with PowerShell

    In a previous post I talked about Implementing Hadoop Rack Awareness with PowerShell. One thing I skimmed over in that post was how to deploy the necessary files to the cluster and make the configuration file changes. Once again, PowerShell is your friend. Deploying this solution involves two processes...
  • Blog Post: Implementing Hadoop Rack Awareness with PowerShell

    This post walks through building a PowerShell script for enabling Rack Awareness in Hadoop. While several example scripts can be found online for Linux, samples of building a script for Windows are less common. Hadoop divides the data into multiple file blocks and stores them on different machines. By default...
  • Blog Post: Managing Your HDInsight Cluster using PowerShell – Update

    Since writing my last post, Managing Your HDInsight Cluster and .Net Job Submissions using PowerShell, there have been some useful modifications to the Azure PowerShell Tools. The HDInsight cmdlets no longer exist, as these have now been integrated into the latest release of the Windows Azure PowerShell...
  • Blog Post: Managing Your HDInsight Cluster and .Net Job Submissions using PowerShell

    This post explains how best to manage an HDInsight cluster using a management console and Windows PowerShell. The goal is to outline how to create a simple cluster, provide a mechanism for managing an elastic service, and demonstrate how to customize the cluster creation. Before provisioning a cluster...
  • Blog Post: Managing Hive Job Submissions With PowerShell

    In my previous post, I talked about “Managing Your HDInsight Cluster with PowerShell”. In that post I made no mention of using Hive. I hope to redress this balance by specifically talking about how you can submit Hive jobs from the same local management console. As before, all the scripts mentioned...
  • Blog Post: Managing Your HDInsight Cluster with PowerShell

    An updated version of this post can be found here. This blog post provides a mechanism for managing an HDInsight cluster using a local management console through the use of Windows PowerShell. The goal is to outline how to configure the local management console, create a simple cluster, submit jobs...
  • Blog Post: Submitting Hadoop MapReduce Jobs using PowerShell

    As always, here is a link to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code. In all the samples I have shown so far, I have always used the command-line consoles. However, this does not need to be the case; PowerShell can be used. The Console application which is used to...
  • Blog Post: Hive and XML File Processing

    When I put together the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code, one of the goals was to support XML file processing. This was achieved by creating a modified Mahout document reader in which one can specify the XML node to be presented for processing. But what if...
  • Blog Post: Implementing a MapReduce Join with Hadoop and the .Net Framework

    I have often been asked how one implements a Join when writing MapReduce code. As such, I thought it would be useful to add an additional sample demonstrating how this is achieved. There are multiple mechanisms one can employ to perform a Join operation, and the one to be discussed will be a Reduce...
  • Blog Post: Framework for .Net Hadoop MapReduce Job Submission V1.0 Release

    It has been a few months since I made a change to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code. However, I was going to put together a sample for a Reduce side join and came across an issue around the usage of partitioners. As such, I decided to add support...
  • Blog Post: Framework for .Net Hadoop MapReduce Job Submission TextOutput Type

    Some recent changes made to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code were to support JSON and Binary Serialization from the Mapper, in and out of Combiners, and out from the Reducer. However, this precluded one from controlling the format of the Text output. Say one...
  • Blog Post: C# MapReduce Based Co-occurrence Item Based Recommender

    As promised, to conclude the Co-occurrence Approach to an Item Based Recommender posts, I wanted to port the MapReduce code to C#, just for kicks and to prove the code is also easy to write in C#. For an explanation of the MapReduce approach, review the previous article: http://blogs.msdn.com/b/carlnol/archive...
  • Blog Post: MapReduce Based Co-occurrence Approach to an Item Based Recommender

    In a previous post I covered the basics of a Co-occurrence Approach to an Item Based Recommender. As promised, here is the continuation of this work: an implementation of the same algorithm using MapReduce. Before reading this post, it is worth reading the local version, as it covers the sample...
  • Blog Post: Framework for .Net Hadoop MapReduce Job Submission Json Serialization

    A while back, one of the changes made to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code was to support Binary Serialization from the Mapper, in and out of Combiners, and out from the Reducer. Whereas this change was needed to support the Generic interfaces, there...
  • Blog Post: Framework for .Net Hadoop MapReduce Job Submission configuration update

    To better support configuring the Stream environment whilst running .Net Streaming jobs, I have made a change to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code. I have fixed a few bugs around setting job configuration options which were being controlled by...
  • Blog Post: Framework for .Net Hadoop MapReduce Job Submission Binary Output

    To end the week, I decided to make a minor change to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code. I have been doing some work on creating a co-occurrence matrix for item recommendations. I was going to map the process to one or more MapReduce jobs, then came across...
  • Blog Post: Framework for .Net Hadoop MapReduce Job Submission libjars update

    If you have been using the “Generics based Framework for .Net Hadoop MapReduce Job Submission”, you may want to download the latest version of the code. The previous version of the code, when processing XML and Binary files, was dependent on a custom streaming JAR that contained the necessary...
  • Blog Post: Hadoop .Net HDFS File Access (Revisited Archived)

    An updated post can be found here: http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx. In addition to the C library, the Microsoft Distribution of Hadoop provides a Managed C++ solution for HDFS file access. This solution enables one to consume HDFS...
  • Blog Post: Generics based Framework for .Net Hadoop MapReduce Job Submission

    Over the past month I have been working on a framework to allow composition and submission of MapReduce jobs using .Net. I have put together two previous blog posts on this, so rather than put together a third on the latest change, I thought I would create a final composite post. To understand why, let's...
  • Blog Post: .Net Hadoop MapReduce Job Framework - Revisited (Archived)

    An updated version of this post can be found at: http://blogs.msdn.com/b/carlnol/archive/2012/04/29/generic-based-framework-for-net-hadoop-mapreduce-job-submission.aspx. If you have been using the Framework for Composing and Submitting .Net Hadoop MapReduce Jobs, you may want to download an updated...
  • Blog Post: Framework for Composing and Submitting .Net Hadoop MapReduce Jobs (Archived)

    An updated version of this post can be found at: http://blogs.msdn.com/b/carlnol/archive/2012/04/29/generic-based-framework-for-net-hadoop-mapreduce-job-submission.aspx. If you have been following my blog, you will see that I have been putting together samples for writing .Net Hadoop MapReduce jobs;...
  • Blog Post: Hadoop .Net HDFS File Access (Archived)

    An updated post can be found here: http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx. If you grab the latest installment of the Microsoft Distribution of Hadoop, you will notice, in addition to the C library, a Managed C++ solution for HDFS file access. This solution now...
  • Blog Post: Hadoop Streaming in F# and MapReduce (summary)

    With all my recent posts around Hadoop Streaming, I thought it would be useful to summarize them into a single post. The main objective of these posts was to put together a codebase to enable F# developers to write Map/Reduce libraries through a simple API. The full code posting can be found here: http...
  • Blog Post: Hadoop XML Streaming and F# MapReduce

    So, to round out the Hadoop Streaming samples, I thought I would put together an XML Streaming sample. As always, the code can be found here: http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850. XML Streaming Reader: So how does one stream in XML? If you read the Hadoop Streaming documentation...
Page 1 of 2 (32 items)