Welcome to today's Article Spotlight!
Check out the full version of the article here:
Hadoop-based Services For Windows
This blog post is a preview of the content in that article (you'll find 3-5 times more information in the TechNet Wiki article). The article, like many others about Hadoop, was written by Wesley McSwain, a SQL Server technical writer.
Apache Hadoop is an open source software framework for the distributed processing of large data sets across clusters of computers using a simple programming model. It consists of two primary components: the Hadoop Distributed File System (HDFS), a reliable, distributed data store, and MapReduce, a parallel, distributed processing system. A Hadoop cluster can consist of a single node or thousands of nodes.
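To make the "simple programming model" a little more concrete, here is a minimal sketch of the MapReduce idea as a word count in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. On a real cluster, the framework would run the map and reduce steps in parallel across nodes and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (the step between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The point of the split is that each map call looks at only one line and each reduce call looks at only one key, so the work can be spread over many machines without the functions needing to know about each other.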
HDFS is the primary distributed storage used by Hadoop applications. As you load data into a Hadoop cluster, HDFS splits the data into blocks, creates multiple replicas of each block, and distributes them across the nodes of the cluster. This enables reliable and extremely rapid computations.
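The split-and-replicate behavior described above can be pictured with a toy Python sketch. This is not how HDFS is implemented: the tiny block size, the node names, and the round-robin placement are all assumptions chosen to keep the example readable (real HDFS uses much larger blocks and its own placement policy). The helper names split_into_blocks and place_replicas are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an illustrative stand-in, not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    for block_id, holders in place_replicas(len(blocks), nodes, 3).items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on several nodes, losing one node does not lose data, and a computation can be scheduled on whichever node already holds a copy of the block it needs.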
The links in this section provide information on deploying Apache Hadoop to Microsoft Windows Platforms. All these articles are on TechNet Wiki:
This section contains information on using Hadoop with other BI technologies. All these articles are on TechNet Wiki:
This section contains a list of Hadoop-related how-to articles. All these articles are on TechNet Wiki:
Check out the article and add to it here (it's a lot bigger than the sections I featured in this blog post):
Jump on in. The Wiki is warm!
- User Ed