As a member of the HDInsight team I worked a bit on Hadoop code on Windows and contributed a couple of JIRAs there (JIRA is the bug tracking system Apache uses; contributing code usually involves filing a JIRA and posting patches to it). Even if you don't want to contribute to Hadoop code (though it's fun), it's useful to be able to dive into the Hadoop code when you're running your Map-Reduce jobs against it, to see what's going on under the hood and maybe debug problems. Anyway, whatever your reasons may be, in this post I'll guide you through how I personally set up my development environment for working on Hadoop on Windows in PowerShell.
Hopefully the preliminaries went off without a hitch; now we get to the slightly more exciting parts. We'll mostly be using a PowerShell script I wrote and pasted to PoshCode here. The script is meant to be modified and has a lot of hard-coded assumptions, so you may want to get familiar with it, but if you're lucky you may be able to just follow along below without looking at it.
So now, if you followed all these steps, you should have four PowerShell windows open running the four main Hadoop processes: ResourceManager and NodeManager for Yarn, and Namenode and Datanode for DFS. If you don't see that after the last step, the logs go by default to "C:\YarnSingleNode\logs", so look through those.
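For reference, the four windows correspond roughly to these commands (this is a sketch of what my script wraps, not a substitute for it; it assumes the `hdfs` and `yarn` launchers are on your PATH and your configuration points at the single-node directories):

```shell
# The four single-node daemons, one per PowerShell window.
# DFS side:
hdfs namenode
hdfs datanode
# Yarn side:
yarn resourcemanager
yarn nodemanager
```

If one of the windows dies immediately, the matching log file under "C:\YarnSingleNode\logs" is usually the quickest way to see why.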
A great way to edit and work with the Hadoop code is to use Eclipse. If you want to do that, it should be fairly simple:
mvn '-Declipse.workspace="C:\HadoopTrunk\Workspace"' eclipse:configure-workspace
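After pointing the plugin at the workspace, you still need Eclipse project files for each module before Eclipse can see them. A sketch using the standard maven-eclipse-plugin goal (skipping tests here is my own shortcut to speed the build up):

```shell
# Generate .project/.classpath files for every Hadoop module, then import them
# in Eclipse via File -> Import -> Existing Projects into Workspace.
mvn eclipse:eclipse -DskipTests
```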
This should get all the code into Eclipse, mostly building and navigable. I personally had to do the following steps to get the hadoop-common test code to compile as well:
* You may be able to get away with running as a normal user, but I ran into trouble compiling Hadoop because it creates the hadoop.dll file with the wrong permissions and then can't package it. If you know why that is and how to work around it, please do post a comment.