Carl Nolan’s ramblings on development
An updated version of this post can be found at:
If you have been using the Framework for Composing and Submitting .Net Hadoop MapReduce Jobs you may want to download an updated version of the code:
The biggest change in the latest code is the modification of the serialization mechanism. Previously, data was written out of the mapper and combiner as a string. This has now been changed to use a binary formatter, which means the input into the combiners and reducers is no longer a string but an object, which can be cast directly to the expected type. Here are the new Combiner and Reducer base classes:
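As a sketch of the shape described above, the base classes might look something like the following, with `seq<obj>` replacing the earlier string values. The type and member names here are illustrative, not verbatim from the framework:

```fsharp
// Illustrative sketch only; the actual framework's type and member
// names may differ. The key change is that incoming values are now
// deserialized objects (obj) rather than pre-formatted strings.
[<AbstractClass>]
type CombinerBase() =
    // values arrive as objects and can be cast directly to the expected type
    abstract member Combine : key : string * values : seq<obj> -> seq<string * obj>

[<AbstractClass>]
type ReducerBase() =
    // output values are ultimately rendered as strings for visibility
    abstract member Reduce : key : string * values : seq<obj> -> seq<string * obj>
```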
Here is a sample of implemented Map and Reduce types:
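As an illustration of the pattern (assuming base classes along the lines of a `MapperBaseText` and `ReducerBase`, and hypothetical data; none of this is verbatim from the framework), a Map/Reduce pair might look like:

```fsharp
open System

// Hypothetical example: map tab-delimited "platform\tqueryTime" lines to
// (platform, TimeSpan) pairs, then reduce to the min/max time per platform.
// Base class names and signatures are assumed, not taken from the framework.
type PhoneQueryMapper() =
    inherit MapperBaseText()
    override this.Map (line : string) =
        let fields = line.Split('\t')
        // the value is emitted as an object and binary-serialized by the framework
        Seq.singleton (fields.[0], box (TimeSpan.Parse fields.[1]))

type PhoneQueryReducer() =
    inherit ReducerBase()
    override this.Reduce (key : string, values : seq<obj>) =
        // no string parsing needed: cast each value directly to TimeSpan
        let times = values |> Seq.map (fun v -> v :?> TimeSpan) |> Seq.toList
        Seq.singleton (key, box (List.min times, List.max times))
```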
The changes are subtle, but they simplify the processing in the combiner and reducer, removing the need for any string processing. To keep the data coming out of the reducer visible, a string format is still used for the final output.
The other change is support for multiple key output from the mapper. Let's start with a sample showing how this is achieved:
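As a sketch of the multi-key case (the XML mapper base class and the element names are assumptions for illustration, though `StoreXmlElementMapper` and `Utilities.FormatKeys` come from the post itself):

```fsharp
open System.Xml.Linq

// Illustrative sketch: a mapper emitting a composite (two-part) key.
// The base class and the XML element names are assumed for illustration.
type StoreXmlElementMapper() =
    inherit MapperBaseXml()
    override this.Map (element : XElement) =
        let state = element.Element(XName.Get "State").Value
        let owner = element.Element(XName.Get "Owner").Value
        let sales = element.Element(XName.Get "TotalSales").Value |> float
        // FormatKeys concatenates the key parts with the tab character
        // that Hadoop Streaming expects between key fields
        Seq.singleton (Utilities.FormatKeys [| state; owner |], box sales)
```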
Using multiple keys from the Mapper is a two-step process. First, the Mapper needs to be modified to output a string-based key in the correct format. This is done by passing the set of string key values into the Utilities.FormatKeys() function, which concatenates the keys using the necessary tab character. Second, the job has to be submitted specifying the expected number of keys:
MSDN.Hadoop.Submission.Console.exe -input "stores/demographics" -output "stores/banking" -mapper "MSDN.Hadoop.MapReduceFSharp.StoreXmlElementMapper, MSDN.Hadoop.MapReduceFSharp" -reducer "MSDN.Hadoop.MapReduceFSharp.StoreXmlElementReducer, MSDN.Hadoop.MapReduceFSharp" -file "%HOMEPATH%\Projects\MSDN.Hadoop.MapReduce\Release\MSDN.Hadoop.MapReduceFSharp.dll" -nodename Store -format Xml -numberKeys 2
One final note: in the Document Classes folder there are two versions of the Streaming JAR, one for running in Azure and one for running locally. The difference is that they have been compiled with different versions of Java. Just remember to use the appropriate version (dropping the -local and -azure prefixes) when copying it to your Hadoop lib folder.
Hopefully you will find these changes useful.