Carl Nolan’s ramblings on development
To end the week I decided to make a minor change to the “Generics based Framework for .Net Hadoop MapReduce Job Submission”.
I have been doing some work on creating a co-occurrence matrix for item recommendations. I was going to map the process to one or more MapReduce jobs, but then came across the issue of how to output the vector data from the reducer. In the current framework the reducer outputs the key/value data in a string format. This works fine for simple data, but for a vector it quickly becomes problematic.
To resolve this I have enabled a parameter called “outputFormat”. The default output is the usual string format, which can also be specified explicitly with the parameter value “Text”. Additionally, a parameter value of “Binary” is supported:
MSDN.Hadoop.Submission.Console.exe -input "mobile/data" -output "mobile/querytimes" -mapper "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryMapper, MSDN.Hadoop.MapReduceFSharp" -reducer "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryReducer, MSDN.Hadoop.MapReduceFSharp" -outputFormat Binary -file "C:\Projects\Release\MSDN.Hadoop.MapReduceFSharp.dll"
When the output format is specified as Binary, the reducer value is output as a binary serialized version of the data, represented as a Base64 string. When reading the reduced output, one can then easily deserialize this value back into a .NET type.
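As a rough illustration of that round trip, here is a minimal sketch. It assumes the binary format is the standard .NET BinaryFormatter, which is a guess rather than something confirmed here. The helper names toBase64 and fromBase64 and the sample vector are hypothetical and not part of the framework. Note also that BinaryFormatter is marked obsolete in recent .NET releases, so this is aimed at the .NET Framework.

```fsharp
open System
open System.IO
open System.Runtime.Serialization.Formatters.Binary

// Hypothetical helper: binary-serialize a value and encode the bytes as Base64.
// Assumption: this mirrors what the reducer emits when -outputFormat is Binary.
let toBase64 (value: obj) : string =
    use stream = new MemoryStream()
    let formatter = BinaryFormatter()
    formatter.Serialize(stream, value)
    Convert.ToBase64String(stream.ToArray())

// Hypothetical helper: decode a Base64 string and deserialize it into 'T.
// Throws InvalidCastException if the payload is not actually a 'T.
let fromBase64<'T> (encoded: string) : 'T =
    let bytes = Convert.FromBase64String(encoded)
    use stream = new MemoryStream(bytes)
    let formatter = BinaryFormatter()
    formatter.Deserialize(stream) :?> 'T

// Round trip with an illustrative vector value (made-up data).
let vector = [| 1.0; 0.0; 3.5 |]
let encoded = toBase64 (box vector)
let decoded = fromBase64<float[]> encoded
printfn "%A" decoded
```

When reading real job output, one would first split each line into its key and value parts and pass only the value to fromBase64. The exact line layout depends on the job configuration, so that step is left out of the sketch.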
Hopefully one will find this a lot simpler than performing string manipulations.