Carl Nolan’s ramblings on development
In my previous post I talked a little about testing the Hadoop Streaming F# MapReduce code, but it is worth saying a few words about the tester application.
The complete code for this blog post and the F# MapReduce code can be found at:
http://code.msdn.microsoft.com/Hadoop-Streaming-and-F-f2e76850
As mentioned, unit testing the individual Map and Reduce functions is relatively straightforward. However, testing against sample input data is a little trickier. As such, I put together a tester application that runs the mapper executable against an input file, sorts the mapper output by key, and then passes the sorted output to the reducer executable.
Running the tester application allows one to check inputs and outputs, in a flow similar to running within Hadoop. The code listing for the tester application is included in the download linked above.
The input options for the console tester application are:
To run the mapper and the reducer, a Process is defined for each. The ProcessStartInfo is configured so that both standard input and standard output are redirected.
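As a minimal sketch of that configuration (not the tester's actual listing), the start info might be built as follows. The function name `createStartInfo` and its parameters are hypothetical, chosen only for illustration.

```fsharp
open System.Diagnostics

// Builds the start info for a streaming executable whose standard input
// and standard output are both redirected.
// Hypothetical helper: the name and parameters are illustrative only.
let createStartInfo (exePath: string) (arguments: string) =
    let info = ProcessStartInfo(exePath, arguments)
    info.UseShellExecute <- false          // must be false to redirect streams
    info.RedirectStandardInput <- true     // the tester writes the input data here
    info.RedirectStandardOutput <- true    // the tester reads the results here
    info.CreateNoWindow <- true            // no console window for the child
    info
```

Creating the ProcessStartInfo does not launch anything; the process only starts once the object is passed to `Process.Start`.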
To feed data into the mapper, one just has to open the input file and write its contents to the mapper's StandardInput.
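The important thing to remember is that one needs to process the StandardOutput from the mapper on a separate thread. Otherwise the mapper can block once its output buffer fills while the tester is still writing input, and both processes stall. The separate processing is achieved using a Task.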
Before waiting for the mapper executable to exit, one needs to Close() the mapper input stream, ensure that the task processing the StandardOutput has completed, and finally Close() the mapper output stream.
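The sequence described above (feed the input, drain the output on a Task, then close the streams in order) could be sketched roughly as below. This is an illustration under assumptions rather than the tester's real code: the function name `runStreaming` and its parameters are hypothetical, and the data is assumed to be line-oriented text.

```fsharp
open System
open System.Diagnostics
open System.IO
open System.Threading.Tasks

// Runs an executable, writing the lines of inputPath to its stdin and
// saving its stdout to outputPath. Returns the process exit code.
// Hypothetical helper: name and parameters are illustrative only.
let runStreaming (exePath: string) (inputPath: string) (outputPath: string) : int =
    let info = ProcessStartInfo(exePath)
    info.UseShellExecute <- false
    info.RedirectStandardInput <- true
    info.RedirectStandardOutput <- true

    use proc = Process.Start(info)
    use output = new StreamWriter(outputPath)

    // Drain stdout on a separate task so the child never blocks on a full pipe.
    let outputTask =
        Task.Run(Action(fun () ->
            let mutable line = proc.StandardOutput.ReadLine()
            while not (isNull line) do
                output.WriteLine(line)
                line <- proc.StandardOutput.ReadLine()))

    // Feed the input file to the child's stdin, one line at a time.
    for inputLine in File.ReadLines(inputPath) do
        proc.StandardInput.WriteLine(inputLine)

    proc.StandardInput.Close()    // 1. signal end of input to the child
    outputTask.Wait()             // 2. let the reader finish draining stdout
    proc.StandardOutput.Close()   // 3. close the child's output stream
    proc.WaitForExit()            // 4. only now wait for the child to exit
    proc.ExitCode
```

The exit code returned at the end is what lets the caller determine whether the executable succeeded. Because the reducer is called in much the same way, a helper of this shape could be reused for both executables.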
In the input arguments a temp directory is specified. The output of the mapper is saved to a file in this directory, and that file is then sorted by key.
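A sort step of this kind might look like the sketch below. It assumes each mapper output line has the form key, tab, value (the Hadoop Streaming default separator) and that keys compare ordinally; both are assumptions for illustration, and the names `keyOf`, `sortByKey` and `sortFile` are hypothetical.

```fsharp
open System
open System.IO

// Extracts the key: the text before the first tab, or the whole line
// when no tab is present (assumed key/value layout).
let keyOf (line: string) =
    match line.IndexOf('\t') with
    | -1 -> line
    | i -> line.Substring(0, i)

// Sorts lines by key using ordinal comparison (an assumption).
// List.sortWith is stable, so lines with equal keys keep their order.
let sortByKey (lines: seq<string>) =
    lines
    |> Seq.toList
    |> List.sortWith (fun a b -> String.CompareOrdinal(keyOf a, keyOf b))

// Reads the mapper output file, sorts it by key, and writes the result.
let sortFile (mappedPath: string) (sortedPath: string) =
    let sorted = sortByKey (File.ReadAllLines mappedPath) |> List.toArray
    File.WriteAllLines(sortedPath, sorted)
```

Sorting in memory keeps the sketch short and is reasonable for sample test data; it is not meant to mirror how Hadoop itself sorts large outputs.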
Finally, the output from the sort process is passed into the reducer. The process for calling the reducer executable is very similar to that of calling the mapper executable, including ensuring that StandardOutput is processed on a separate thread.
In writing the tester application, the key factors in getting the processing working are to process the StandardOutput on a separate thread and to close the streams in the correct order, so that one can determine the outcome of the mapper and reducer executable calls. Other than this, a tester for any MapReduce job should be easy to write.