If, like me, you are a .NET developer and have written some Streaming jobs, it is not immediately obvious how to do any reporting. However, if you dig through the Streaming documentation, you will come across this in the FAQ:
How do I update counters in streaming applications? A streaming process can use the stderr to emit counter information. reporter:counter:<group>,<counter>,<amount> should be sent to stderr to update the counter.
This gives an easy mechanism for getting feedback from a running streaming job.
Take my last binary streaming post: when running that code, one has no idea how many Microsoft Word, PDF, or Unknown documents have been processed.
Thus using the counter output format, one can define a simple counterReporter function:
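The listing is not reproduced here, so what follows is only a minimal sketch of the idea, written in Python because a Streaming task can be any executable. The names `format_counter` and `counter_reporter`, and the optional `stream` parameter (there purely to make the function testable), are my own assumptions; only the `reporter:counter:<group>,<counter>,<amount>` line format comes from the FAQ. In .NET the same thing amounts to a single write to `Console.Error`.

```python
import sys


def format_counter(group, counter, amount=1):
    """Build the line that Hadoop Streaming treats as a counter update."""
    # The format uses commas as separators, so group and counter names
    # are assumed here not to contain commas themselves.
    return f"reporter:counter:{group},{counter},{amount}"


def counter_reporter(group, counter, amount=1, stream=None):
    """Write one counter update to stderr (or to `stream` when given)."""
    out = stream if stream is not None else sys.stderr
    out.write(format_counter(group, counter, amount) + "\n")
    # Flush so the update is not held back in a buffer.
    out.flush()
```

Calling `counter_reporter("Documents Processed", "PDF")` writes a single line to stderr asking for the PDF counter in that group to be incremented by 1.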
One can then easily report on documents processed using the following slight code modification:
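As a rough illustration of that kind of modification, here is a hedged Python sketch. The `classify_document` helper and its magic-byte checks are hypothetical stand-ins for whatever detection the original job performs, and `process_document` is an assumed name for the per-document step. The block carries its own small reporter so that it runs by itself.

```python
import sys

# Leading bytes used for a crude type check (assumption: the real job may
# detect document types quite differently).
PDF_MAGIC = b"%PDF"
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc container
ZIP_MAGIC = b"PK\x03\x04"                      # .docx is a ZIP package


def classify_document(data):
    """Return 'PDF', 'Microsoft Word', or 'Unknown' from the leading bytes."""
    if data.startswith(PDF_MAGIC):
        return "PDF"
    # Crude: other Office formats share these containers as well.
    if data.startswith(OLE_MAGIC) or data.startswith(ZIP_MAGIC):
        return "Microsoft Word"
    return "Unknown"


def counter_reporter(group, counter, amount=1, stream=None):
    """Emit a Hadoop Streaming counter update on stderr (or `stream`)."""
    out = stream if stream is not None else sys.stderr
    out.write(f"reporter:counter:{group},{counter},{amount}\n")
    out.flush()


def process_document(data, stream=None):
    """Classify one document and bump the matching counter by one."""
    doc_type = classify_document(data)
    counter_reporter("Documents Processed", doc_type, 1, stream)
    return doc_type
```

The only change to the processing path is the single `counter_reporter` call; the rest of the job stays as it was.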
Thus we update the “Documents Processed” group, using the document type as the counter name, each time we process a document. Looking at the Hadoop job log we can now see:
All nice and easy.
If you want to do some error reporting, the process is the same, just with a different string format.
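A minimal sketch of what that could look like, assuming the "different string format" is the `reporter:status:<message>` line that the Streaming documentation describes for status updates. The function name `status_reporter` and the `stream` parameter are my own; the whitespace collapsing is a precaution I have added so the message stays on one line.

```python
import sys


def status_reporter(message, stream=None):
    """Write a Hadoop Streaming status update to stderr (or `stream`)."""
    out = stream if stream is not None else sys.stderr
    # Collapse newlines and runs of whitespace so the update is one line.
    one_line = " ".join(str(message).split())
    out.write(f"reporter:status:{one_line}\n")
    out.flush()
```

For example, `status_reporter("Failed to parse document")` could be called from an exception handler, alongside a counter update if you also want a running total of failures.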