Carl Nolan’s ramblings on development
Some recent changes to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code added support for Json and Binary serialization out of the Mapper, in and out of Combiners, and out of the Reducer. However, this left no way to control the format of the text output. Say one wanted to emit a tab-delimited string from the Reducer: the string would still pass through Json serialization, which escapes the tabs. To let one construct the final text output directly, I have created a new TextOutput type.
This TextOutput type is simple in structure. However, when it is encountered during the serialization process, both Json and Binary serialization are bypassed and the text is written out in its raw format, including tabs and other characters that the Json serializer would normally escape.
As an example, here is a modified version of one of the C# Reducer samples that supports both Json and Text output:
For the sample Reducer above, the Json serialization output would be:
Android {"MaxTime":"PT23H59M54S","MinTime":"PT6S"}
RIM OS {"MaxTime":"PT23H59M58S","MinTime":"PT1M7S"}
Unknown {"MaxTime":"PT23H52M36S","MinTime":"PT36S"}
Windows Phone {"MaxTime":"PT23H55M17S","MinTime":"PT32S"}
iPhone OS {"MaxTime":"PT23H59M50S","MinTime":"PT1S"}
The corresponding Text Output would be:
Android (0:00:00:06.0000000, 0:23:59:54.0000000)
RIM OS (0:00:01:07.0000000, 0:23:59:58.0000000)
Unknown (0:00:00:36.0000000, 0:23:52:36.0000000)
Windows Phone (0:00:00:32.0000000, 0:23:55:17.0000000)
iPhone OS (0:00:00:01.0000000, 0:23:59:50.0000000)
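To make the difference between the two formats concrete, here is a minimal standalone sketch that builds both value strings from the same minimum and maximum TimeSpan. It does not use the framework's own Reducer classes. The class name OutputFormats and its two methods are hypothetical, and it assumes that the Json values are ISO 8601 durations (as XmlConvert produces) and that the text values use the TimeSpan "G" format with a tab between key and value.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical helper, not part of the framework.
public static class OutputFormats
{
    // Json-style value; assumes ISO 8601 durations such as "PT6S".
    public static string ToJsonValue(TimeSpan min, TimeSpan max)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{{\"MaxTime\":\"{0}\",\"MinTime\":\"{1}\"}}",
            XmlConvert.ToString(max),
            XmlConvert.ToString(min));
    }

    // Text value; the "G" format yields d:hh:mm:ss.fffffff.
    public static string ToTextValue(TimeSpan min, TimeSpan max)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "({0:G}, {1:G})",
            min,
            max);
    }

    public static void Main()
    {
        var min = TimeSpan.FromSeconds(6);
        var max = new TimeSpan(23, 59, 54);

        // Key and value separated by a tab (an assumption of this sketch).
        Console.WriteLine("Android\t" + ToJsonValue(min, max));
        Console.WriteLine("Android\t" + ToTextValue(min, max));
    }
}
```

In the framework itself the text form would be wrapped in a TextOutput so that it reaches the output unescaped; this sketch only shows how the strings might be put together.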
As mentioned, the actual definition of the TextOutput type is simple and is just a wrapper over a string, although depending on needs this may change.
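A string wrapper of this kind, and the bypass it triggers during serialization, might look roughly like the sketch below. This is not the framework's code: the TextOutput class here is a hypothetical stand-in, ValueWriter.Format is an invented helper, and the use of DataContractJsonSerializer for the Json path is an assumption.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical stand-in for the framework's TextOutput type:
// nothing more than a wrapper over a string.
public sealed class TextOutput
{
    public string Text { get; private set; }

    public TextOutput(string text)
    {
        Text = text ?? string.Empty;
    }

    public override string ToString()
    {
        return Text;
    }
}

// Hypothetical helper showing the serialization bypass.
public static class ValueWriter
{
    // Returns the raw text for a TextOutput, Json for any other value.
    public static string Format(object value)
    {
        var text = value as TextOutput;
        if (text != null)
        {
            return text.Text; // bypass serialization, keep tabs as-is
        }

        // Assumed Json path; the framework's serializer may differ.
        var serializer = new DataContractJsonSerializer(value.GetType());
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        // The wrapped string keeps its literal tab.
        Console.WriteLine(Format(new TextOutput("06:00\t23:59")));
        // The plain string goes through Json and the tab is escaped.
        Console.WriteLine(Format("06:00\t23:59"));
    }
}
```

Keying the bypass on a dedicated type rather than on string itself means ordinary string values still go through the normal serialization path, and only values explicitly wrapped by the Reducer are written raw.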
One of the main reasons for adding TextOutput support is so that data output by the framework can be consumed easily by Hive tables defined with CREATE TABLE statements.
Hope you find this change useful.