When working with Map/Reduce jobs in Hadoop, you may well have data stored in HDFS that is already sorted. As you may know, sorting happens not only after the map phase in the map task but also during the merge phase in the reduce task, so sorting already-sorted data a second time adds significant performance overhead. In this situation you may want your Map/Reduce job to skip sorting the data.
Note: If you have tried changing map.sort.class to a no-op, that would not have worked either.
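To make the overhead concrete, here is a minimal sketch in plain Java of the two stages described above: a per-task sort followed by a merge of the sorted runs. It is not Hadoop's actual shuffle code, and the class and method names (SortedRunMergeSketch, sortedCopy, mergeSortedRuns) are hypothetical, chosen only for illustration. The point it shows is that when a run is already sorted, sorting it again returns exactly the same data, so the work is wasted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: plain Java, not Hadoop's real sort/merge implementation.
final class SortedRunMergeSketch {

    // Tracks the current read position inside one sorted run.
    private static final class Cursor {
        final List<Integer> run;
        int pos;

        Cursor(List<Integer> run) {
            this.run = run;
        }

        int head() {
            return run.get(pos);
        }
    }

    // Stand-in for the map-side sort: returns a sorted copy of one run.
    static List<Integer> sortedCopy(List<Integer> run) {
        List<Integer> copy = new ArrayList<>(run);
        Collections.sort(copy);
        return copy;
    }

    // Stand-in for the reduce-side merge: k-way merge of runs that are
    // each already sorted in ascending order.
    static List<Integer> mergeSortedRuns(List<List<Integer>> runs) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.head(), b.head()));
        for (List<Integer> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();      // run with the smallest current head
            out.add(c.head());
            c.pos++;
            if (c.pos < c.run.size()) {
                heap.add(c);             // re-insert with its next element as head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> alreadySorted = List.of(1, 4, 9);

        // Sorting an already-sorted run changes nothing: prints "true".
        System.out.println(sortedCopy(alreadySorted).equals(alreadySorted));

        // Merging sorted runs: prints "[1, 2, 3, 4, 5, 6, 9, 10]".
        System.out.println(mergeSortedRuns(
                List.of(alreadySorted, List.of(2, 3, 10), List.of(5, 6))));
    }
}
```

In a real job the runs are spill files and map outputs rather than in-memory lists, but the shape of the redundancy is the same.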
So the question comes: how can you keep a Map/Reduce job from sorting data that is already sorted?
So if you do not need the results to be sorted, the following Hadoop patch would be a great place to start:
Note: Before using the above patch, I would suggest reading the following comment from Robert about this patch:
Keywords: Hadoop, Map/Reduce, Job Performance, Hadoop Patch