Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 6, 2013

Hadoop MapReduce: to Sort or Not to Sort

Filed under: Hadoop,MapReduce,Sorting — Patrick Durusau @ 7:22 pm

Hadoop MapReduce: to Sort or Not to Sort by Tendu Yogurtcu.

From the post:

What is the big deal about Sort? Sort is fundamental to the MapReduce framework, the data is sorted between the Map and Reduce phases (see below). Syncsort’s contribution allows native Hadoop sort to be replaced by an alternative sort implementation, for both Map and Reduce sides, i.e. it makes Sort phase pluggable.

[Image: MapReduce]

Opening up the Sort phase to alternative implementations will facilitate new use cases and data flows in the MapReduce framework. Let’s look at some of these use cases:

The use cases Tendu covers include:

  • Optimized sort implementations.
  • Hash-based aggregations.
  • Ability to run a job with a subset of data.
  • Optimized full joins.
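
To make the idea of a pluggable sort phase concrete, here is a toy, single-process sketch in Python. It is my own illustration, not code from Tendu's post and not Hadoop's actual API: the names run_job, sort_shuffle and hash_shuffle are hypothetical. It only shows how the step between Map and Reduce could be swapped between a sort-based grouping and a hash-based one, under the assumption that the reducer does not need its keys in order.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_shuffle(pairs):
    """Group map output by sorting on key, as MapReduce does by default."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def hash_shuffle(pairs):
    """Group map output in a hash table; keys are never sorted."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    yield from buckets.items()


def run_job(records, mapper, reducer, shuffle=sort_shuffle):
    """Toy MapReduce job with a pluggable shuffle step (hypothetical helper)."""
    mapped = [pair for record in records for pair in mapper(record)]
    return {key: reducer(key, values) for key, values in shuffle(mapped)}


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1


def count_reducer(key, values):
    # Sum the counts collected for one key.
    return sum(values)


lines = ["to sort or not to sort", "sort is fundamental"]
sorted_counts = run_job(lines, word_mapper, count_reducer)
hashed_counts = run_job(lines, word_mapper, count_reducer, shuffle=hash_shuffle)
```

Both calls produce the same word counts. The difference is that the hash-based variant skips the sort altogether, which is the sort of saving a hash-based aggregation could exploit when key order does not matter to the job.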

See Tendu’s post for the details.

I first saw this at Use Cases for Hadoop’s New Pluggable Sort by Alex Popescu.
