Designing algorithms for Map Reduce by Ricky Ho.
From the post:
Since the emergence of the Hadoop implementation, I have been trying to morph existing algorithms from various areas into the map/reduce model. The results are pretty encouraging, and I’ve found that Map/Reduce is applicable to a wide spectrum of application scenarios.
So I wanted to write down my findings, but found the scope too broad, and I also haven’t spent enough time exploring the different problem domains. Finally, I realized there is no way for me to completely cover what Map/Reduce can do in all areas, so I am simply writing down what I know at this moment, over the long weekend when I have an extra day.
Notice that Map/Reduce is good for “data parallelism”, which is different from “task parallelism”. Here is a description of the difference between them and a general parallel-processing design methodology.
I’ll cover the abstract Map/Reduce processing model below. For a detailed description of the implementation of the Hadoop framework, please refer to my earlier blog post here.
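If you haven’t seen the abstract model Ho is referring to, it boils down to map → group-by-key → reduce. As a quick illustration (my own sketch, not code from his post, using the standard word-count example and hypothetical names like map_fn and reduce_fn), here it is in plain Python. The “data parallelism” he mentions shows up in the map phase: every input record can be processed independently, which is what lets a framework like Hadoop spread the work across machines.

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    """Map phase: emit a (word, 1) pair for every word in an input line."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce phase: combine all values emitted for a single key."""
    return (word, sum(counts))

def map_reduce(records, map_fn, reduce_fn):
    # Map: each record is handled independently -- this is the
    # "data parallelism" the post refers to; a real framework would
    # run these calls concurrently on many machines.
    mapped = chain.from_iterable(map_fn(r) for r in records)

    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: one call per key, also parallelizable across keys.
    return [reduce_fn(key, values) for key, values in groups.items()]

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    print(sorted(map_reduce(lines, map_fn, reduce_fn)))
    # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Nothing in the sketch is Hadoop-specific; it only shows the processing model that the morphed algorithms in Ho's post are expressed against.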
A bit dated (2010) but still worth your time.
I missed its initial appearance, so I appreciated Ricky pointing back to it in MapReduce: Detecting Cycles in Network Graph.
You may also want to consult Designing good MapReduce algorithms by Jeffrey Ullman.