Cloud9: a MapReduce library for Hadoop
From the website:
Cloud9 is a MapReduce library for Hadoop designed to serve as both a teaching tool and to support research in data-intensive text processing. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. Hadoop provides an open-source implementation of the programming model. The library itself is available on github and distributed under the Apache License.
See Data-Intensive Text Processing with MapReduce by Lin and Dyer for more details on MapReduce.
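To make the programming model described above concrete, here is a minimal word-count sketch in plain Java. It is an in-memory illustration only: it does not use the Hadoop or Cloud9 APIs, and the class and method names (WordCountSketch, map, shuffle, reduce, wordCount) are hypothetical names chosen for this example. A real Hadoop job would distribute the same three phases across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the MapReduce model; it does not use Hadoop or Cloud9.
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the values collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the quick brown fox", "the lazy dog", "The end")));
    }
}
```

The sketch keeps the map and reduce functions free of shared state, which is the property that lets a framework such as Hadoop run them in parallel on different machines. It assumes Java 9 or later because of List.of.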
The website also provides a guide to using the Cloud9 library, covering its use on particular data sets such as Wikipedia.