This guide is intended to be an introduction to Crunch.


Crunch is used for processing data. Crunch builds on top of Apache Hadoop to provide a simpler interface for Java programmers to process data. In Crunch you create pipelines, not unlike Unix pipelines, such as the command below:

Interesting coverage of Crunch.
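To make the pipeline idea concrete, here is a minimal sketch. It uses only the JDK's `java.util.stream` API, not Crunch's API, and the class `PipelineSketch` and method `wordCount` are hypothetical names for illustration. It chains stages the way a Unix pipeline does: each stage consumes the previous stage's output. Crunch expresses similar steps on distributed `PCollection` objects (for example `parallelDo` and `count`) and hands the execution to Hadoop. This sketch runs everything in memory on a single machine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, JDK-only illustration of a pipeline; this is not Crunch code.
public class PipelineSketch {

    // Each stage transforms the output of the previous one, much like
    // commands joined by "|" in a Unix shell.
    public static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // Stage 1: split each line into whitespace-separated words.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Stage 2: drop the empty token that a blank line produces.
                .filter(word -> !word.isEmpty())
                // Stage 3: group identical words and count each group,
                // keeping keys sorted by using a TreeMap.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick fox", "the lazy dog", "");
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(wordCount(input));
    }
}
```

In Crunch the same shape of program reads its input from HDFS and writes its output back, while the library plans the underlying MapReduce jobs.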

I don’t know that I agree with the characterization:

… using Hadoop …. require[s] learning a complex process called MapReduce or a higher level language such as Apache Hive or Apache Pig.

True, using Hadoop means learning MapReduce, Hive, or Pig, but I don't think of them as all that complex. Besides, once you have learned them, the benefits are considerable.

