Rapid hadoop development with progressive testing by Abe Gong.
From the post:
Debugging Hadoop jobs can be a huge pain. The cycle time is slow, and error messages are often uninformative — especially if you’re using Hadoop streaming, or working on EMR.
I once found myself trying to debug a job that took a full six hours to fail. It took more than a week — a whole week! — to find and fix the problem. Of course, I was doing other things at the same time, but the need to constantly check up on the status of the job was a huge drain on my energy and productivity. It was a Very Bad Week.
Painful experiences like this have taught me to follow a test-driven approach to hadoop development. Whenever I’m working on a new hadoop-based data pipe, my goal is to isolate six distinct kinds of problems that arise in hadoop development.
(…)
See Abe’s post for the six steps and suggestions for how to do them.
Reformatted a bit to suit your local tool preferences, Abe’s list will make a nice quick reference for Hadoop development.
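Abe’s six steps are not reproduced here, so the following is not his method. It is only a minimal sketch of the general idea behind testing streaming logic before submitting a job: run the mapper and reducer on a tiny sample on one machine, where failures show up in seconds rather than hours. The word-count task and the names `mapper`, `reducer`, and `run_local` are hypothetical, chosen for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def record_key(record):
    """Return the part of a record before the first tab (the key)."""
    return record.split("\t", 1)[0]


def reducer(sorted_records):
    """Sum the counts for each key; assumes records arrive grouped by key."""
    for key, group in groupby(sorted_records, key=record_key):
        total = sum(int(rec.split("\t", 1)[1]) for rec in group)
        yield f"{key}\t{total}"


def run_local(lines):
    """Local stand-in for the map -> sort-by-key -> reduce flow (hypothetical helper)."""
    shuffled = sorted(mapper(lines), key=record_key)
    return list(reducer(shuffled))


# Tiny hand-made sample: small enough to check the answer by eye.
sample = ["Hadoop is slow", "hadoop is big"]
result = run_local(sample)
```

Because the functions only consume and produce lines of text, the same logic can be wrapped in scripts that read `sys.stdin` for a streaming job, while the quick local check stays independent of any cluster.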