Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 12, 2013

Rapid hadoop development with progressive testing

Filed under: Hadoop,MapReduce — Patrick Durusau @ 3:45 pm

Rapid hadoop development with progressive testing by Abe Gong.

From the post:

Debugging Hadoop jobs can be a huge pain. The cycle time is slow, and error messages are often uninformative — especially if you’re using Hadoop streaming, or working on EMR.

I once found myself trying to debug a job that took a full six hours to fail. It took more than a week — a whole week! — to find and fix the problem. Of course, I was doing other things at the same time, but the need to constantly check up on the status of the job was a huge drain on my energy and productivity. It was a Very Bad Week.

[Image: crushed by elephant]

Painful experiences like this have taught me to follow a test-driven approach to hadoop development. Whenever I’m working on a new hadoop-based data pipe, my goal is to isolate six distinct kinds of problems that arise in hadoop development.

(…)

See Abe’s post for the six steps and suggestions for how to do them.

Reformatted a bit to suit your local tool preferences, Abe’s list will make a nice quick reference for Hadoop development.
