Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 5, 2012

More Fun with Hadoop In Action Exercises (Pig and Hive)

Filed under: Hadoop,Hive,MapReduce,Pig — Patrick Durusau @ 3:50 pm

More Fun with Hadoop In Action Exercises (Pig and Hive) by Sujit Pal.

From the post:

In my last post, I described a few Java based Hadoop Map-Reduce solutions from the Hadoop in Action (HIA) book. According to the Hadoop Fundamentals I course from Big Data University, part of being a Hadoop practioner also includes knowing about the many tools that are part of the Hadoop ecosystem. The course briefly touches on the following four tools – Pig, Hive, Jaql and Flume.

Of these, I decided to focus (at least for the time being) on Pig and Hive (for the somewhat stupid reason that the HIA book covers these too). Both of these are are high level DSLs that produce sequences of Map-Reduce jobs. Pig provides a data flow language called PigLatin, and Hive provides a SQL-like language called HiveQL. Both tools provide a REPL shell, and both can be extended with UDFs (User Defined Functions). The reason they coexist in spite of so much overlap is because they are aimed at different users – Pig appears to be aimed at the programmer types and Hive at the analyst types.

The appeal of both Pig and Hive lies in the productivity gains – writing Map-Reduce jobs by hand gives you control, but it takes time to write. Once you master Pig and/or Hive, it is much faster to generate sequences of Map-Reduce jobs. In this post, I will describe three use cases (the first of which comes from the HIA book, and the other two I dreamed up).

More useful Hadoop exercise examples.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress