Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 11, 2014

CUBE and ROLLUP:…

Filed under: Aggregation,Hadoop,Pig — Patrick Durusau @ 1:29 pm

CUBE and ROLLUP: Two Pig Functions That Every Data Scientist Should Know by Joshua Lande.

From the post:

I recently found two incredible functions in Apache Pig called CUBE and ROLLUP that every data scientist should know. These functions can be used to compute multi-level aggregations of a data set. I found the documentation for these functions to be confusing, so I will work through a simple example to explain how they work.

Joshua starts his post by demonstrating GROUP BY in Pig for simple aggregations. That sets the stage for showing how important CUBE and ROLLUP can be for data aggregations in Pig.
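To make "multi-level aggregation" concrete, here is a minimal sketch of the idea in plain Python rather than Pig. It is not Joshua's example and not Pig's implementation: the sample records and the names `ROWS`, `cube_counts`, and `rollup_counts` are hypothetical, and `None` is used here to stand for "all values" of a dimension. It only illustrates which groupings a CUBE-style and a ROLLUP-style aggregation would cover.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample records (not from Joshua's post): (color, size)
ROWS = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
]


def cube_counts(rows):
    """CUBE-style: count rows for every subset of the dimensions."""
    counts = Counter()
    if not rows:
        return counts
    n = len(rows[0])
    for row in rows:
        for k in range(n + 1):
            # Keep k of the n dimensions; replace the rest with None ("all").
            for kept in combinations(range(n), k):
                key = tuple(row[i] if i in kept else None for i in range(n))
                counts[key] += 1
    return counts


def rollup_counts(rows):
    """ROLLUP-style: count rows for each left-to-right prefix of the dimensions."""
    counts = Counter()
    if not rows:
        return counts
    n = len(rows[0])
    for row in rows:
        for k in range(n + 1):
            # Keep only the first k dimensions; the rest become None ("all").
            key = tuple(row[i] if i < k else None for i in range(n))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for name, result in (("cube", cube_counts(ROWS)), ("rollup", rollup_counts(ROWS))):
        print(name)
        for key, count in result.items():
            print("  ", key, count)
```

With two dimensions, the cube covers four grouping levels (both, color only, size only, grand total), while the rollup covers three: it follows the hierarchy color, then color and size, so it has no size-only totals.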

By the time you finish Joshua’s post, interesting possibilities suggest themselves.

I first saw this in a tweet by Dmitriy Ryaboy.
