CUBE and ROLLUP: Two Pig Functions That Every Data Scientist Should Know by Joshua Lande.
From the post:
I recently found two incredible functions in Apache Pig called CUBE and ROLLUP that every data scientist should know. These functions can be used to compute multi-level aggregations of a data set. I found the documentation for these functions to be confusing, so I will work through a simple example to explain how they work.
Joshua starts his post with a demonstration of using GROUP BY in Pig for simple aggregations. That sets the stage for demonstrating how important CUBE and ROLLUP can be for data aggregations in Pig.
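To make "multi-level aggregations" concrete, here is a minimal Python sketch of what the two operations compute. It is not Pig syntax and not Joshua's example: the rows and the helper names (cube_keys, rollup_keys, aggregate) are made up for illustration. The idea it models is that CUBE aggregates over every combination of the grouping dimensions, while ROLLUP aggregates only over the hierarchical prefixes, with None standing in for a dimension that has been collapsed.

```python
from collections import defaultdict
from itertools import combinations


def cube_keys(dims):
    """Return all 2^n grouping keys for one row: each dimension is
    either kept or replaced by None (meaning "all values")."""
    n = len(dims)
    keys = []
    for r in range(n, -1, -1):
        for kept in combinations(range(n), r):
            keys.append(tuple(d if i in kept else None
                              for i, d in enumerate(dims)))
    return keys


def rollup_keys(dims):
    """Return the n+1 hierarchical grouping keys for one row:
    dimensions are collapsed from right to left."""
    n = len(dims)
    return [tuple(dims[:k]) + (None,) * (n - k) for k in range(n, -1, -1)]


def aggregate(rows, key_fn):
    """Sum the last field of each row under every key that key_fn yields."""
    totals = defaultdict(int)
    for *dims, value in rows:
        for key in key_fn(tuple(dims)):
            totals[key] += value
    return dict(totals)


# Hypothetical data: (animal, city, count)
rows = [("dog", "miami", 18), ("cat", "miami", 20), ("dog", "tampa", 4)]

cube_totals = aggregate(rows, cube_keys)      # by (animal, city), animal, city, overall
rollup_totals = aggregate(rows, rollup_keys)  # by (animal, city), animal, overall
```

With two dimensions, the cube holds per-city totals such as (None, "miami") that the rollup never produces, because a rollup treats the dimensions as a hierarchy and only collapses them from the right.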
Interesting possibilities suggest themselves by the time you finish Joshua’s posting.
I first saw this in a tweet by Dmitriy Ryaboy.