Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 12, 2012

Introducing DataFu: an open source collection of useful Apache Pig UDFs

Filed under: DataFu,Hadoop,MapReduce,Pig — Patrick Durusau @ 7:34 pm


From the post:

At LinkedIn, we make extensive use of Apache Pig for performing data analysis on Hadoop. Pig is a simple, high-level programming language that consists of just a few dozen operators and makes it easy to write MapReduce jobs. For more advanced tasks, Pig also supports User Defined Functions (UDFs), which let you integrate custom code in Java, Python, and JavaScript into your Pig scripts.

Over time, as we worked on data intensive products such as People You May Know and Skills, we developed a large number of UDFs at LinkedIn. Today, I’m happy to announce that we have consolidated these UDFs into a single, general-purpose library called DataFu and we are open sourcing it under the Apache 2.0 license:

Check out DataFu on GitHub!

DataFu includes UDFs for common statistics tasks, PageRank, set operations, bag operations, and a comprehensive suite of tests. Read on to learn more.

This is way cool!
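To give a flavor of what a UDF for a "common statistics task" might look like, here is a minimal sketch in Python (Pig runs Python UDFs under Jython). It is not DataFu's code: the function name `median`, the schema string, and the fallback decorator are illustrative assumptions, and the sketch assumes Pig hands a bag to the function as a list of tuples.

```python
# Hypothetical Pig UDF sketch; NOT taken from DataFu.
# Under Pig's Jython runtime the outputSchema decorator is provided for you.
# The fallback below is an assumption that lets the file also run as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # keep the schema string for inspection
            return func
        return wrap


@outputSchema("median:double")
def median(bag):
    """Return the median of the first field of each tuple in the bag."""
    # Assumption: a bag arrives as a list of tuples, e.g. [(1,), (3,), (2,)].
    values = sorted(t[0] for t in bag if t and t[0] is not None)
    if not values:
        return None  # empty bag: no median
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return float(values[mid])
    # Even count: average the two middle values.
    return (values[mid - 1] + values[mid]) / 2.0
```

In a Pig script, a file like this (say, a hypothetical `udfs.py`) would be registered with something along the lines of `REGISTER 'udfs.py' USING jython AS myudfs;` and then called as `myudfs.median(...)` on a grouped bag.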

Read the rest of Matthew’s post (link above) or get thee to GitHub!
