Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 4, 2012

Scalding

Filed under: MapReduce,Scala,Scalding — Patrick Durusau @ 3:07 pm

Scalding: Powerful & Concise MapReduce Programming

Description:

Scala is a functional programming language on the JVM. Hadoop uses a functional programming model to represent large-scale distributed computation. Scala is thus a very natural match for Hadoop.

In this presentation to the San Francisco Scala User Group, Dr. Oscar Boykin and Dr. Argyris Zymnis from Twitter give us some insight on Scalding DSL and provide some example jobs for common use cases.

Twitter uses Scalding for data analysis and machine learning, particularly in cases where we need more than sql-like queries on the logs, for instance fitting models and matrix processing. It scales beautifully from simple, grep-like jobs all the way up to jobs with hundreds of map-reduce pairs.

The Alice example failed (counted the different forms of Alice differently). I am reading a regex book so that may have made the problem more obvious.

Lesson: Test code/examples before presentation. 😉

See the Github repository: https://github.com/twitter/scalding.

Both Scalding and the presentation are worth your time.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress