From the homepage:
Petuum is a distributed machine learning framework. It takes care of the difficult system “plumbing work”, allowing you to focus on the ML. Petuum runs efficiently at scale on research clusters and cloud compute like Amazon EC2 and Google GCE.
A Bit More Detail
Petuum provides essential distributed programming tools that minimize programmer effort: a distributed parameter server (key-value storage), a distributed task scheduler, and out-of-core (disk) storage for extremely large problems. Unlike general-purpose distributed programming platforms, Petuum is designed specifically for ML algorithms. This means Petuum takes advantage of data correlation, tolerance of stale parameter values, and other statistical properties to maximize the performance of ML algorithms.
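To make the staleness point concrete, here is a minimal Python sketch of a bounded-staleness key-value table, one common way a parameter server can exploit the fact that ML algorithms tolerate slightly stale parameters (the "stale synchronous parallel" idea). It is not Petuum's API: the class `BoundedStalenessTable` and its methods `inc`, `clock`, `can_read`, and `get` are hypothetical, and the workers are simulated in a single process rather than spread across machines. Each worker advances its own clock after an iteration, and a worker may read only while it is no more than `staleness` clocks ahead of the slowest worker.

```python
from typing import Dict, List


class BoundedStalenessTable:
    """Hypothetical in-memory table illustrating bounded staleness (not Petuum's API)."""

    def __init__(self, num_workers: int, staleness: int) -> None:
        self.params: Dict[str, float] = {}
        self.clocks: List[int] = [0] * num_workers  # one iteration counter per worker
        self.staleness = staleness

    def inc(self, key: str, delta: float) -> None:
        # Additive updates commute, so they can be applied in any arrival order.
        self.params[key] = self.params.get(key, 0.0) + delta

    def clock(self, worker: int) -> None:
        # Called by a worker when it finishes one iteration.
        self.clocks[worker] += 1

    def can_read(self, worker: int) -> bool:
        # A read is allowed if the slowest worker is within `staleness` clocks.
        return min(self.clocks) >= self.clocks[worker] - self.staleness

    def get(self, worker: int, key: str) -> float:
        if not self.can_read(worker):
            raise RuntimeError("worker is too far ahead of the slowest worker; it must wait")
        return self.params.get(key, 0.0)


table = BoundedStalenessTable(num_workers=2, staleness=1)
table.inc("w", 0.5)
table.clock(0)                 # worker 0 is one iteration ahead: still allowed to read
print(table.get(0, "w"))
table.clock(0)                 # worker 0 is now two ahead of worker 1: must wait
print(table.can_read(0))
```

With `staleness=0` this reduces to bulk synchronous execution, where every worker waits at a barrier each iteration; larger values let fast workers keep computing on slightly stale parameters instead of idling.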
Plug and Play
Petuum comes with a fast and scalable parallel LASSO regression solver, as well as implementations of the Latent Dirichlet Allocation (LDA) topic model and L2-norm matrix factorization, with more to be added on a regular basis. Petuum is fully self-contained, making installation a breeze: if you know how to use a Linux package manager and type “make”, you’re ready to use Petuum. No mucking around trying to find that Hadoop cluster or, worse still, trying to install Hadoop yourself. Whether you have a single machine or an entire cluster, Petuum just works.
What’s Petuum anyway?
Petuum comes from “perpetuum mobile,” which is a musical style characterized by a continuous steady stream of notes. Paganini’s Moto Perpetuo is an excellent example. It is our goal to build a system that runs efficiently and reliably — in perpetual motion.
Musically inclined programmers? 😉
The bar for using Hadoop and machine learning gets lower by the day, at least in terms of the details that can be mastered by code.
Which is how it should be, with the creative work (choosing data, appropriate algorithms, etc.) left to human operators.
I first saw this in Danny Bickson’s post, Petuum – a new distributed machine learning framework from CMU (Eric Xing).
PS: Remember to register for the 3rd GraphLab Conference!