Archive for the ‘Piccolo’ Category

Piccolo: Distributed Computing via Shared Tables

Saturday, December 8th, 2012

Piccolo: Distributed Computing via Shared Tables

From the homepage:

Piccolo is a framework designed to make it easy to develop efficient distributed applications.

In contrast to traditional data-centric models (such as Hadoop) which present the user a single object at a time to operate on, Piccolo exposes a global table interface which is available to all parts of the computation simulataneously. This allows users to specify programs in an intuitive manner very similar to that of writing programs for a single machine.

Piccolo includes a number of optimizations to ensure that using this table interface is not just easy, but also fast:

Locality
To ensure locality of execution, tables are explicitly partitioned across machines. User code that interacts with the tables can specify a locality preference: this ensures that the code is executed locally with the data it is accessing.
Load-balancing
Not all load is created equal – often some partition of a computation will take much longer then others. Waiting idly for this task to finish wastes valuable time and resources. To address this Piccolo can migrate tasks away from busy machines to take advantage of otherwise idle workers, all while preserving the locality preferences and the correctness of the program.
Failure Handling
Machines failures are inevitable, and generally occur when you’re at the most critical time in your computation. Piccolo makes checkpointing and restoration easy and fast, allowing for quick recovery in case of failures.
Synchronization
Managing the correct synchronization and update across a distributed system can be complicated and slow. Piccolo addresses this by allowing users to defer synchronization logic to the system. Instead of explicitly locking tables in order to perform updates, users can attach accumulation functions to a table: these are used automatically by the framework to correctly combine concurrent updates to a table entry.

The closer you are to the metal, the more aware you will be of the distributed nature of processing and data.

Will the success of distributed processing/storage be when all but systems architects are unaware of its nature?

GraphLab vs. Piccolo vs. Spark

Saturday, December 8th, 2012

GraphLab vs. Piccolo vs. Spark by Danny Bickson.

From the post:

I got an interesting case study from Cui Henggang, a first year graduate student at CMU Parallel Data Lab. Cui implemented GMM on GraphLab, for comparing its performance to Piccolo and Spark. His collaborators on this projects where Jinliang Wei and Wei Dai. The algorithm is described on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 9.2, page 438.

Danny reports Chu will be releasing his report and posting his GMM code to the graphic models toolkit (GraphLab).

I will post a pointer when the report appears, here and probably in a new post as well.