GPS: A Graph Processing System
From the post:
GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph.GPS is a distributed system designed to run on a cluster of machines, such as Amazon’s EC2.
In systems such as GPS and Pregel, the input graph (directed, possibly with values on edges) is distributed across machines and vertices send each other messages to perform a computation. Computation is divided into iterations called supersteps. Analogous to the map() and reduce() functions of the MapReduce framework, in each superstep a user-defined function called vertex.compute() is applied to each vertex in parallel. The user expresses the logic of the computation by implementing vertex.compute(). This design is based on Valiant’s Bulk Synchronous Parallel model of computation. A detailed description can be found in the original Pregel paper.
There are five main differences between Pregel and GPS:
- GPS is open-source.
- GPS extends Pregel’s API with a master.compute() function, which enables easy and efficient implementation of algorithms that are composed of multiple vertex-centric computations, combined with global computations
- GPS has an optional dynamic repartitioning scheme, which reassigns vertices to different machines during graph computation to improve performance, based on observing communication patterns.
- GPS has an optimization called LALP that reduces the network I/O in when running certain algorithms on real-world graphs that have skewed degree distributions.
- GPS programs can be implemented using a higher-level domain specific language called Green-Marl, and automatically compiled into native GPS code. Green-Marl is a traditional imperative language with several graph-specific language constructs that enable intuitive and simple expression of complicated algorithms.
We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges. (emphasis added)
…
In light of the availability and performance statement, I suppose we can overlook the choice of a potentially confusing acronym. 😉
The Green-Marl compiler can be used to implement algorithms for GPS. Consult the Green-Marl paper before deciding its assumptions about processing will fit your use cases.
The team also wrote: Optimizing Graph Algorithms on Pregel-like Systems, due to appear in VLDB 2014.
I first saw this in a tweet by James Thornton.