Archive for the ‘Green-Mari’ Category

GPS: A Graph Processing System

Wednesday, June 18th, 2014

GPS: A Graph Processing System

From the post:

GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph.GPS is a distributed system designed to run on a cluster of machines, such as Amazon’s EC2.

In systems such as GPS and Pregel, the input graph (directed, possibly with values on edges) is distributed across machines and vertices send each other messages to perform a computation. Computation is divided into iterations called supersteps. Analogous to the map() and reduce() functions of the MapReduce framework, in each superstep a user-defined function called vertex.compute() is applied to each vertex in parallel. The user expresses the logic of the computation by implementing vertex.compute(). This design is based on Valiant’s Bulk Synchronous Parallel model of computation. A detailed description can be found in the original Pregel paper.

There are five main differences between Pregel and GPS:

  • GPS is open-source.
  • GPS extends Pregel’s API with a master.compute() function, which enables easy and efficient implementation of algorithms that are composed of multiple vertex-centric computations, combined with global computations
  • GPS has an optional dynamic repartitioning scheme, which reassigns vertices to different machines during graph computation to improve performance, based on observing communication patterns.
  • GPS has an optimization called LALP that reduces the network I/O in when running certain algorithms on real-world graphs that have skewed degree distributions.
  • GPS programs can be implemented using a higher-level domain specific language called Green-Marl, and automatically compiled into native GPS code. Green-Marl is a traditional imperative language with several graph-specific language constructs that enable intuitive and simple expression of complicated algorithms.

We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges. (emphasis added)

In light of the availability and performance statement, I suppose we can overlook the choice of a potentially confusing acronym. 😉

The Green-Marl compiler can be used to implement algorithms for GPS. Consult the Green-Marl paper before deciding its assumptions about processing will fit your use cases.

The team also wrote: Optimizing Graph Algorithms on Pregel-like Systems, due to appear in VLDB 2014.

I first saw this in a tweet by James Thornton.

Green-Marl – The Paper

Wednesday, June 20th, 2012

Green-Marl: A DSL for Easy and Efficient Graph Analysis by Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun.

I previously reported on the Green-Marl website/software and mentioned this paper: Green-Marl. Catching up on a severe backlog of papers during a slow summer weekend and read in Green-Marl paper:

The above mathematical descriptions imply two important assumptions that Green-Marl makes:

1. The graph is immutable and is not modified during analysis.

2. There are no aliases between graph instances nor between graph properties.

We assume an immutable graph so that we can focus on the task of graph analysis, rather than worry about orthogonal issues such as how graphs are constructed or modified. Since Green-Marl is designed to be used in re-writting only parts of the user application (Section 3.1), one can construct or modify the graph in their own preferred way (e.g. from data file, from a database, etc.) but when a Green-Marl generated implementation is handed a graph, the assumption is that the graph will not be modified while a Green-Marl procedure is analyizing it.

I can imagine assumption #1 being in place for processing a topic map but certainly not assumption #2.

But given the type of graph analysis they want to perform, the assumptions are justifiable.

Given a graph, \( G = (V;E) \), and a set of properties defined on the graph, \( \amalg = \{P_1; P_2; \ldots P_n\} \), our language is specifically designed for the following types of graph analysis:


  • Computing a scalar value from \( (G,\amalg) \), e.g. the conductance of a sub-graph
  • 

  • Computing a new property \( P_{n+1} \) from \( (G, \amalg ) \), e.g. the pagerank of each node of a graph
  • 

  • Selecting a subgraph of interest out of the original graph \( (V^’,E^’) \subset (V,E) \), e.g. strongly connected components of a graph

What if we share the assumption:

1. The graph is immutable and is not modified during analysis.

And have a different second assumption:

There are aliases between nodes and graph properties.

We want to say:

Given a graph, \( G = (V;E) \), and a set of properties defined on the graph, \( \amalg = \{P_1; P_2; \ldots P_n\} \),

  1. Compute the aliases for each property defined on the graph
  2. Compute the aliases for each node of the graph

(My suspicion being that aliases of properties must be computed before aliases on nodes, although that would depend upon how aliases between nodes are defined.)

Where actions based on the computation of aliases for both properties and nodes are a separate step in the analysis.

(There are some other complexities that I haven’t fully teased out so suggestions/comments welcome. But then that is always the case.)

Green-Marl

Saturday, June 2nd, 2012

Green-Marl

From the website:

Green-Marl [1] is a domain-specific language that is specially designed for graph data analysis. For the further information for the Green-Marl language, refer to the language specification draft [2], which can also be found in this directory in the source package.

‘gm_comp’ is a compiler for Green-Marl. It reads a Green-Marl file and generates an equivalent, efficient and parallelized C++ implementation, i.e. .cc file. More specifically, the compiler produces a C++ function for each Green-Marl procedure. The generated c++ functions can be compiled with gcc and therefore can be merged into any user application that are compilable with gcc.

The C++ codes that are generated by ‘gm_comp’ assume the following libraries:

  • gcc (with builtin atomic functions)
  • gcc (with OpenMp support)
  • a custom graph library and runtime (gm_graph)

The first two are supported by any recent gcc distributions (version 4.2 or higher); the third one is included in this source package.

‘gm_comp’ is also able to generate codes for a completely different target environment (See Section 5).

This is the sort of resource that should appear in a daily “update” about topic map relevant material on the WWW or in the published literature.

The paper, Green-Marl: A DSL for Easy and Efficient Graph Analysis (ASPLOS 2012), by Sungpack Hong, Hassan Chafi and Eric Sedlar, is quite good.

I first saw Green-Marl at Pete Warden’s Five Short Links.