Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 20, 2013

GraphX: A Resilient Distributed Graph System on Spark

Filed under: Graphs,GraphX,Spark — Patrick Durusau @ 10:23 am

GraphX: A Resilient Distributed Graph System on Spark by Reynold Xin, Joseph Gonzalez, Michael Franklin, Ion Stoica.

Abstract:

From social networks to targeted advertising, big graphs capture the structure in data and are central to recent advances in machine learning and data mining. Unfortunately, directly applying existing data-parallel tools to graph computation tasks can be cumbersome and inefficient. The need for intuitive, scalable tools for graph computation has led to the development of new graph-parallel systems (e.g. Pregel, PowerGraph) which are designed to efficiently execute graph algorithms. Unfortunately, these new graph-parallel systems do not address the challenges of graph construction and transformation which are often just as problematic as the subsequent computation. Furthermore, existing graph-parallel systems provide limited fault-tolerance and support for interactive data mining.

We introduce GraphX, which combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework. We leverage new ideas in distributed graph representation to efficiently distribute graphs as tabular data-structures. Similarly, we leverage advances in data-flow systems to exploit in-memory computation and fault-tolerance. We provide powerful new operations to simplify graph construction and transformation. Using these primitives we implement the PowerGraph and Pregel abstractions in less than 20 lines of code. Finally, by exploiting the Scala foundation of Spark, we enable users to interactively load, transform, and compute on massive graphs.

Of particular note is the use of an immutable graph as the core data structure for GraphX.
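
To make the idea of an immutable graph held as tables concrete, here is a minimal sketch in plain Python. It is not the GraphX API and not the authors' implementation: the `Graph` class, `map_vertices`, and `pregel_min_label` are hypothetical names of my own, and the Pregel-style loop is only a toy version of vertex-centric message passing (each vertex adopts the smallest label it hears from a neighbor, which labels connected components).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Graph:
    """Hypothetical immutable graph stored as two tables (not the GraphX API)."""
    vertices: Tuple[Tuple[int, int], ...]  # rows of (vertex_id, value)
    edges: Tuple[Tuple[int, int], ...]     # rows of (src_id, dst_id)

    def map_vertices(self, f: Callable[[int, int], int]) -> "Graph":
        """Return a new graph with transformed vertex values; self is untouched."""
        new_rows = tuple((vid, f(vid, val)) for vid, val in self.vertices)
        return Graph(new_rows, self.edges)


def pregel_min_label(graph: Graph, max_supersteps: int = 20) -> Graph:
    """Toy Pregel-style loop: every vertex keeps the smallest label it has seen."""
    for _ in range(max_supersteps):
        values: Dict[int, int] = dict(graph.vertices)
        inbox: Dict[int, int] = {}
        # Send phase: each edge delivers labels in both directions (undirected),
        # and messages to the same vertex are combined with min().
        for src, dst in graph.edges:
            inbox[dst] = min(inbox.get(dst, values[src]), values[src])
            inbox[src] = min(inbox.get(src, values[dst]), values[dst])
        # Apply phase: build a new graph instead of mutating the old one.
        updated = graph.map_vertices(lambda vid, val: min(val, inbox.get(vid, val)))
        if updated.vertices == graph.vertices:
            break  # no vertex changed, so the computation has converged
        graph = updated
    return graph


# Two components: {1, 2, 3} and {4, 5}; each vertex starts labeled with its own id.
g = Graph(
    vertices=((1, 1), (2, 2), (3, 3), (4, 4), (5, 5)),
    edges=((1, 2), (2, 3), (4, 5)),
)
result = pregel_min_label(g)
print(result.vertices)
```

Because every superstep produces a new `Graph` rather than updating one in place, the input `g` is still intact afterwards. In a system built on Spark, that property is what would let a lost partition be recomputed from its inputs, although this sketch does not attempt to show that.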

The authors report that GraphX performs less well than PowerGraph (GraphLab 2.1) but promise performance gains and offsetting gains in productivity.

I didn’t find any additional resources on GraphX at AMPLab, but I did find:

Spark project homepage

Screencasts on Spark

Both will be useful when more information on GraphX emerges.
