Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 15, 2014

GraphX: Graph Processing in a Distributed Dataflow Framework

Filed under: Distributed Computing,Graphs,GraphX — Patrick Durusau @ 7:25 pm

GraphX: Graph Processing in a Distributed Dataflow Framework by Joseph Gonzalez, Reynold Xin, Ankur Dave, Dan Crankshaw, Michael Franklin, Ion Stoica.

Abstract:

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.

GraphX: Graph Processing in a Distributed Dataflow Framework (as PDF file)

The “other” systems for comparison were GraphLab and Giraph. Those systems were tuned in cooperation with experts in their use. These are some of the “fairest” benchmarks you are likely to see this year. Quite different from “shiny graph engine” versus lame or misconfigured system benchmarks.

Definitely the slow-read paper for this week!

I first saw this in a tweet by Arnon Rotem-Gal-Oz.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress