Scaling Apache Giraph to a trillion edges by Avery Ching.
From the post:
Graph structures are ubiquitous: they provide a basic model of entities with connections between them that can represent almost anything. Flight routes connect airports, computers communicate to one another via the Internet, webpages have hypertext links to navigate to other webpages, and so on. Facebook manages a social graph that is composed of people, their friendships, subscriptions, and other connections. Open graph allows application developers to connect objects in their applications with real-world actions (such as user X is listening to song Y).
Analyzing these real world graphs at the scale of hundreds of billions or even a trillion (10^12) edges with available software was impossible last year. We needed a programming framework to express a wide range of graph algorithms in a simple way and scale them to massive datasets. After the improvements described in this article, Apache Giraph provided the solution to our requirements.
In the summer of 2012, we began exploring a diverse set of graph algorithms across many different Facebook products as well as academic literature. We selected a few representative use cases that cut across the problem space with different system bottlenecks and programming complexity. Our diverse use cases and the desired features of the programming framework drove the requirements for our system infrastructure. We required an iterative computing model, graph-based API, and fast access to Facebook data. Based on these requirements, we selected a few promising graph-processing platforms including Apache Hive, GraphLab, and Apache Giraph for evaluation.
For your convenience:
Your appropriate scale is probably less than a trillion edges, but everybody likes a great scaling story.
This is a great scaling story.