Mosaic: processing a trillion-edge graph on a single machine by Adrian Colyer.
From the post:
Mosaic: Processing a trillion-edge graph on a single machine Maass et al., EuroSys’17
Unless your graph is bigger than Facebook’s, you can process it on a single machine.
With the inception of the internet, large-scale graphs such as web graphs and social networks have become common. For example, Facebook recently reported that its largest social graph comprises 1.4 billion vertices and 1 trillion edges. To process graphs at that scale, Facebook ran a distributed graph processing engine, Giraph, on 200 machines. But with Mosaic, we are able to process large graphs, even ones at the scale of Facebook's, on a single machine.
In this case it’s quite a special machine – one with Intel Xeon Phi coprocessors and NVMe storage. But it’s really not that expensive: the Xeon Phi used in the paper costs around $549, and a 1.2TB Intel SSD 750 costs around $750. How much does a large distributed cluster cost in comparison, especially one with expensive interconnects and large amounts of RAM?
So Mosaic costs less, but it also consistently outperforms other state-of-the-art out-of-core (secondary storage) engines by 3.2x–58.6x, and shows comparable performance to distributed graph engines. At the trillion-edge scale, Mosaic can run an iteration of PageRank in 21 minutes (after paying a fairly hefty one-off set-up cost).
(And remember, if your problem is below trillion-edge scale, say just a few billion edges, you can do an awful lot with just a single thread too!)
Another advantage of the single-machine design is a much simpler approach to fault tolerance:
… handling fault tolerance is as simple as checkpointing the intermediate stale data (i.e., vertex array). Further, the read-only vertex array for the current iteration can be written to disk parallel to the graph processing; it only requires a barrier on each superstep. Recovery is also trivial; processing can resume with the last checkpoint of the vertex array.
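The overlap described in the quote – writing the read-only vertex array of the current iteration to disk while the next iteration runs, with a barrier at each superstep – can be sketched in a few lines of Python. This is an illustrative sketch, not Mosaic's actual implementation; the function names, the pickle checkpoint format, and the threading approach are my own assumptions:

```python
import os
import pickle
import threading


def write_checkpoint(snapshot, ckpt_dir, iteration):
    # Persist the read-only vertex array for this iteration.
    # (Pickle is just for illustration; Mosaic writes raw arrays.)
    path = os.path.join(ckpt_dir, f"ckpt_{iteration}.pkl")
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)


def run_supersteps(vertices, compute, num_iters, ckpt_dir):
    """Run supersteps, checkpointing each iteration's read-only
    vertex array in parallel with computing the next iteration."""
    for i in range(num_iters):
        snapshot = list(vertices)  # read-only state for iteration i
        writer = threading.Thread(
            target=write_checkpoint, args=(snapshot, ckpt_dir, i))
        writer.start()             # checkpoint overlaps with compute
        vertices = compute(vertices)
        writer.join()              # barrier at the superstep boundary
    return vertices
```

Recovery is then just loading the last `ckpt_*.pkl` and resuming from that iteration, exactly as the quote says.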
There’s a lot to this paper. Perhaps the two most central aspects are design sympathy for modern hardware, and the Hilbert-ordered tiling scheme used to divide up the work. So I’m going to concentrate mostly on those in the space available.
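To give a flavour of the Hilbert-ordered tiling scheme: ordering tiles along a Hilbert curve means consecutive tiles in the processing order are spatially adjacent in the adjacency matrix, which improves locality. Below is a sketch using the standard coordinate-to-Hilbert-index conversion; the 4x4 grid and the idea of sorting tile coordinates by Hilbert index are illustrative, not details taken from the paper:

```python
def xy2d(n, x, y):
    """Map (x, y) in an n-by-n grid (n a power of two) to its
    distance d along the Hilbert curve, via the standard
    bit-twiddling conversion."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the recursion stays consistent.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d


# Illustrative: lay out a 4x4 grid of tiles in Hilbert order, so
# consecutive tiles share an edge (good cache/storage locality).
tiles = [(x, y) for x in range(4) for y in range(4)]
tiles.sort(key=lambda t: xy2d(4, *t))
```

Sorting by `xy2d` gives a traversal in which every step moves to a neighbouring tile, which is the locality property the tiling scheme exploits.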
…
A publicly accessible version of the paper: Mosaic: Processing a trillion-edge graph on a single machine. Presentation slides.
Definitely a paper for near the top of my reading list!
Shallow but broad graphs (think telephone surveillance data) are all the rage, but how would relatively narrow but deep graphs fare when processed by Mosaic?
Using top-end but not uncommon hardware may let your processing requirements escape the notice of the NSA. Another benefit of commodity hardware.
Enjoy!