The impact of fast networks on graph analytics, part 1 by Frank McSherry.
From the post:
This is a joint post with Malte Schwarzkopf, cross-blogged here and at the CamSaS blog.
tl;dr: A recent NSDI paper argued that data analytics stacks don’t get much faster at tasks like PageRank when given better networking, but this is likely just a property of the stack they evaluated (Spark and GraphX) rather than generally true. A different framework (timely dataflow) goes 6x faster than GraphX on a 1G network, which improves by 3x to 15-17x faster than GraphX on a 10G network.
I spent the past few weeks visiting the CamSaS folks at the University of Cambridge Computer Lab. Together, we did some interesting work, which we – Malte Schwarzkopf and I – are now going to tell you about.
Recently, a paper entitled “Making Sense of Performance in Data Analytics Frameworks” appeared at NSDI 2015. This paper contains some surprising results: in particular, it argues that data analytics stacks are limited more by CPU than they are by network or disk IO. Specifically,
“Network optimizations can only reduce job completion time by a median of at most 2%. The network is not a bottleneck because much less data is sent over the network than is transferred to and from disk. As a result, network I/O is mostly irrelevant to overall performance, even on 1Gbps networks.” (§1)
The measurements were done using Spark, but the authors argue that they generalize to other systems. We thought that this was surprising, as it doesn’t match our experience with other data processing systems. In this blog post, we will look into whether these observations do indeed generalize.
One of the three workloads in the paper is the BDBench query set from Berkeley, which includes a “page-rank-like computation”. Moreover, PageRank also appears as an extra example in the NSDI slide deck (slide 38-39), used there to illustrate that at most a 10% improvement in job completion time can be had even for a network-intensive workload.
This was especially surprising to us because of the recent discussion around whether graph computations require distributed data processing systems at all. Several distributed systems get beat by a simple, single-threaded implementation on a laptop for various graph computations. The common interpretation is that graph computations are communication-limited; the network gets in the way, and you are better off with one machine if the computation fits.[footnote omitted]
…
The authors introduce Rust and timely dataflow to achieve rather remarkable performance gains. That is if you think a 4x-16x speedup over GraphX on the same hardware is a performance gain. (Most do.)
Code and instructions are available so you can test their conclusions for yourself. Hardware is your responsibility.
While you are waiting for part 2 to arrive, try Frank’s homepage for some fascinating reading.