BPP: Large Graph Storage for Efficient Disk Based Processing by Kamran Najeebullah, Kifayat Ullah Khan, Waqas Nawaz, and Young-Koo Lee.
Abstract:
Processing very large graphs like social networks, biological and chemical compounds is a challenging task. Distributed graph processing systems process the billion-scale graphs efficiently but incur overheads of efficient partitioning and distribution of the graph over a cluster of nodes. Distributed processing also requires cluster management and fault tolerance. In order to overcome these problems GraphChi was proposed recently. GraphChi significantly outperformed all the representative distributed processing frameworks. Still, we observe that GraphChi incurs some serious degradation in performance due to 1) high number of non-sequential I/Os for processing every chunk of graph; and 2) lack of true parallelism to process the graph. In this paper we propose a simple yet powerful engine BiShard Parallel Processor (BPP) to efficiently process billions-scale graphs on a single PC. We extend the storage structure proposed by GraphChi and introduce a new processing model called BiShard Parallel (BP). BP enables full CPU parallelism for processing the graph and significantly reduces the number of non-sequential I/Os required to process every chunk of the graph. Our experiments on real large graphs show that our solution significantly outperforms GraphChi.
…[B]illion-scale graph on a single PC.
Cool!
Err, but the experimental results in the paper are based on “7 thousand plus vertices and more than 1 hundred thousand edges.”
I’m not sure how that gets us to a “billion-scale” graph.
The results are interesting and may well lead to other breakthroughs in graph processing.
A bit more attention to making the abstract match the results would be appreciated.
Not to mention choosing acronyms that don’t conflict with better-known ones, like “BP.”
Searching for “BP” isn’t likely to find this paper even in a very long tail of results.
I first saw this in a tweet by Stefano Bertolo.
Rightly said: a graph with 1 hundred thousand edges is not billion-scale. I just found the journal version of this conference paper, which lists results on truly billion-scale graphs. It seems that the author also took your advice about a better name to search with. You can find the journal version here: http://www.sersc.org/journals/IJMUE/vol9_no2_2014/20.pdf
Comment by Kamran — March 11, 2014 @ 7:05 pm