Work Efficient Parallel Algorithms for Large Graph Exploration by Dip Sankar Banerjee, Shashank Sharma, Kishore Kothapalli.
Abstract:
Graph algorithms play a prominent role in several fields of sciences and engineering. Notable among them are graph traversal, finding the connected components of a graph, and computing shortest paths. There are several efficient implementations of the above problems on a variety of modern multiprocessor architectures. It can be noticed in recent times that the size of the graphs that correspond to real world data sets has been increasing. Parallelism offers only a limited succor to this situation as current parallel architectures have severe short-comings when deployed for most graph algorithms. At the same time, these graphs are also getting very sparse in nature. This calls for particular work efficient solutions aimed at processing large, sparse graphs on modern parallel architectures. In this paper, we introduce graph pruning as a technique that aims to reduce the size of the graph. Certain elements of the graph can be pruned depending on the nature of the computation. Once a solution is obtained for the pruned graph, the solution is extended to the entire graph. We apply the above technique on three fundamental graph algorithms: breadth first search (BFS), Connected Components (CC), and All Pairs Shortest Paths (APSP). To validate our technique, we implement our algorithms on a heterogeneous platform consisting of a multicore CPU and a GPU. On this platform, we achieve an average of 35% improvement compared to state-of-the-art solutions. Such an improvement has the potential to speed up other applications that rely on these algorithms.
This paper makes me curious if pruning could be a viable solution for processing very large topic maps?
I may have a map that includes all of Wikipedia but for some purposes, I have no interest in any of the Wikipedia data.
Or I may have different accounting entries that display depending upon the user accessing my accounting map. Totals, audit trails should be generated from one consistent set of entries.
Any experience or thoughts in that regard?