Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 9, 2014

400 GTEPS on 4096 GPUs

Filed under: Distributed Systems,GPU,Graphs — Patrick Durusau @ 7:14 pm

Breadth-First Graph Search Uses 2D Domain Decomposition – 400 GTEPS on 4096 GPUs by Rob Farber.

From the post:

Parallel Breadth-First Search is a standard benchmark and the basis of many other graph algorithms. The challenge li[]es in partitioning the graph across multiple nodes in a cluster while avoiding load-imbalance and communications delays. The authors of the paper, “Parallel Breadth First Search on the Kepler Architecture” utilize an interesting 2D decomposition of the graph adjacency matrix. Tests on R-MAT graphs shows large graph performance ranging from 1.1 GTEP on a single K20 to 396 GTEP using 4096 GPUs. The tests also compared performance against the method of Beamer (10 GTEP single SMP device and 240 GTEP on 115k cores).

See Rob’s post for background on the distributed DFS problem and additional references.

Graph processing continues to improve at an impressive rate but I wonder how applicable some techniques are to intersections of graphs?

The optimization of using a bitmap to mark vertices visited (Scalable Graph Exploration on Multicore Processors, Agarwal, et al., 2010), cited by authors of Parallel Distributed Breadth First Search on the Kepler Architecture saying:

Then, to reduce the work, we used an integer map to keep track of visited vertices. Agarwal et al., first introduced this optimization using a bitmap that has been used in almost all subsequent works.

appears to be stumbling block to tracking a vertex that appears in intersecting graphs.

Or would you track visited vertices in each intersecting graph separately? And communicate results from each intersecting graph?

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress