STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation by David A. Bader, Georgia Institute of Technology; Jonathan Berry, Sandia National Laboratories; Adam Amos-Binks, Carleton University, Canada; Daniel Chavarría-Miranda, Pacific Northwest National Laboratory; Charles Hastings, Hayden Software Consulting, Inc.; Kamesh Madduri, Lawrence Berkeley National Laboratory; and Steven C. Poulos, U.S. Department of Defense. Dated May 9, 2009.
Abstract:
In this document, we propose a dynamic graph data structure that can serve as a common data structure for multiple real-world applications. The extensible representation for dynamic complex networks is space-efficient, allows parallelism over vertices and edges independently, and can be used for efficient checkpoint/restart of the data.
The report describes a deeply interesting graph data structure that can be used across different frameworks.
See the STINGER wiki page (with source code as attachments).
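To make the abstract's design points concrete, here is a minimal sketch of a STINGER-like structure: per-vertex adjacency stored as chains of fixed-size edge blocks carrying timestamps. The names (edge_block, EB_SIZE, insert_edge) and the block layout are my illustrative assumptions, not the actual STINGER API; see the wiki source for the real thing.

```c
/* Sketch of a STINGER-like dynamic graph: adjacency as chains of
 * fixed-size, timestamped edge blocks.  Illustrative only. */
#include <stdlib.h>

#define EB_SIZE 16               /* edges per block; a space/speed trade-off */

typedef struct edge {
    long dest;                   /* neighbor vertex id                       */
    long weight;
    long time_first;             /* timestamps support temporal queries      */
    long time_recent;
} edge;

typedef struct edge_block {
    struct edge_block *next;     /* blocks chain into a per-vertex list      */
    int high;                    /* number of slots in use                   */
    edge edges[EB_SIZE];
} edge_block;

typedef struct graph {
    long nv;
    edge_block **vertices;       /* one block chain per vertex               */
} graph;

graph *graph_new(long nv)
{
    graph *g = malloc(sizeof *g);
    g->nv = nv;
    g->vertices = calloc(nv, sizeof *g->vertices);
    return g;
}

/* Insert or update edge (src, dest) at time t.  Scanning a block is a
 * short, cache-friendly loop, and distinct vertices (or blocks) can be
 * updated by different threads in parallel. */
void insert_edge(graph *g, long src, long dest, long weight, long t)
{
    edge_block **tail = &g->vertices[src];
    edge_block *free_blk = NULL;

    /* One pass over all blocks: update the edge if it already exists,
     * remembering the first block with a free slot in case it does not. */
    for (edge_block *eb = *tail; eb; tail = &eb->next, eb = *tail) {
        for (int i = 0; i < eb->high; i++) {
            if (eb->edges[i].dest == dest) {
                eb->edges[i].weight += weight;
                eb->edges[i].time_recent = t;
                return;
            }
        }
        if (!free_blk && eb->high < EB_SIZE)
            free_blk = eb;
    }
    if (free_blk) {
        free_blk->edges[free_blk->high++] = (edge){ dest, weight, t, t };
        return;
    }
    edge_block *nb = calloc(1, sizeof *nb);   /* no room: chain a new block */
    nb->edges[0] = (edge){ dest, weight, t, t };
    nb->high = 1;
    *tail = nb;
}
```

Because blocks are only appended, not reallocated, the structure stays stable under concurrent readers, and dumping the block chains gives a natural checkpoint format.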
See also D. Ediger, K. Jiang, J. Riedy, and D.A. Bader, "Massive Streaming Data Analytics: A Case Study with Clustering Coefficients," 4th Workshop on Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 23, 2010.
Abstract:
We present a new approach for parallel massive graph analysis of streaming, temporal data with a dynamic and extensible representation. Handling the constant stream of new data from health care, security, business, and social network applications requires new algorithms and data structures. We examine data structure and algorithm trade-offs that extract the parallelism necessary for high-performance updating analysis of massive graphs. Static analysis kernels often rely on storing input data in a specific structure. Maintaining these structures for each possible kernel with high data rates incurs a significant performance cost. A case study computing clustering coefficients on a general-purpose data structure demonstrates incremental updates can be more efficient than global recomputation. Within this kernel, we compare three methods for dynamically updating local clustering coefficients: a brute-force local recalculation, a sorting algorithm, and our new approximation method using a Bloom filter. On 32 processors of a Cray XMT with a synthetic scale-free graph of 2^24 ≈ 16 million vertices and 2^29 ≈ 537 million edges, the brute-force method processes a mean of over 50,000 updates per second and our Bloom filter approaches 200,000 updates per second.
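To make the incremental-update idea concrete, here is a sketch of the edge-insertion step, assuming a plain adjacency-array graph rather than the paper's STINGER structure. The Bloom filter stands in for the paper's approximation method, but every name here (bloom, mix, update_triangles, BLOOM_BITS) is my own illustration, not the authors' code.

```c
/* Sketch: on inserting undirected edge (u, v), every common neighbor w
 * closes a new triangle, so the triangle counts of u, v, and w each grow
 * by one.  A Bloom filter over N(v) turns the set intersection into one
 * linear pass over N(u), at the cost of rare false-positive overcounts. */
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 4096          /* filter size; tune to neighbor-list size */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom;

static uint64_t mix(uint64_t x, uint64_t seed)
{                                /* cheap 64-bit mixer as a stand-in hash   */
    x += seed;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

static void bloom_add(bloom *b, long v)
{
    for (uint64_t k = 0; k < 3; k++) {        /* three hash probes */
        uint64_t h = mix((uint64_t)v, k) % BLOOM_BITS;
        b->bits[h / 8] |= (uint8_t)(1 << (h % 8));
    }
}

static int bloom_maybe(const bloom *b, long v)
{
    for (uint64_t k = 0; k < 3; k++) {
        uint64_t h = mix((uint64_t)v, k) % BLOOM_BITS;
        if (!(b->bits[h / 8] & (1 << (h % 8))))
            return 0;            /* definitely not present                  */
    }
    return 1;                    /* probably present (false positives OK)   */
}

/* Update per-vertex triangle counts tri[] for new edge (u, v).  Skipping
 * w == v makes the count correct whether or not the adjacency lists
 * already contain the new edge. */
void update_triangles(long *const *adj, const int *deg, long *tri,
                      long u, long v)
{
    bloom b;
    memset(&b, 0, sizeof b);
    for (int i = 0; i < deg[v]; i++)
        bloom_add(&b, adj[v][i]);

    for (int i = 0; i < deg[u]; i++) {
        long w = adj[u][i];
        if (w != v && bloom_maybe(&b, w)) {   /* w is (probably) common    */
            tri[u]++;
            tri[v]++;
            tri[w]++;
        }
    }
}
```

From the maintained triangle counts, the local clustering coefficient of a vertex v is c(v) = 2 * tri(v) / (deg(v) * (deg(v) - 1)); keeping tri up to date incrementally is what lets the kernel avoid the global recomputation the abstract warns about.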
The authors refer to their approach as "massive streaming data analytics". I think you will agree.
OK, admittedly they used a Cray XMT. But such processing power will be available to the average site sooner than you think, soon enough that reading along these lines will put you ahead of the curve.