GraphChi-DB: Simple Design for a Scalable Graph Database System — on Just a PC by Aapo Kyrola and Carlos Guestrin.
Abstract:
We propose a new data structure, Parallel Adjacency Lists (PAL), for efficiently managing graphs with billions of edges on disk. The PAL structure is based on the graph storage model of GraphChi (Kyrola et al., OSDI 2012), but we extend it to enable online database features such as queries and fast insertions. In addition, we extend the model with edge and vertex attributes. Compared to previous data structures, PAL can store graphs more compactly while allowing fast access to both the incoming and the outgoing edges of a vertex, without duplicating data. Based on PAL, we design a graph database management system, GraphChi-DB, which can also execute powerful analytical graph computation.
We evaluate our design experimentally and demonstrate that GraphChi-DB achieves state-of-the-art performance on graphs that are much larger than the available memory. GraphChi-DB enables anyone with just a laptop or a PC to work with extremely large graphs.
Open source will be released at: https://github.com/GraphChi.
With data structure improvements like those in GraphChi-DB, it won’t be long until the average laptop becomes a weapons-grade munition. 😉
Study the details of the Partitioned Adjacency List (PAL) carefully if you want to follow up on the suggestions of using PAL for RDBMS and RDF storage (topic maps?).
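To get a feel for how one copy of each edge can serve both in-edge and out-edge lookups, here is a rough in-memory sketch. It assumes the layout of the original GraphChi shards, which the abstract says PAL builds on: edges are partitioned by intervals of destination vertex ID and sorted by source ID within each partition. The class name ToyPAL and its methods are made up for illustration, and nothing here touches disk, compression, attributes or insertions, so treat it as a toy model and not as the authors' implementation.

```python
from bisect import bisect_left


class ToyPAL:
    """Toy model: edges partitioned by destination interval, sorted by source.

    Hypothetical illustration only; not the GraphChi-DB API.
    """

    def __init__(self, edges, num_vertices, num_partitions):
        # Each partition owns a contiguous interval of destination vertex IDs.
        self.interval = -(-num_vertices // num_partitions)  # ceiling division
        self.partitions = [[] for _ in range(num_partitions)]
        for src, dst in edges:
            self.partitions[dst // self.interval].append((src, dst))
        # Within a partition, edges are sorted by source ID. Each edge is
        # stored exactly once.
        for part in self.partitions:
            part.sort()

    def out_edges(self, v):
        # Out-edges of v can sit in any partition, but in each partition they
        # form one contiguous run, found here by binary search on the source.
        result = []
        for part in self.partitions:
            lo = bisect_left(part, (v, -1))
            hi = bisect_left(part, (v + 1, -1))
            result.extend(part[lo:hi])
        return result

    def in_edges(self, v):
        # All in-edges of v live in the single partition whose destination
        # interval contains v. This toy version just scans that partition.
        part = self.partitions[v // self.interval]
        return [edge for edge in part if edge[1] == v]


if __name__ == "__main__":
    graph = ToyPAL(
        edges=[(0, 1), (0, 4), (2, 1), (3, 0), (5, 4), (1, 5)],
        num_vertices=6,
        num_partitions=2,
    )
    print(graph.out_edges(0))
    print(graph.in_edges(4))
```

The point of the layout is the asymmetry: an in-edge query touches one partition, while an out-edge query touches every partition but reads only a short sorted run from each. How the real system indexes those runs and avoids scanning a whole partition for in-edges is exactly the kind of detail worth reading the paper for.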
Highly recommended!