…[N]ew “GraphStore” core – Gephi

Gephi boosts its performance with new “GraphStore” core by Mathieu Bastian.

From the post:

Gephi is a graph visualization and analysis platform – the entire tool revolves around the graph the user is manipulating. All modules (e.g. filter, ranking, layout etc.) touch the graph in some way or another and everything happens in real-time, reflected in the visualization. It’s therefore extremely important to rely on a robust and fast underlying graph structure. As explained in this article we decided in 2013 to rewrite the graph structure and started the GraphStore project. Today, this project is mostly complete and it’s time to look at some of the benefits GraphStore is bringing into Gephi (which its 0.9 release is approaching).

Performance is critical when analyzing graphs. A lot can be done to optimize how graphs are represented and accessed in the code but it remains a hard problem. The first versions of Gephi didn’t always shine in that area as the graphs were using a lot of memory and some operations such as filter were slow on large networks. A lot was learnt though and when the time came to start from scratch we knew what would move the needle. Compared to the previous implementation, GraphStore uses simpler data structures (e.g. more arrays, less maps) and cache-friendly collections to make common graph operations faster. Along the way, we relied on many micro-benchmarks to understand what was expensive and what was not. As often with Java, this can lead to surprises but it’s a necessary process to build a world-class graph library.

What shall we say about the performance numbers?

IMPRESSIVE!

The test were against “two different classic graphs, one small (1.5K nodes, 19K edges) and one medium (83K nodes, 68K edges).”

Less than big data size graphs but isn’t the goal of big data analysis to extract the small portion of relevant data from the big data?

Yes?

Maybe there should be an axiom about gathering of irrelevant data into a big data pile, only to be excluded again.

Or premature graphification of largely irrelevant data.

Something to think about as you contribute to the further development of this high performing graph library.

Enjoy!

Comments are closed.