Benchmarking Graph Databases by Alekh Jindal.
Speaking of data skepticism.
From the post:
Graph data management has recently received a lot of attention, particularly with the explosion of social media and other complex, inter-dependent datasets. As a result, a number of graph data management systems have been proposed. But this brings us to the question: What happens to the good old relational database systems (RDBMSs) in the context of graph data management?
The article names some of the usual graph database suspects.
But for its comparison, it selects only one (Neo4j) and compares it against three relational databases: MySQL, Vertica, and VoltDB.
What’s missing? How about expanding to include GraphLab (GraphLab – Next Generation [Johnny Come Lately VCs]) and Giraph (Scaling Apache Giraph to a trillion edges) or some of the other heavy hitters (insert your favorite) in the graph world?
Nothing against Neo4j. It is making rapid progress on a query language and isn’t hard to learn. But it lacks the raw processing power of a system like Apache Giraph. Giraph, after all, is used to process the entire Facebook data set, not a “4k nodes and 88k edges” Facebook sample as in this comparison.
Not to mention that only two algorithms were used in this comparison: PageRank and Shortest Paths.
Personally, I can imagine users being interested in running more than two algorithms. But that’s just me.
Every benchmarking project has to start somewhere, but this sort of comparison doesn’t really advance the discussion of competing technologies.
Not that any comparison would be complete without a discussion of typical use cases and user observations on how each candidate did or did not meet their expectations.