Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 10, 2015

Titan Graph DB Performance Tips

Filed under: Graphs,TinkerPop,Titan — Patrick Durusau @ 2:33 pm

Titan Graph DB Performance Tips

From the post:

In Hawkular Inventory, we use the Tinkerpop API (version 2 for the time being) to store our inventory model in a graph database. We chose Titan as the storage engine configured to store the data in the Cassandra cluster that is also backing Hawkular Metrics and Alerts. This blog post will guide you through some performance-related lessons with Titan that we learned so far.

Inventory is under heavy development with a lot of redesign and refactoring going on between releases so we took quite a naive approach to storing and querying data from the graph database. That is, we store entities from our model as vertices in the graph and the relationships between the entities as edges in the graph. Quite simple and a school book example of how it should look like.

We did declare a couple of indices in the database on the read-only aspects of the vertices (i.e. a “type” of the entity the vertex corresponds to) but we actually didn’t pay too much attention to the performance. We wanted to have the model right first.

Fast forward a couple of months and of course, the performance started to be a real problem. The Hawkular agent for Wildfly is inserting a non-trivial amount of entities and not only inserting them but also querying them has seen a huge performance degradation compared to the simple examples we were unit testing with (due to number of vertices and edges stored).

The time has come to think about how to squeeze some performance out of Titan as well as how to store the data and query it more intelligently.

Several performance tips but the one that caught my eye and resulted in an order of magnitude performance gain:

3. Mirror Properties on The Edges

This is the single most important optimization we’ve done so far. The rationale is this. To jump from a vertex to another vertex over an edge is a fairly expensive operation. Titan uses the adjacency lists to store the vertices and their edges in wide rows in Cassandra. It uses another adjacency list for edges and their target vertices.

So to go from vertex to vertex, Titan actually has to do 2 queries. It would be much easier if we could avoid that at least in some cases.

The solution here is to copy the values of some (frequently used and, in our case, immutable) properties from the “source” and “target” vertices directly to the edges. This helps especially in the cases where you do some kind of filtering on the target vertices that you instead can do directly on the edges. If there is a high number of edges to go through, this helps tremendously because you greatly reduce the number of times you have to do the “second hop” to the target vertex.

I am curious what is being stored on the vertex that requires a second search to jump to the target vertex?

That is if you have moved “popular” vertex properties to the edge, why not move other properties of the node there?

Suggestions?

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress