Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 2, 2014

Powers of Ten – Part II

Filed under: Faunus,Graphs,Gremlin,Titan — Patrick Durusau @ 6:54 pm

Powers of Ten – Part II by Stephen Mallette.

From the post:

“‘Curiouser and curiouser!’ cried Alice (she was so much surprised, that for the moment she quite forgot how to speak good English); ‘now I’m opening out like the largest telescope that ever was!”
    — Lewis CarrollAlice’s Adventures in Wonderland

It is sometimes surprising to see just how much data is available. Much like Alice and her sudden increase in height, in Lewis Carroll’s famous story, the upward growth of data can happen quite quickly and the opportunity to produce a multi-billion edge graph becomes immediately present. Luckily, Titan is capable of scaling to accommodate such size and with the right strategies for loading this data, the development efforts can more rapidly shift to the rewards of massive scale graph analytics.

This article represents the second installment in the two part Powers of Ten series that discusses bulk loading data into Titan at varying scales. For purposes of this series, the “scale” is determined by the number of edges to be loaded. As it so happens, the strategies for bulk loading tend to change as the scale increases over powers of ten, which creates a memorable way to categorize different strategies. “Part I” of this series, looked at strategies for loading millions and tens of millions of edges and focused on usage of Gremlin to do so. This part of the series will focus on hundreds of millions and billions of edges and will focus on the usage of Faunus as the loading tool.

Note: By Titan 0.5.0, Faunus will be pulled into the Titan project under the name Titan/Hadoop.

Scaling to graph processing to hundreds of millions and billions of edges.

Deeply interesting work but I am left with multiple questions:

  • Hundreds of millions and billions of edges, to load. Any other graph metrics? Traversal for example?
  • Does loading performance scale with more servers? Instead of m2.4xlarge EC2 instances, what is the performance with 8x?
  • What kind of knob tuning was useful with a social network dataset?

I am sure there are other questions but those are the first ones that came to mind.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress