Powers of Ten – Part I by Stephen Mallette.
From the post:
“No, no! The adventures first,’ said the Gryphon in an impatient tone: ‘explanations take such a dreadful time.”
— Lewis Carroll – Alice’s Adventures in WonderlandIt is often quite simple to envision the benefits of using Titan. Developing complex graph analytics over a multi-billion edge distributed graph represent the adventures that await. Like the Gryphon from Lewis Carroll’s tale, the desire to immediately dive into the adventures can be quite strong. Unfortunately and quite obviously, the benefits of Titan cannot be realized until there is some data present within it. Consider the explanations that follow; they are the strategies by which data is bulk loaded to Titan enabling the adventures to ensue.
There are a number of different variables that might influence the approach to loading data into a graph, but the attribute that provides the best guidance in making a decision is size. For purposes of this article, “size” refers to the estimated number of edges to be loaded into the graph. The strategy used for loading data tends to change in powers of ten, where the strategy for loading 1 million edges is different than the approach for 10 million edges.
Given this neat and memorable way to categorize batch loading strategies, this two-part article outlines each strategy starting with the smallest at 1 million edges or less and continuing in powers of ten up to 1 billion and more. This first part will focus on 1 million and 10 million edges, which generally involves common Gremlin operations. The second part will focus on 100 million and 1 billion edges, which generally involves the use of Faunus.
Great guidance on loading relatively small data sets using Gremlin. Looking forward to seeing the harder tests with 100 million and 1 billion edge sets.