BitPath — Label Order Constrained Reachability Queries over Large Graphs by Medha Atre, Vineet Chaoji, and Mohammed J. Zaki.
Abstract:
In this paper we focus on the following constrained reachability problem over edge-labeled graphs like RDF — “given source node x, destination node y, and a sequence of edge labels (a, b, c, d), is there a path between the two nodes such that the edge labels on the path satisfy a regular expression “*a.*b.*c.*d.*“. A “*” before “a” allows any other edge label to appear on the path before edge “a”. “a.*” forces at least one edge with label “a”. “.*” after “a” allows zero or more edge labels after “a” and before “b”. Our query processing algorithm uses simple divide-and-conquer and greedy pruning procedures to limit the search space. However, our graph indexing technique — based on “compressed bit-vectors” — allows indexing large graphs which otherwise would have been infeasible. We have evaluated our approach on graphs with more than 22 million edges and 6 million nodes — much larger compared to the datasets used in the contemporary work on path queries.
If similarity is a type of relationship (as per Peter Neubauer, Neo4j), does it stand to reason that similarity may be expressed by a series of labeled edges? And if so, would this technique enable robust processing of the same? Suggestions?