Faunus: Graph Analytics Engine by Marko Rodriguez.
From the description:
Faunus is a graph analytics engine built atop the Hadoop distributed computing platform. The graph representation is a distributed adjacency list, whereby a vertex and its incident edges are co-located on the same machine. Querying a Faunus graph is possible with a MapReduce-variant of the Gremlin graph traversal language. A Gremlin expression compiles down to a series of MapReduce-steps that are sequence optimized and then executed by Hadoop. Results are stored as transformations to the input graph (graph derivations) or computational side-effects such as aggregates (graph statistics). Beyond querying, a collection of input/output formats are supported which enable Faunus to load/store graphs in the distributed graph database Titan, various graph formats stored in HDFS, and via arbitrary user-defined functions. This presentation will focus primarily on Faunus, but will also review the satellite technologies that enable it.
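To make the description concrete, here is a minimal single-process sketch of a vertex-centric adjacency list and one map/reduce pass over it, computing in-degree. It is not Faunus or Gremlin code: the toy graph and the names `map_step`, `reduce_step`, and `run` are assumptions made for illustration, and the grouping loop only stands in for what Hadoop's shuffle would do across machines.

```python
from collections import defaultdict

# Hypothetical toy graph. Each record holds a vertex together with its
# outgoing edges, mimicking the "vertex and its incident edges are
# co-located" layout from the description.
adjacency = {
    "marko": [("knows", "josh"), ("created", "lop")],
    "josh": [("created", "lop"), ("created", "ripple")],
    "lop": [],
    "ripple": [],
}

def map_step(vertex, edges):
    """Emit (target, 1) for every outgoing edge of one vertex record."""
    for _label, target in edges:
        yield target, 1

def reduce_step(key, values):
    """Sum the counts emitted for one target vertex."""
    return key, sum(values)

def run(graph):
    """Run one map/shuffle/reduce pass in a single process."""
    grouped = defaultdict(list)
    for vertex, edges in graph.items():
        for key, value in map_step(vertex, edges):
            grouped[key].append(value)  # stand-in for the shuffle phase
    return dict(reduce_step(k, v) for k, v in grouped.items())

in_degree = run(adjacency)
```

Because each map call sees a single vertex record, the work can be split by vertex without any record needing to look at another one, which is presumably why the adjacency-list layout suits MapReduce.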
I saw this slide deck after posting ConceptNet5 [Herein of Hypergraphs] and writing about the “id-less” nodes and edges of ConceptNet5.
So when I see nodes and edges with IDs, I have to wonder: why?
What requirement is met, or what advantage is gained, by using IDs rather than addressing a node by its content?*
Remember that we are no longer concerned with shaving bits off identifiers to save storage or processing time.
* I suspect that addressing by content presumes a level of granularity that may not be appropriate in all cases. Hard to say. But I do want to look at the issue more closely.
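A minimal sketch of the difference, assuming nothing about how Faunus, Titan, or ConceptNet5 actually store nodes: `IdGraph` and `ContentGraph` are hypothetical classes invented for this comparison. The first assigns an opaque ID to every node; the second uses the node's content as its address.

```python
import itertools

class IdGraph:
    """Hypothetical store where every node gets a fresh opaque ID."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.nodes = {}

    def add_node(self, **content):
        node_id = next(self._next_id)
        self.nodes[node_id] = content
        return node_id

class ContentGraph:
    """Hypothetical store where a node's content is its address."""

    def __init__(self):
        self.nodes = set()

    def add_node(self, **content):
        # Canonical form of the content: property pairs sorted by name.
        key = tuple(sorted(content.items()))
        self.nodes.add(key)
        return key

id_graph = IdGraph()
a = id_graph.add_node(name="Paris")
b = id_graph.add_node(name="Paris")  # a second, distinct node

content_graph = ContentGraph()
c = content_graph.add_node(name="Paris")
d = content_graph.add_node(name="Paris")  # same address as c
```

With IDs, the two "Paris" nodes stay distinct, so one can be the city in France and the other the city in Texas. Addressed by content, they collapse into a single node unless the content itself is made finer-grained, which is the granularity concern in the footnote.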