Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 12, 2012

Graph Degree Distributions using R over Hadoop

Filed under: Graphs,Hadoop,R — Patrick Durusau @ 8:04 pm

From the post:

The purpose of this post is to demonstrate how to express the computation of two fundamental graph statistics, each as a graph traversal and as a MapReduce algorithm. The graph engines explored for this purpose are Neo4j and Hadoop. However, with respect to Hadoop, instead of focusing on a particular vertex-centric BSP-based graph-processing package such as Hama or Giraph, the results presented are via native Hadoop (HDFS + MapReduce). Moreover, instead of developing the MapReduce algorithms in Java, the R programming language is used. RHadoop is a small, open-source package developed by Revolution Analytics that binds R to Hadoop and allows for the representation of MapReduce algorithms using native R.

The two graph algorithms presented compute degree statistics: vertex in-degree and graph in-degree distribution. Both are related, and in fact, the results of the first can be used as the input to the second. That is, graph in-degree distribution is a function of vertex in-degree. Together, these two fundamental statistics serve as a foundation for more quantifying statistics developed in the domains of graph theory and network science.
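To make the relationship between the two statistics concrete, here is a minimal sketch in plain Python rather than the R/RHadoop code the post uses. The `map_reduce` helper is a hypothetical in-memory stand-in for a Hadoop job, and the edge list is made-up illustration data. It is only meant to show how the output of the first job (vertex in-degree) can feed the second (in-degree distribution).

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for a MapReduce job:
    apply the mapper, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]


# Made-up edge list of (source, target) pairs, for illustration only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]

# Job 1: vertex in-degree.
# Map each edge to (target, 1); reduce by summing the ones per vertex.
in_degree = map_reduce(
    edges,
    mapper=lambda edge: [(edge[1], 1)],
    reducer=lambda vertex, ones: (vertex, sum(ones)),
)

# Job 2: graph in-degree distribution, taking Job 1's output as input.
# Map each (vertex, degree) to (degree, 1); reduce by summing per degree.
distribution = map_reduce(
    in_degree,
    mapper=lambda pair: [(pair[1], 1)],
    reducer=lambda degree, ones: (degree, sum(ones)),
)

print(sorted(in_degree))     # (vertex, in-degree) pairs
print(sorted(distribution))  # (in-degree, number of vertices) pairs
```

One caveat of this sketch: a vertex with no incoming edges (`d` above) never appears as a key in the first job, so in-degree 0 is missing from the distribution unless the vertex set is supplied separately.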

The post observes that a graph of around 10 billion elements (nodes + edges) can be handled on a single server. In the 100 billion element range, multiple servers are required.

Despite the emphasis on “big data,” 10 billion elements would be sufficient for many purposes.

Interesting use of R with Hadoop.
