Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 25, 2013

Faunus 0.4.1 Release

Filed under: Faunus,Graph Analytics,Graphs — Patrick Durusau @ 5:36 pm

I don’t find this change reflected in the 0.4.1 release notes, but elsewhere Marko Rodriguez writes:

I tested the new code on a subset of the Friendster data (6 node Hadoop and 6 node Cassandra cluster).

  • vertices: 7 minutes to write 39 million vertices at ~100mb/second from the Hadoop to the Cassandra cluster.

  • edges: 15 minutes to write 245 million edges at ~40mb/second from the Hadoop to the Cassandra cluster.

This is the fastest bulk load time I’ve seen to date. This means, DBPedia can be written in ~20 minutes! I’ve attached an annotated version of the Ganglia monitor to the email that shows the outgoing throughput for the various stages of the MapReduce job. In the past, I was lucky to get 5-10mb/second out of the edge writing stage (this had to do with how I was being dumb about how reduce worked in Hadoop — wasn’t considering the copy/shuffle aspect of the stage).

At this rate, this means we can do billion edges graphs in a little over 1 hour. I bet though I can now speed this up more with some parameter tweaking as I was noticing that Cassandra was RED HOT and locking up a few times on transaction commits. Anywho, Faunus 0.4.1 is going to be gangbusters!

Approximately one billion edges an hour?

It’s not > /dev/null speed but still quite respectable. 😉
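
As a back-of-the-envelope check on the "little over 1 hour" figure, here is a minimal sketch that extrapolates from the quoted edge-loading run (245 million edges in 15 minutes). It assumes throughput stays constant as the graph grows, which the quote does not establish, and the function and variable names are my own, not anything from Faunus.

```python
# Figures taken from the quoted test run: 245 million edges in 15 minutes.
EDGES_WRITTEN = 245_000_000
MINUTES_TAKEN = 15


def minutes_to_load(edge_count, edges_written=EDGES_WRITTEN,
                    minutes_taken=MINUTES_TAKEN):
    """Estimate load time by linear extrapolation from the quoted run.

    Assumption: write throughput is constant regardless of graph size.
    """
    edges_per_minute = edges_written / minutes_taken
    return edge_count / edges_per_minute


billion_edge_minutes = minutes_to_load(1_000_000_000)
print(f"{billion_edge_minutes:.1f} minutes")  # prints "61.2 minutes"
```

So the quoted rate works out to roughly 16 million edges a minute, and a billion edges in about 61 minutes, provided the cluster holds that pace.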

Faunus 0.4.1 wikidoc.

Download Faunus 0.4.1.
