Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 24, 2013

I Mapreduced a Neo store:…

Filed under: Graphs,Hadoop,Neo4j — Patrick Durusau @ 2:14 pm

I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop by Kris Geusebroek. (Berlin Buzzwords 2013)

From the description:

When exploring very large raw datasets containing massive interconnected networks, it is sometimes helpful to extract your data, or a subset thereof, into a graph database like Neo4j. This allows you to easily explore and visualize networked data to discover meaningful patterns.

When your graph has 100M+ nodes and 1000M+ edges, using the regular Neo4j import tools will make the import very time-intensive (as in many hours to days).

In this talk, I’ll show you how we used Hadoop to scale the creation of very large Neo4j databases by distributing the load across a cluster and how we solved problems like creating sequential row ids and position-dependent records using a distributed framework like Hadoop.
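The description does not say how the sequential ids were produced, so here is only a minimal sketch of one common approach, not the method from the talk: count the records in each partition, take a running total to get each partition's starting id, then number the records locally. The function names, the `record_size` parameter, and the assumption of fixed-size records whose byte position follows from the id are all mine, for illustration.

```python
def partition_offsets(partition_sizes):
    """Return the first id each partition may hand out (exclusive prefix sum)."""
    offsets = []
    total = 0
    for size in partition_sizes:
        offsets.append(total)
        total += size
    return offsets


def assign_sequential_ids(partitions):
    """Give every record a globally sequential id, partition by partition.

    Pass 1 counts the records in each partition. Pass 2 numbers records
    locally and adds the partition's offset, so each partition can be
    processed independently once the offsets are known.
    """
    sizes = [len(part) for part in partitions]
    offsets = partition_offsets(sizes)
    return [
        [(offset + i, record) for i, record in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


def record_position(record_id, record_size, header_size=0):
    """Byte position of a record, ASSUMING fixed-size records (hypothetical sizes)."""
    return header_size + record_id * record_size


if __name__ == "__main__":
    # Four partitions, as four mappers or reducers might see them.
    partitions = [["a", "b"], ["c"], [], ["d", "e"]]
    for part in assign_sequential_ids(partitions):
        print(part)
```

The point of the two passes is that only the per-partition counts have to be gathered in one place; the numbering itself stays distributed.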

If you find the slides hard to read (I did) you may want to try:

Combining Neo4J and Hadoop (part I) and,

Combining Neo4J and Hadoop (part II)

A recent update from Kris: I MapReduced a Neo4j store.

BTW, the code is on GitHub.

Just in case you have any modest-sized graphs that you want to play with in Neo4j. 😉

PS: I just found Kris’s slides: http://www.slideshare.net/godatadriven/i-mapreduced-a-neo-store-creating-large-neo4j-databases-with-hadoop
