Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 12, 2013

Sinking Data to Neo4j from Hadoop with Cascading

Filed under: Cascading,Hadoop,Neo4j — Patrick Durusau @ 10:18 am

Sinking Data to Neo4j from Hadoop with Cascading by Paul Ingles.

From the post:

Recently, I worked with a colleague (Paul Lam, aka @Quantisan on building a connector library to let Cascading interoperate with Neo4j: cascading.neo4j. Paul had been experimenting with Neo4j and Cypher to explore our data through graphs and we wanted an easy way to flow our existing data on Hadoop into Neo4j.

The data processing pipeline we’ve been growing at uSwitch.com is built around Cascalog, Hive, Hadoop and Kafka.

Once the data has been aggregated and stored a lot of our ETL is performed upon Cascalog and, by extension, Cascading. Querying/analysis is a mix of Cascalog and Hive. This layer is built upon our long-term data storage system: Hadoop; this, all combined, lets us store high-resolution data immutably at a much lower cost than uSwitch’s previous platform.

As Paul notes later in his post, this isn’t a fast solution, about 20,000 nodes a second.

But if that fits your requirements, could be a good place to start.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress