Making Master Data Management Fun with Neo4j – Part 1 by Brian Underwood.

From Part 1:

Joining multiple disparate data sources, commonly dubbed Master Data Management (MDM), is usually not a fun exercise. I would like to show you how to use a graph database (Neo4j) and an interesting dataset (developer-oriented collaboration sites) to make MDM an enjoyable experience. This approach will allow you to quickly and sensibly merge data from different sources into a consistent picture and query across the data efficiently to answer your most pressing questions.

To start I’ll just be importing one data source: StackOverflow questions tagged with neo4j and their answers. In future blog posts I will discuss how to integrate other data sources into a single graph database to provide a richer view of the world of Neo4j developers’ online social interactions.

I’ve created a GraphGist to explore questions about the imported data, but in this post I’d like to briefly discuss the process of getting data from StackOverflow into Neo4j.

Part 1 imports data from StackOverflow into Neo4j.
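
To make the import step a little more concrete, here is a minimal sketch of how one question record might be turned into parameterized Cypher statements. It is not necessarily the approach used in the original post. The input field names (question_id, title, tags, owner) follow the Stack Exchange API's question objects, while the node labels (Question, User, Tag), the relationship types (ASKED, TAGGED), and the function name question_to_statements are assumptions made up for illustration.

```python
# Hypothetical Cypher templates; labels and relationship types are illustrative only.
QUESTION_ONLY_CYPHER = (
    "MERGE (q:Question {id: $question_id}) "
    "SET q.title = $title"
)
QUESTION_WITH_OWNER_CYPHER = (
    "MERGE (q:Question {id: $question_id}) "
    "SET q.title = $title "
    "MERGE (u:User {id: $user_id}) "
    "SET u.display_name = $display_name "
    "MERGE (u)-[:ASKED]->(q)"
)
TAG_CYPHER = (
    "MATCH (q:Question {id: $question_id}) "
    "MERGE (t:Tag {name: $tag}) "
    "MERGE (q)-[:TAGGED]->(t)"
)


def question_to_statements(question):
    """Return a list of (cypher, params) pairs for one question dict.

    MERGE is used so that importing the same question twice does not
    create duplicate nodes or relationships.
    """
    owner = question.get("owner") or {}
    params = {
        "question_id": question["question_id"],
        "title": question.get("title", ""),
    }
    if owner.get("user_id") is not None:
        # Only link an asker when the record carries a user id.
        params["user_id"] = owner["user_id"]
        params["display_name"] = owner.get("display_name", "")
        statements = [(QUESTION_WITH_OWNER_CYPHER, params)]
    else:
        statements = [(QUESTION_ONLY_CYPHER, params)]

    # One small statement per tag keeps each query simple.
    for tag in question.get("tags", []):
        statements.append(
            (TAG_CYPHER, {"question_id": question["question_id"], "tag": tag})
        )
    return statements
```

Each returned pair could then be passed to a Neo4j client, for example session.run(cypher, params) with the official Python driver. Keeping statement generation separate from execution makes the mapping easy to inspect without a running database.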

Making Master Data Management Fun with Neo4j – Part 2 imports GitHub data:

Altogether, I was able to import:

  • 6,337 repositories
  • 6,232 users
  • 11,011 issues
  • 474 commits

In my next post I’ll show the process of how I linked the original StackOverflow data with the new GitHub data. Stay tuned for that, but in the meantime I’d also like to share the more technical details of what I did for those who are interested.

Definitely looking forward to seeing the reconciliation of data between StackOverflow and GitHub.

