Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 27, 2015

Making Master Data Management Fun with Neo4j – Part 1, 2

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:29 pm

Making Master Data Management Fun with Neo4j – Part 1 by Brian Underwood.

From Part 1:

Joining multiple disparate data-sources, commonly dubbed Master Data Management (MDM), is usually not a fun exercise. I would like to show you how to use a graph database (Neo4j) and an interesting dataset (developer-oriented collaboration sites) to make MDM an enjoyable experience. This approach will allow you to quickly and sensibly merge data from different sources into a consistent picture and query across the data efficiently to answer your most pressing questions.

To start I’ll just be importing one data source: StackOverflow questions tagged with neo4j and their answers. In future blog posts I will discuss how to integrate other data sources into a single graph database to provide a richer view of the world of Neo4j developers’ online social interactions.

I’ve created a GraphGist to explore questions about the imported data, but in this post I’d like to briefly discuss the process of getting data from StackOverflow into Neo4j.

Part 1 imports data from StackOverflow into Neo4j.
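To make that import step concrete, here is a minimal sketch in Python of how one question and its answers might be turned into parameterized Cypher statements. It is not the code from Brian's post. The node labels (User, Question, Answer), the relationship types (ASKED, ANSWERS, PROVIDED) and the helper names are my own assumptions for illustration. The input field names (question_id, answer_id, owner, user_id, display_name) are modelled on the Stack Exchange API's JSON, and the sample record is made up. The sketch also assumes every question and answer has an owner with a user_id.

```python
from typing import Any, Dict, List, Tuple

# A Cypher statement paired with the parameters it expects.
Statement = Tuple[str, Dict[str, Any]]

# Hypothetical model (not from the original series):
#   (:User)-[:ASKED]->(:Question)
# Uses the $param syntax of current Neo4j releases.
MERGE_QUESTION = (
    "MERGE (q:Question {id: $question_id}) "
    "SET q.title = $title "
    "MERGE (u:User {id: $user_id}) "
    "SET u.display_name = $display_name "
    "MERGE (u)-[:ASKED]->(q)"
)

# Hypothetical model (not from the original series):
#   (:User)-[:PROVIDED]->(:Answer)-[:ANSWERS]->(:Question)
MERGE_ANSWER = (
    "MERGE (q:Question {id: $question_id}) "
    "MERGE (a:Answer {id: $answer_id}) "
    "MERGE (u:User {id: $user_id}) "
    "SET u.display_name = $display_name "
    "MERGE (a)-[:ANSWERS]->(q) "
    "MERGE (u)-[:PROVIDED]->(a)"
)


def owner_params(item: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the user fields; assumes the item has a complete 'owner'."""
    owner = item["owner"]
    return {"user_id": owner["user_id"], "display_name": owner["display_name"]}


def to_statements(question: Dict[str, Any]) -> List[Statement]:
    """Build one statement for the question and one per answer."""
    q_params = {
        "question_id": question["question_id"],
        "title": question["title"],
    }
    q_params.update(owner_params(question))
    statements: List[Statement] = [(MERGE_QUESTION, q_params)]

    for answer in question.get("answers", []):
        a_params = {
            "question_id": question["question_id"],
            "answer_id": answer["answer_id"],
        }
        a_params.update(owner_params(answer))
        statements.append((MERGE_ANSWER, a_params))
    return statements


# Made-up sample record, shaped like the assumed input.
sample_question = {
    "question_id": 1,
    "title": "Example question title",
    "owner": {"user_id": 10, "display_name": "asker"},
    "answers": [
        {"answer_id": 2, "owner": {"user_id": 11, "display_name": "answerer"}},
    ],
}

for cypher, params in to_statements(sample_question):
    print(params)
```

The sketch uses MERGE rather than CREATE because MERGE matches an existing node or relationship before creating a new one, so re-running the import should not duplicate users or questions. That property matters for MDM, where the same person is likely to turn up again in a later data source.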

Making Master Data Management Fun with Neo4j – Part 2 imports Github data:

All together I was able to import:

  • 6,337 repositories
  • 6,232 users
  • 11,011 issues
  • 474 commits
  • 22,676 comments

In my next post I’ll show the process of how I linked the original StackOverflow data with the new GitHub data. Stay tuned for that, but in the meantime I’d also like to share the more technical details of what I did for those who are interested.

Definitely looking forward to seeing the reconciliation of data between StackOverflow and GitHub.
