Making Master Data Management Fun with Neo4j – Part 1 by Brian Underwood.
From Part 1:
Joining multiple disparate data sources, commonly dubbed Master Data Management (MDM), is usually not a fun exercise. I would like to show you how to use a graph database (Neo4j) and an interesting dataset (developer-oriented collaboration sites) to make MDM an enjoyable experience. This approach will allow you to quickly and sensibly merge data from different sources into a consistent picture and query across the data efficiently to answer your most pressing questions.
To start I’ll just be importing one data source: StackOverflow questions tagged with neo4j and their answers. In future blog posts I will discuss how to integrate other data sources into a single graph database to provide a richer view of the world of Neo4j developers’ online social interactions. I’ve created a GraphGist to explore questions about the imported data, but in this post I’d like to briefly discuss the process of getting data from StackOverflow into Neo4j.
…
Part 1 imports data from StackOverflow into Neo4j.
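The excerpt does not show how the StackOverflow data is actually loaded. As a minimal sketch only, the Python below flattens question items into parameter rows for one parameterized Cypher `UNWIND ... MERGE` statement. It assumes the StackExchange API's question objects, which have `question_id`, `title`, `tags`, and an `owner` with `user_id` and `display_name`. The node labels, relationship types and the `question_rows` name are hypothetical and are not the author's model. Answers are left out for brevity.

```python
import html
from typing import Any

# Hypothetical graph model (not taken from the post):
# (:User)-[:ASKED]->(:Question)-[:TAGGED]->(:Tag)
IMPORT_QUESTIONS = """
UNWIND $rows AS row
MERGE (q:Question {id: row.question_id})
  SET q.title = row.title
MERGE (u:User {id: row.owner_id})
  SET u.name = row.owner_name
MERGE (u)-[:ASKED]->(q)
WITH q, row
UNWIND row.tags AS tag_name
MERGE (t:Tag {name: tag_name})
MERGE (q)-[:TAGGED]->(t)
"""


def question_rows(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten StackExchange API question items into rows for IMPORT_QUESTIONS."""
    rows = []
    for item in items:
        owner = item.get("owner", {})
        # Owners of deleted accounts may lack a user_id; this sketch skips them.
        if "user_id" not in owner:
            continue
        rows.append({
            "question_id": item["question_id"],
            # API text may be HTML-escaped (e.g. &quot;), so unescape it.
            "title": html.unescape(item["title"]),
            "owner_id": owner["user_id"],
            "owner_name": html.unescape(owner.get("display_name", "")),
            "tags": item.get("tags", []),
        })
    return rows
```

You could then send each batch with the official `neo4j` Python driver, for example `session.run(IMPORT_QUESTIONS, rows=question_rows(items))`. The sketch uses `MERGE` rather than `CREATE`, so re-running an import does not duplicate nodes. That matters once further sources are merged into the same graph.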
Making Master Data Management Fun with Neo4j – Part 2 imports GitHub data:
All together I was able to import:
- 6,337 repositories
- 6,232 users
- 11,011 issues
- 474 commits
- 22,676 comments
In my next post I’ll show how I linked the original StackOverflow data with the new GitHub data. Stay tuned for that; in the meantime, I’d also like to share the more technical details of what I did for those who are interested.
Definitely looking forward to seeing the reconciliation of data between StackOverflow and GitHub.
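The Part 2 excerpt does not show its import code. Below is a comparable sketch for GitHub issues, offered as an illustration only. It assumes the GitHub REST API's issue objects, which carry `number`, `title`, `user.login`, `repository_url`, and a `pull_request` key on pull requests. The labels and the `issue_rows` and `repo_full_name` names are hypothetical. Repositories, commits and comments would need similar functions.

```python
from typing import Any

# Hypothetical graph model (not taken from the post):
# (:GitHubUser)-[:OPENED]->(:Issue)-[:IN_REPO]->(:Repository)
IMPORT_ISSUES = """
UNWIND $rows AS row
MERGE (r:Repository {full_name: row.repo})
MERGE (i:Issue {repo: row.repo, number: row.number})
  SET i.title = row.title, i.is_pull_request = row.is_pull_request
MERGE (i)-[:IN_REPO]->(r)
MERGE (u:GitHubUser {login: row.login})
MERGE (u)-[:OPENED]->(i)
"""

API_REPOS_PREFIX = "https://api.github.com/repos/"


def repo_full_name(repository_url: str) -> str:
    """Turn 'https://api.github.com/repos/owner/name' into 'owner/name'."""
    if not repository_url.startswith(API_REPOS_PREFIX):
        raise ValueError(f"unexpected repository_url: {repository_url!r}")
    return repository_url[len(API_REPOS_PREFIX):]


def issue_rows(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten GitHub REST API issue objects into rows for IMPORT_ISSUES."""
    return [
        {
            "repo": repo_full_name(item["repository_url"]),
            "number": item["number"],
            "title": item["title"],
            "login": item["user"]["login"],
            # The issues endpoints also return pull requests, marked by a
            # "pull_request" key; keep them but flag them.
            "is_pull_request": "pull_request" in item,
        }
        for item in items
    ]
```

Issue numbers are only unique within a repository, so the sketch keys `Issue` nodes on the pair `(repo, number)`. GitHub accounts get their own `GitHubUser` label. This leaves linking them to StackOverflow users as a separate, explicit step, which is the kind of reconciliation the next post promises.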