I Mapreduced a Neo store by Kris Geusebroek.
From the post:
Lately I’ve been busy speaking at conferences about our way of creating large Neo4j databases. Large means tens of millions of nodes, hundreds of millions of relationships, and billions of properties.
Our use case consists of exploring our data to find interesting patterns. The data we want to explore describes financial transactions between people, so the Neo4j graph model is a good fit for us. Because we don’t know upfront what we are looking for, we create a Neo4j database with some parts of the data and explore that. When there is nothing interesting to find, we enhance our data with new information, and possibly new connections, and create a new Neo4j database that includes the extra information.
This means it’s not about a one-time load of the current data that we then keep up to date by adding more nodes and edges. It’s really about building a new database from the ground up every time we think of some new way to look at the data.
Deeply interesting work, particularly for its investigation of the internal file structure of Neo4j.
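To make the file-structure angle concrete: Neo4j's stores of that era used fixed-size records, which is what makes generating them in parallel with MapReduce plausible in the first place. The sketch below is not Kris Geusebroek's code; it is a hypothetical illustration assuming the Neo4j 1.x node store layout (9-byte big-endian records in `neostore.nodestore.db`: a 1-byte in-use flag, a 4-byte first-relationship id, and a 4-byte first-property id, with 0xFFFFFFFF meaning "none"). A node's id is implicit in its record's offset, so independent shards can be produced and concatenated in id order.

```python
import struct

NO_ID = 0xFFFFFFFF  # sentinel for "no relationship" / "no property"

def node_record(in_use, first_rel_id=NO_ID, first_prop_id=NO_ID):
    """Pack one fixed-size 9-byte node record (assumed Neo4j 1.x layout).

    The node id is not stored in the record; it is implied by the
    record's position in the file (offset = id * 9).
    """
    return struct.pack(">BII", 1 if in_use else 0, first_rel_id, first_prop_id)

def write_node_store(path, nodes):
    """Write a node store shard.

    `nodes` is an iterable of (first_rel_id, first_prop_id) tuples, one
    per node id starting at 0. A MapReduce job would emit such shards in
    parallel and concatenate them in id order to form the full store.
    """
    with open(path, "wb") as f:
        for first_rel_id, first_prop_id in nodes:
            f.write(node_record(True, first_rel_id, first_prop_id))

# Example: a three-node store with no relationships or properties yet.
write_node_store("neostore.nodestore.db", [(NO_ID, NO_ID)] * 3)
```

Relationship and property stores follow the same pattern, just with larger fixed-size records, which is why the whole database can be (re)built offline rather than loaded through the transactional API.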
I’m curious about this line:
…building a new database from the ground up every time we think of some new way to look at the data.
To what extent are static database structures a legacy of a shortage of CPU cycles?
With limited CPU cycles, it was necessary to fix a static structure against which query languages could be developed and optimized (again because cycles were scarce), and persisting that structure avoided the overhead of rebuilding it for each user.
It may be that cellphones and tablets need the convenience of static data structures, or at least representations of them.
But what of server farms populated by TBs of 3D memory?
Isn’t it time to start thinking beyond the limitations imposed by decades of CPU cycle shortages?