On Demand Memory Specialization for Distributed Graph Databases by Xavier Martinez-Palau, David Dominguez-Sal, Reza Akbarinia, Patrick Valduriez, Josep Lluís Larriba-Pey.
Abstract:
In this paper, we propose the DN-tree that is a data structure to build lossy summaries of the frequent data access patterns of the queries in a distributed graph data management system. These compact representations allow us an efficient communication of the data structure in distributed systems. We exploit this data structure with a new Dynamic Data Partitioning strategy (DYDAP) that assigns the portions of the graph according to historical data access patterns, and guarantees a small network communication and a computational load balance in distributed graph queries. This method is able to adapt dynamically to new workloads and evolve when the query distribution changes. Our experiments show that DYDAP yields a throughput up to an order of magnitude higher than previous methods based on cache specialization, in a variety of scenarios, and the average response time of the system is divided by two.
Graph partitioning based on evolving and summarized query experience. Results in a min cut on far less than the whole graph.
A heavy read but promising approach.
Makes me curious about non-uniform merging priorities.
For example, there could be some topics that merge frequently, say those representing financial information. While others, say for historical figures, may merge infrequently.
Rather than treating all topics equally in terms of computational resources, some topics could be processed at higher priority than others. (I am assuming an appropriately defined SLA.)
Although the partitioning approach could be applicable to distributed topic maps as well.