From the webpage:
dipLODocus[RDF] is a new system for RDF data processing supporting both simple transactional queries and complex analytics efficiently. dipLODocus[RDF] is based on a novel hybrid storage model considering RDF data both from a graph perspective (by storing RDF subgraphs or RDF molecules) and from a “vertical” analytics perspective (by storing compact lists of literal values for a given attribute).
Overview
Our system is built on three main structures: RDF molecule clusters (which can be seen as hybrid structures borrowing both from property tables and RDF subgraphs), template lists (storing literals in compact lists as in a column-oriented database system) and an efficient hash-table indexing URIs and literals based on the clusters they belong to.
The figure below gives a simple example of a few molecule clusters (storing information about students) and of a template list (compactly storing lists of student IDs). Molecules can be seen as horizontal structures storing information about a given object instance in the database (like rows in relational systems). Template lists, on the other hand, store vertical lists of values corresponding to one type of object (like columns in a relational system).
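To make the row-versus-column distinction concrete, here is a minimal toy sketch of the three structures in Python. It is not the dipLODocus[RDF] implementation: the class names, method names, and the student data are all hypothetical, and the sketch assumes one molecule per subject and one template list per predicate, which is a simplification of the clustering the overview describes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Row-like structure: everything stored about one subject (hypothetical)."""
    subject: str
    triples: list = field(default_factory=list)  # (predicate, object) pairs


class HybridStore:
    """Toy hybrid store; an illustration only, not the authors' design."""

    def __init__(self):
        self.molecules = {}                      # subject -> Molecule
        self.template_lists = defaultdict(list)  # predicate -> literal values
        self.index = defaultdict(set)            # URI or literal -> subjects whose molecule contains it

    def add(self, subject, predicate, obj, is_literal=False):
        # Horizontal side: append the triple to the subject's molecule.
        mol = self.molecules.setdefault(subject, Molecule(subject))
        mol.triples.append((predicate, obj))
        # Hash index: remember which molecules mention each term.
        self.index[subject].add(subject)
        self.index[obj].add(subject)
        # Vertical side: literals also go into a compact per-predicate list.
        if is_literal:
            self.template_lists[predicate].append(obj)

    def describe(self, subject):
        """Transactional-style lookup: read a single molecule."""
        mol = self.molecules.get(subject)
        return list(mol.triples) if mol else []

    def values(self, predicate):
        """Analytic-style scan: read a single template list."""
        return list(self.template_lists.get(predicate, []))

    def molecules_containing(self, term):
        """Use the hash index to find molecules that mention a term."""
        return sorted(self.index.get(term, ()))


store = HybridStore()
store.add("student:1", "hasID", 1001, is_literal=True)
store.add("student:1", "takesCourse", "course:db")
store.add("student:2", "hasID", 1002, is_literal=True)
store.add("student:2", "takesCourse", "course:db")
```

In this sketch, `store.describe("student:1")` touches only one molecule, while `store.values("hasID")` scans one list without visiting any molecule, which is the trade-off the hybrid model is meant to exploit.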
Interesting performance numbers:
- 30x faster than RDF-3X on LUBM queries
- 350x faster than Virtuoso on analytic queries
The system combines several data structures rather than adopting a single approach.
Perhaps data structures will be explored and optimized for data, rather than the other way around?
dipLODocus[RDF] | Short and Long-Tail RDF Analytics for Massive Webs of Data by Marcin Wylot, Jigé Pont, Mariusz Wisniewski, and Philippe Cudré-Mauroux (paper – PDF).
I first saw this at SemanticWeb.com.