
FDB: A Query Engine for Factorised Relational Databases

Tuesday, March 20th, 2012

FDB: A Query Engine for Factorised Relational Databases by Nurzhan Bakibayev, Dan Olteanu, and Jakub Závodný.

Abstract:

Factorised databases are relational databases that use compact factorised representations at the physical layer to reduce data redundancy and boost query performance. This paper introduces FDB, an in-memory query engine for select-project-join queries on factorised databases. Key components of FDB are novel algorithms for query optimisation and evaluation that exploit the succinctness brought by data factorisation. Experiments show that for data sets with many-to-many relationships FDB can outperform relational engines by orders of magnitude.
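To make the succinctness claim concrete, here is a minimal sketch of my own (not FDB's algorithm or data structures). It assumes a hypothetical toy schema, orders-to-items and items-to-suppliers, and stores the join result as a union over items of (orders × {item} × suppliers) rather than as flat tuples. The names `flat_join`, `factorised_join`, `expand`, and the size functions are all made up for illustration.

```python
from itertools import product

# Hypothetical toy data: two many-to-many relationships sharing the "item" column.
orders_items = [("o1", "milk"), ("o2", "milk"), ("o3", "milk"), ("o1", "cheese")]
items_suppliers = [("milk", "s1"), ("milk", "s2"), ("cheese", "s1")]


def flat_join(r, s):
    """Ordinary join: one (order, item, supplier) tuple per matching pair."""
    return sorted((o, i, sup) for (o, i) in r for (j, sup) in s if i == j)


def factorised_join(r, s):
    """Same result, kept as a union over items of orders x {item} x suppliers."""
    groups = {}
    for o, i in r:
        groups.setdefault(i, (set(), set()))[0].add(o)
    for i, sup in s:
        if i in groups:
            groups[i][1].add(sup)
    # Drop items that have orders but no supplier: they contribute no tuples.
    return {i: (os, ss) for i, (os, ss) in groups.items() if ss}


def expand(f):
    """Flatten the factorised form back into tuples (only to check equivalence)."""
    return sorted((o, i, s) for i, (os, ss) in f.items() for o, s in product(os, ss))


def flat_size(tuples):
    """Number of stored values in the flat result."""
    return sum(len(t) for t in tuples)


def factorised_size(f):
    """Number of stored singletons: one per item, order, and supplier in each group."""
    return sum(1 + len(os) + len(ss) for os, ss in f.values())


flat = flat_join(orders_items, items_suppliers)
fact = factorised_join(orders_items, items_suppliers)
print(flat_size(flat), factorised_size(fact))  # prints "21 9"
```

On this toy input the flat result holds 7 tuples (21 values) while the factorised form holds 9 singletons. The gap grows with the fan-out on each side of the shared column, which is the many-to-many case the abstract singles out.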

It is twelve pages of dense slogging, but I wonder if you have a reaction to:

Finally, factorised representations are relational algebra expressions with well-understood semantics. Their relational nature sets them apart from XML documents, object-oriented databases, and nested objects [2], where the goal is to avoid the rigidity of the relational model. (on the second page)

Where [2] is: S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

Online version of Foundations of Databases

DBLP has a nice listing of the references (with links) in Foundations of Databases.

Abiteboul and company are cited without a page reference (my printed edition is 685 pages long). The only comparison I can uncover between the relational model and any of the alternatives mentioned here is that an object-oriented database has oids, which, unlike keys, aren’t members of a “printable class.”

I am not sure what sort of oid isn’t a member of a “printable” class but am willing to leave that to one side for the moment.

My problem is with the characterization “…to avoid the rigidity of the relational model.”

The relational model has been implemented in any number of rigid ways, but is that the fault of a model based on operations on tuples, which can be singletons?

What if factorisation were applied to a graph database composed of singletons, enabling the use of “…relational algebra expressions with well-understood semantics”?

It sounds like factorisation could speed up classes of “expected” queries across graph databases. I don’t think anyone creates a database, graph or otherwise, without some classes of queries in mind. And a user who issues an unexpected query would be no worse off than without factorisation.
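As a rough illustration of what an “expected” query might look like, here is a small sketch under my own assumptions (nothing here comes from the FDB paper). It treats a graph as an edge relation and takes “two-hop paths” as the anticipated query. The result is kept factorised by the middle node, as a union of (predecessors × {node} × successors), so the paths can be counted without being enumerated. The edge list and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical directed graph, stored as an edge relation (source, target).
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]


def factorise_two_hop(edges):
    """Group two-hop paths by middle node: preds(m) x {m} x succs(m)."""
    preds, succs = defaultdict(set), defaultdict(set)
    for s, t in edges:
        succs[s].add(t)
        preds[t].add(s)
    # Only nodes with both an incoming and an outgoing edge can be a midpoint.
    return {m: (preds[m], succs[m]) for m in preds if m in succs}


def count_paths(f):
    """Count two-hop paths from the factorised form, without listing them."""
    return sum(len(ps) * len(ss) for ps, ss in f.values())


def enumerate_paths(f):
    """Flatten the factorised form into explicit (source, middle, target) paths."""
    return sorted((s, m, t) for m, (ps, ss) in f.items() for s, t in product(ps, ss))


def naive_paths(edges):
    """Plain self-join of the edge relation, for comparison."""
    return sorted((s, m, t) for (s, m) in edges for (m2, t) in edges if m == m2)


f = factorise_two_hop(edges)
print(count_paths(f))  # prints "5"
```

An unanticipated query can still fall back on `enumerate_paths` (or on the raw edge list), which is the sense in which the user would be no worse off.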