Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 20, 2012

Worst-case Optimal Join Algorithms

Filed under: Algorithms,Database,Joins — Patrick Durusau @ 3:52 pm

Worst-case Optimal Join Algorithms by Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra.

Abstract:

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer’s entropy inequality, which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running times achieve these optimal bounds. An answer to this question may be interesting to database practice, as it is known that any algorithm based on the traditional select-project-join style plans typically employed in an RDBMS is asymptotically slower than the optimal for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer’s inequality. This bound implies two famous inequalities in geometry: the Loomis-Whitney inequality and the Bollobás-Thomason inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.
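For intuition on the fractional cover bound the abstract mentions (my illustration, not from the paper or the post): take the standard triangle query Q(a,b,c) = R(a,b) ⋈ S(b,c) ⋈ T(a,c). Giving each relation fractional weight 1/2 covers every variable, since each variable appears in exactly two relations, so the output size is at most |R|^(1/2) · |S|^(1/2) · |T|^(1/2). A minimal sketch, with a hypothetical helper name:

```python
# Illustration: the AGM fractional cover bound for the triangle query
# Q(a,b,c) = R(a,b) JOIN S(b,c) JOIN T(a,c).
# Each variable occurs in two relations, and 1/2 + 1/2 = 1, so the weights
# (1/2, 1/2, 1/2) form a fractional edge cover and
# |Q| <= |R|^(1/2) * |S|^(1/2) * |T|^(1/2).

def agm_triangle_bound(r_size, s_size, t_size):
    """Worst-case output size bound for the triangle join."""
    return (r_size * s_size * t_size) ** 0.5

print(agm_triangle_bound(100, 100, 100))  # -> 1000.0
```

With three relations of size N, the bound is N^(3/2), while any pairwise join plan can be forced to materialize an intermediate result of size N^2 — which is why select-project-join plans are asymptotically suboptimal here.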

With reference to the optimal join problem the authors say:

Implicitly, this problem has been studied for over three decades: a modern RDBMS uses decades of highly tuned algorithms to efficiently produce query results. Nevertheless, as we described above, such systems are asymptotically suboptimal – even in the above simple example of (1). Our main result is an algorithm that achieves asymptotically optimal worst-case running times for all conjunctive join queries.

The authors’ strategy involves evaluating the keys in a join and dividing those keys into separate sets. The information the authors use has always been present, just not used in join processing. (pp. 2-3 of the article)
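To make the key-set idea concrete, here is a rough sketch in the generic-join style that this line of work led to (my simplification, not the paper's actual algorithm, which is considerably more involved): variables are eliminated one at a time, and at each step the candidate values come from intersecting the key sets of the relations that mention that variable, rather than from a pairwise join.

```python
from collections import defaultdict

def triangle_join(R, S, T):
    """Sketch of a generic-join-style evaluation of
    Q(a,b,c) = R(a,b) JOIN S(b,c) JOIN T(a,c).
    At each variable we intersect the relevant key sets, which keeps the
    work bounded by the sizes of the smaller sets at every step."""
    # Index each relation by the variable we branch on first.
    R_by_a, T_by_a, S_by_b = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R:
        R_by_a[a].add(b)
    for a, c in T:
        T_by_a[a].add(c)
    for b, c in S:
        S_by_b[b].add(c)

    out = []
    for a in set(R_by_a) & set(T_by_a):        # candidates for a: keys of R and T
        for b in R_by_a[a] & set(S_by_b):      # candidates for b given a
            for c in S_by_b[b] & T_by_a[a]:    # candidates for c given a, b
                out.append((a, b, c))
    return out

print(triangle_join({(1, 2)}, {(2, 3)}, {(1, 3)}))  # -> [(1, 2, 3)]
```

The crucial design point is that no pairwise intermediate result is ever materialized; every loop is driven by a set intersection, so a skewed relation cannot blow up the running time past the worst-case output bound.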

There are myriad details to master in the article, but I suspect this line of thinking may prove profitable in many situations where “join” operations are relevant.
