Fast Set Intersection in Memory by Bolin Ding and Arnd Christian König.
Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. In general, given k (preprocessed) sets, with totally n elements, we will show how to compute their intersection in expected time O(n / sqrt(w) + kr), where r is the intersection size and w is the number of bits in a machine-word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform the state-of-the-art techniques for both synthetic and real data sets and workloads.
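To make the word-level trick concrete, here is a minimal sketch, assuming non-negative integer elements and a 64-bit word. It is a simplified illustration of the general idea (hash each set into aligned groups, summarize each group as one machine word, AND the words to skip group combinations that cannot share an element), not the authors' algorithm: their grouping scheme, group sizing and analysis are not reproduced. The hash functions `_mix`, `group_of`, `bit_of`, and the names `preprocess`, `intersect` and `num_groups` are all hypothetical.

```python
W = 64  # assumed machine-word size in bits
MASK64 = (1 << 64) - 1


def _mix(x, salt):
    # Hypothetical 64-bit multiplicative hash; adequate for illustration only.
    return ((x ^ salt) * 0x9E3779B97F4A7C15) & MASK64


def group_of(x, num_groups):
    # Hypothetical grouping hash: the same element lands in the same group id in every set.
    return _mix(x, 0x1234) % num_groups


def bit_of(x):
    # Hypothetical word hash: top 6 bits of the mix give a bit position in 0..63.
    return _mix(x, 0xABCD) >> 58


def preprocess(elements, num_groups):
    """Split a set into hash groups, each summarized by a W-bit signature word."""
    groups = [[] for _ in range(num_groups)]
    for x in elements:
        groups[group_of(x, num_groups)].append(x)
    words = []
    for group in groups:
        word = 0
        for x in group:
            word |= 1 << bit_of(x)  # set the bit this element hashes to
        words.append(word)
    return groups, words


def intersect(*preprocessed):
    """Intersect k preprocessed sets that were built with the same num_groups."""
    result = []
    # Walk the aligned group ids; each item of `parts` is one set's (group, word).
    for parts in zip(*(zip(groups, words) for groups, words in preprocessed)):
        combined = -1  # all bits set (Python ints are unbounded)
        for _, word in parts:
            combined &= word
        if combined == 0:
            # An element common to all sets would set the same bit in every
            # word, so a zero AND proves this group id contributes nothing.
            continue
        candidates = set(parts[0][0])
        for group, _ in parts[1:]:
            candidates.intersection_update(group)  # exact check on survivors
        result.extend(candidates)
    return sorted(result)


a = preprocess(range(0, 1000, 3), num_groups=32)
b = preprocess(range(0, 1000, 5), num_groups=32)
print(intersect(a, b)[:5])  # [0, 15, 30, 45, 60]
```

The AND only ever filters: a nonzero result can still be a false positive (different elements hashing to the same bit), which is why surviving groups get an exact intersection. Requiring every set to use the same `num_groups` is a simplification made here so that group ids line up.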
Important not only for the algorithm but also for how they arrived at it.
They peeked at the data.
Not trying to solve the set intersection problem in the abstract but looking at data you are likely to encounter.
I am all for the pure theory side of things but there is something to be said for less airy (dare I say windy?) solutions.