Indexing Reverse Top-k Queries by Sean Chester, Alex Thomo, S. Venkatesh, and Sue Whitesides.
Abstract:
We consider the recently introduced monochromatic reverse top-k queries which ask for, given a new tuple q and a dataset D, all possible top-k queries on D union {q} for which q is in the result. Towards this problem, we focus on designing indexes in two dimensions for repeated (or batch) querying, a novel but practical consideration. We present the insight that by representing the dataset as an arrangement of lines, a critical k-polygon can be identified and used exclusively to respond to reverse top-k queries. We construct an index based on this observation which has guaranteed worst-case query cost that is logarithmic in the size of the k-polygon.
We implement our work and compare it to related approaches, demonstrating that our index is fast in practice. Furthermore, we demonstrate through our experiments that a k-polygon is comprised of a small proportion of the original data, so our index structure consumes little disk space.
This was the article that made me start looking for resources on “reverse data management.”
Interesting in its own right but I commend it to your attention in part because of the recognition that interesting challenges lie ahead with higher-dimensional indexing.
If you think about it, “indexing” in the sense of looking for a simple string token, is indexing in one dimension.
If you index a simple string token, but with the further requirement that it appear in JAMA (Journal of the American Medical Association), that is indexing in two dimensions.
If you index a simple string token, appearing in JAMA, but it must also be in an article with the “tag” cancer, now you are indexing in three dimensions.
A human indexer, creating an annual index of cancer publications would move along those dimensions with ease.
Topic maps are an attempt to capture the insight that allows automatic replication of the human indexer’s insight.