Query Rewriting in Search Engines by Hugh Williams was mentioned in Amazon CloudSearch, Elastic Search as a Service by Jeff Dalton. (You need to read Jeff’s comments on Amazon Cloudsearch but onto query rewriting.)
From the post:
There’s countless information on search ranking – creating ranking functions, and their factors such as PageRank and text. Query rewriting is less conspicuous but equally important. Our experience at eBay is that query rewriting has the potential to deliver as much improvement to search as core ranking, and that’s what I’ve seen and heard at other companies.
What is query rewriting?
Let’s start with an example. Suppose a user queries for Gucci handbags at eBay. If we take this literally, the results will be those that have the words Gucci and handbags somewhere in the matching documents. Unfortunately, many great answers aren’t returned. Why?
Consider a document that contains Gucci and handbag, but never uses the plural handbags. It won’t match the query, and won’t be returned. Same story if the document contains Gucci and purse (rather than handbag). And again for a document that contains Gucci but doesn’t contain handbags or a synonym – instead it’s tagged in the “handbags” category on eBay; the user implicitly assumed it’d be returned when a buyer types Gucci handbags as their query.
To solve this problem, we need to do one of two things: add words to the documents so that they match other queries, or add words to the queries so that they match other documents. Query rewriting is the latter approach, and that’s the topic of this post. What I will say about expanding documents is there are tradeoffs: it’s always smart to compute something once in search and store it, rather than compute it for every query, and so there’s a certain attraction to modifying documents once. On the other hand, there are vastly more words in documents than there are words in queries, and doing too much to documents gets expensive and leads to imprecise matching (or returning too many irrelevant documents). I’ve also observed over the years that what works for queries doesn’t always work for documents.
You really need to read the post by Hugh a couple of times.
Query rewriting is approaching the problem of subject identity from the other side of topic maps.
Topic maps collect different identifiers for a subject as a basis for “merging”.
Query rewriting changes a query so it specifies different identifiers for a subject.
Let me try to draw a graphic for you (my graphic skills are crude at best):
I used “/” as an alternative marker for topic maps to illustrate that matching any identifier returns all of them. For query rewrite, the “+” sign indicates that each identifier is searched for in addition to the others.
The result is the same set of identifiers and results from using them on a query set.
From a different point of view.