Bad vs Good Search Experience by Emir Dizdarevic.
From the post:
The Problem
This article will show how a bad search solution can be improved. We will demonstrate how to build an enterprise search solution relatively easily using Apache Lucene/Solr.
We took a local ad site as an example of a bad search experience.
We crawled the ad site with Apache Nutch, using a couple of home-grown plugins to fetch only the data we wanted rather than the whole site. Stay tuned for a separate article on this topic.
‘BAD’ search is based on real search results from the ad site, i.e. how the website’s search currently works. ‘GOOD’ search is based on the same data, but indexed with Apache Lucene/Solr (an inverted index).
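The core difference between a ‘%like%’ scan and an inverted index can be sketched in a few lines of Python. This is a toy illustration, not how Lucene actually stores or scores its index: the sample ads, the `analyze` function (lowercasing plus splitting on non-word characters) and the helper names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample ads, keyed by document id.
docs = {
    1: "Red mountain bike for sale",
    2: "Selling a bike, red, barely used",
    3: "Mountain cabin for rent",
}

def analyze(text):
    # Hypothetical analyzer: lowercase, then split on runs of non-word
    # characters, so "bike," and "Bike" both become "bike".
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    # Inverted index: term -> set of ids of the documents containing that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def and_query(index, query):
    # Documents containing every query term, in any order or position.
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def like_scan(docs, query):
    # Rough stand-in for SQL '%like%': case-insensitive substring scan of every document.
    needle = query.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}

index = build_index(docs)
```

With these ads, `and_query(index, "red bike")` finds both bike ads, while `like_scan(docs, "red bike")` finds neither, because no ad contains the exact string "red bike". The index also avoids scanning every document: each query term is a single dictionary lookup.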
BAD Search: We assume that it’s based on exact-match criteria or something similar to a ‘%like%’ database statement. To simulate this behavior we used a content field that is tokenized by whitespace and lowercased, and we used phrase queries every time. This is the closest we could get to the existing ad site’s search solution, yet even this deliberately bad setup performed better than the site itself.
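The ‘BAD’ simulation can be approximated with a positional index: a whitespace tokenizer, a lowercase filter, and a phrase query that only matches when the terms appear at consecutive positions. The following is a minimal Python sketch under those assumptions; the sample ads and helper names are hypothetical, and Lucene’s actual tokenizer, filter and phrase-query implementations are considerably more involved.

```python
from collections import defaultdict

# Hypothetical sample ads, keyed by document id.
docs = {
    1: "Red mountain bike for sale",
    2: "Selling a bike, red, barely used",
    3: "Mountain cabin for rent",
}

def analyze(text):
    # Whitespace tokenizer + lowercase filter: punctuation stays attached,
    # so "bike," is indexed as a different term from "bike".
    return text.lower().split()

def build_positional_index(docs):
    # Positional inverted index: term -> {doc_id: [positions of the term]}.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_query(index, phrase):
    # Match only documents where the terms occur in order at adjacent positions.
    terms = analyze(phrase)
    if not terms:
        return set()
    # Candidate documents must contain every term somewhere.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            # Term i of the phrase must sit at position start + i.
            if all(start + i in index[term][doc_id] for i, term in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

index = build_positional_index(docs)
```

Here `phrase_query(index, "red bike")` returns nothing even though ad 2 is about a red bike: in ad 1 the terms are not adjacent, and in ad 2 whitespace tokenization leaves the commas attached, so the indexed terms are `bike,` and `red,` rather than `bike` and `red`.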
An excellent post, in part because of the detailed example, but also because it shows that improving search results is an iterative process.
Enjoy!