Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 18, 2012

Different ways to make auto suggestions with Solr

Filed under: AutoComplete,AutoSuggestion,Query Expansion,Solr — Patrick Durusau @ 5:24 pm

Different ways to make auto suggestions with Solr

From the post:

Nowadays almost every website has a full text search box as well as the auto suggestion feature in order to help users to find what they are looking for, by typing the least possible number of characters possible. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typo errors. That’s a meaningful example which contains multi-term suggestions depending on the most popular queries, combined with spelling correction.

(graphic omitted)

There are different ways to make auto complete suggestions with Solr. You can find many articles and examples on the internet, but making the right choice is not always easy. The goal of this post is compare the available options in order to identify the best solution tailored to your needs, rather than describe any one specific approach in depth.

It’s common practice to make auto-suggestions based on the indexed data. In fact a user is usually looking for something that can be found within the index, that’s why we’d like to show the words that are similar to the current query and at the same time relevant within the index. On the other hand, it is recommended to provide query suggestions; we can for example capture and index on a specific solr core all the user queries which return more than zero results, so we can use those information to make auto-suggestions as well. What actually matters is that we are going to make suggestions based on what’s inside the index; for this purpose it’s not relevant if the index contains user queries or “normal data”, the solutions we are going to consider can be applied in both cases.

The Suggester module is the method that looks the most promising:

This solution has its own separate index which you can automatically build on every commit. Using collation you can have multi-term suggestions. Furthermore, it is possible to use a custom dictionary instead of the index content, which makes the current solution even more flexible.

I like to think of multi-term suggestions as tuneable query expansions that return materials on a subject more precisely than the original query.

The custom dictionary has even more potential:

When a file-based dictionary is used (non-empty sourceLocation parameter above) then it’s expected to be a plain text file in UTF-8 encoding. Blank lines and lines that start with a ‘#’ are ignored. The remaining lines must consist of either a string without literal TAB (\u0007) character, or a string and a TAB separated floating-point weight. (http://wiki.apache.org/solr/Suggester)

The custom dictionary can contain single terms or phrases.

Hmmm, a custom dictionary:

  1. Is easy to author
  2. Contains words and phrases
  3. Is an editorial artifact
  4. Not limited to a single Solr installation
  5. Could be domain specific
  6. Assists in returning more, not less precise results

The handling of the more precise results is up to your imagination.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress