Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 29, 2012

Lucene’s new analyzing suggester [Can You Say Synonym?]

Filed under: Lucene,Synonymy — Patrick Durusau @ 3:49 pm

Lucene’s new analyzing suggester by Mike McCandless.

From the post:

Live suggestions as you type into a search box, sometimes called suggest or autocomplete, is now a standard, essential search feature ever since Google set a high bar after going live just over four years ago.

In Lucene we have several different suggest implementations, under the suggest module; today I’m describing the new AnalyzingSuggester (to be committed soon; it should be available in 4.1).

To use it, you provide the set of suggest targets, which is the full set of strings and weights that may be suggested. The targets can come from anywhere; typically you’d process your query logs to create the targets, giving a higher weight to those queries that appear more frequently. If you sell movies you might use all movie titles with a weight according to sales popularity.

You also provide an analyzer, which is used to process each target into analyzed form. Under the hood, the analyzed form is indexed into an FST. At lookup time, the incoming query is processed by the same analyzer and the FST is searched for all completions sharing the analyzed form as a prefix.

Even though the matching is performed on the analyzed form, what’s suggested is the original target (i.e., the unanalyzed input). Because Lucene has such a rich set of analyzer components, this can be used to create some useful suggesters:
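(An aside, not part of Mike's post.) The mechanism described above, analyze every target, index the analyzed form, then prefix-match the analyzed query and return the original target, can be sketched in a few lines of Python. This is a toy illustration only: it is not Lucene's API, it uses a sorted list with binary search in place of an FST, and the `analyze` function, the stopword list, the `ToyAnalyzingSuggester` class, and the movie titles and weights are all hypothetical.

```python
import bisect

# Hypothetical stopword list for the toy analyzer.
STOPWORDS = {"a", "an", "the", "of"}


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return " ".join(t for t in text.lower().split() if t not in STOPWORDS)


class ToyAnalyzingSuggester:
    """Sorted-list stand-in for the FST: keys are analyzed forms."""

    def __init__(self, targets):
        # targets: iterable of (original string, weight) pairs.
        self._entries = sorted((analyze(s), s, w) for s, w in targets)
        self._keys = [entry[0] for entry in self._entries]

    def lookup(self, query, n=5):
        # The query goes through the same analyzer as the targets.
        prefix = analyze(query)
        start = bisect.bisect_left(self._keys, prefix)
        hits = []
        # Keys sharing a prefix are contiguous in sorted order.
        for key, surface, weight in self._entries[start:]:
            if not key.startswith(prefix):
                break
            hits.append((surface, weight))
        # Highest weight first; return the original, unanalyzed strings.
        hits.sort(key=lambda hit: -hit[1])
        return [surface for surface, _ in hits[:n]]


# Made-up targets and weights, for illustration only.
suggester = ToyAnalyzingSuggester([
    ("The Ghost of Christmas Past", 50),
    ("Ghostbusters", 90),
    ("A Ghost Story", 20),
])

print(suggester.lookup("the gho"))
print(suggester.lookup("ghost of c"))
```

Because the stopwords are dropped from both the targets and the query, typing "ghost of c" still reaches "The Ghost of Christmas Past", and what comes back is the original title rather than its analyzed form.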

One use case Mike mentions is using the AnalyzingSuggester to suggest synonyms of the terms a user enters.

That presumes you know both the content being searched and the synonyms likely to occur in it.

Use standard synonym sets and you will get standard synonym results.

Develop custom synonym sets and you can deliver results that save your users time and resources.
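To make the point concrete, here is a minimal Python sketch of how a synonym set changes what gets suggested. It is not Lucene code: the `analyze` and `suggest` functions, the `SYNONYMS` map, and the targets and weights are all hypothetical, and the toy maps only complete single tokens to a canonical term, so a half-typed word such as "fil" is not expanded.

```python
# Hypothetical custom synonym set: each variant maps to one canonical term.
SYNONYMS = {"film": "movie", "flick": "movie"}


def analyze(text, synonyms):
    """Toy analyzer: lowercase, split on whitespace, canonicalize synonyms."""
    return " ".join(synonyms.get(t, t) for t in text.lower().split())


def suggest(query, targets, synonyms, n=5):
    """Return the original targets whose analyzed form starts with the analyzed query."""
    prefix = analyze(query, synonyms)
    hits = [(weight, surface) for surface, weight in targets
            if analyze(surface, synonyms).startswith(prefix)]
    hits.sort(key=lambda hit: -hit[0])  # highest weight first
    return [surface for _, surface in hits[:n]]


# Made-up targets and weights, for illustration only.
targets = [
    ("Movie rentals", 80),
    ("Film reviews", 60),
    ("Movie soundtracks", 30),
]

print(suggest("film re", targets, SYNONYMS))  # with the synonym set
print(suggest("film re", targets, {}))        # without any synonyms
```

With the synonym set, "film re" reaches both "Movie rentals" and "Film reviews"; without it, only the literal match comes back. Which terms belong in the map depends entirely on knowing the collection and its users.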
