Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 18, 2014

ElasticSearch Analyzers – Parts 1 and 2

Filed under: ElasticSearch,Search Engines,Searching — Patrick Durusau @ 4:16 pm

Andrew Cholakian has written a two part introduction to analyzers in ElasticSearch.

All About Analyzers, Part One

From the introduction:

Choosing the right analyzer for an Elasticsearch query can be as much art as science. Analyzers are the special algorithms that determine how a string field in a document is transformed into terms in an inverted index. If you need a refresher on the basics of inverted indexes and where analysis fits into Elasticsearch in general please see this chapter in Exploring Elasticsearch covering analyzers. In this article we’ll survey various analyzers, each of which showcases a very different approach to parsing text.

Ten tokenizers, thirty-one token filters, and three character filters ship with the Elasticsearch distribution; a truly overwhelming number of options. This number can be increased further still through plugins, making the choices even harder to wrap one’s head around. Combinations of these tokenizers, token filters, and character filters create what’s called an analyzer. There are eight standard analyzers defined, but really, they are simply convenient shortcuts for arranging tokenizers, token filters, and character filters yourself. While reaching an understanding of this multitude of options may sound difficult, becoming reasonably competent in the use of analyzers is merely a matter of time and practice. Once the basic mechanisms behind analysis are understood, these tools are relatively easy to reason about and compose.

All About Analyzers, Part Two (continues part 1).

Very much worth your time if you need a refresher on analyzers for ElasticSearch and/or are approaching them for the first time.
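The composition the introduction describes can be sketched in a few lines of Python: character filters run first, then one tokenizer, then token filters, and the resulting terms feed an inverted index. This is a toy model, not Elasticsearch. The functions below are hypothetical stand-ins loosely modeled on the `html_strip` character filter, the `standard` tokenizer, and the `lowercase` and `stop` token filters, and their behavior is deliberately simplified.

```python
import re
from collections import defaultdict

# Hypothetical, simplified stand-ins for Elasticsearch analysis components.

def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text: str) -> list[str]:
    """Tokenizer: split on runs of non-word characters, dropping empty strings."""
    return [t for t in re.split(r"\W+", text) if t]

def lowercase(tokens: list[str]) -> list[str]:
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

STOPWORDS = {"a", "an", "and", "the", "of"}

def stop(tokens: list[str]) -> list[str]:
    """Token filter: drop tokens found in a small stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def make_analyzer(char_filters, tokenizer, token_filters):
    """An 'analyzer' is just char filters -> one tokenizer -> token filters, in order."""
    def analyze(text: str) -> list[str]:
        for cf in char_filters:
            text = cf(text)
        tokens = tokenizer(text)
        for tf in token_filters:
            tokens = tf(tokens)
        return tokens
    return analyze

def build_index(docs: dict[int, str], analyzer) -> dict[str, set[int]]:
    """Inverted index: map each analyzed term to the set of document ids containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index

analyzer = make_analyzer([strip_html], tokenize, [lowercase, stop])
docs = {1: "<p>The Quick Brown Fox</p>", 2: "A quick study of foxes"}
index = build_index(docs, analyzer)

print(analyzer(docs[1]))       # ['quick', 'brown', 'fox']
print(sorted(index["quick"]))  # [1, 2]
```

Note that "fox" and "foxes" remain distinct terms here; catching both would take yet another token filter (a stemmer), which is exactly the kind of choice the two articles walk through.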

Of course I went hunting for the treatment of synonyms, only to find the standard fare.

Not bad by any means, but any grade school student knows that synonyms depend upon any number of factors, and you would be hard pressed to find that reflected in any search engine.

I suppose you could define synonyms as most engines do and then filter the results of a gene search to eliminate “hits” from Field and Stream, Guns & Ammo, and the like. Then again, your searchers may be interested in how to trick out an AR-15. 😉
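The "define synonyms, then filter" approach can be sketched as two plain steps: expand the query with a flat, context-free synonym table, then drop hits from unwanted sources. In Elasticsearch the expansion would normally be a `synonym` token filter in the analyzer, and the exclusion might be a `must_not` clause in a `bool` query, assuming documents carry a source field. Everything below is a hypothetical illustration: the synonym table, the documents, and the source names other than Field and Stream are invented for the example.

```python
# Hypothetical flat synonym table: one equivalence set per term, no context.
SYNONYMS = {"strain": {"strain", "breed", "variety"}}

# Hypothetical pre-analyzed documents, each tagged with its source.
DOCS = [
    {"id": 1, "source": "example-genetics-journal", "terms": {"bacterial", "strain", "mutation"}},
    {"id": 2, "source": "Field and Stream", "terms": {"retriever", "breed", "training"}},
    {"id": 3, "source": "example-plant-journal", "terms": {"wheat", "variety", "yield"}},
]

EXCLUDED_SOURCES = {"Field and Stream", "Guns & Ammo"}

def expand(query_terms: set[str]) -> set[str]:
    """Replace each query term with its synonym set (or itself if it has none)."""
    expanded: set[str] = set()
    for term in query_terms:
        expanded |= SYNONYMS.get(term, {term})
    return expanded

def search(query_terms: set[str], docs: list[dict]) -> list[dict]:
    """Return documents sharing at least one term with the expanded query."""
    expanded = expand(query_terms)
    return [d for d in docs if d["terms"] & expanded]

def post_filter(hits: list[dict], excluded: set[str]) -> list[dict]:
    """Bulk step: discard hits whose source is on the exclusion list."""
    return [h for h in hits if h["source"] not in excluded]

hits = search({"strain"}, DOCS)
filtered = post_filter(hits, EXCLUDED_SOURCES)

print([h["id"] for h in hits])      # [1, 2, 3]
print([h["id"] for h in filtered])  # [1, 3]
```

The synonym step cannot tell a bacterial strain from a dog breed; the crude source filter after the fact is what removes the Field and Stream hit.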

It may be that simple bulk steps are faster than more sophisticated searching. I will have to give that some thought.
