Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 5, 2014

A new proximity query for Lucene, using automatons

Filed under: Automata,Lucene,Search Engines — Patrick Durusau @ 6:34 pm

A new proximity query for Lucene, using automatons by Michael McCandless.

From the post:


As of Lucene 4.10 there will be a new proximity query to further generalize on MultiPhraseQuery and the span queries: it allows you to directly build an arbitrary automaton expressing how the terms must occur in sequence, including any transitions to handle slop.


This is a very expert query, allowing you fine control over exactly what sequence of tokens constitutes a match. You build the automaton state-by-state and transition-by-transition, including explicitly adding any transitions (sorry, no QueryParser support yet, patches welcome!). Once that’s done, the query determinizes the automaton and then uses the same infrastructure (e.g. CompiledAutomaton) that queries like FuzzyQuery use for fast term matching, but applied to term positions instead of term bytes. The query is naively scored like a phrase query, which may not be ideal in some cases.
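To make the idea of an automaton over token positions concrete, here is a minimal toy sketch in Python. It is not Lucene's API: the class `TokenAutomaton`, its method names, and the helper `build_quick_fox` are all hypothetical, invented for illustration. It also simulates the automaton directly as an NFA rather than determinizing it first, which is the step the quoted post says the real query performs. The example automaton accepts "quick" followed by "fox", with at most one arbitrary token in between, which is roughly what a small slop allowance looks like when written out as explicit "any" transitions.

```python
from collections import defaultdict

ANY = object()  # wildcard label: matches any single token


class TokenAutomaton:
    """Toy NFA whose transitions are labeled with whole tokens (hypothetical, not Lucene)."""

    def __init__(self):
        self.transitions = defaultdict(list)  # state -> [(label, dest), ...]
        self.accept = set()
        self.num_states = 0

    def create_state(self):
        # The first state created (0) is treated as the initial state.
        state = self.num_states
        self.num_states += 1
        return state

    def add_transition(self, source, dest, token):
        self.transitions[source].append((token, dest))

    def add_any_transition(self, source, dest):
        self.transitions[source].append((ANY, dest))

    def set_accept(self, state):
        self.accept.add(state)

    def find_matches(self, tokens):
        """Return (start, end) token-position pairs, end exclusive, for each accepted run."""
        matches = []
        for start in range(len(tokens)):
            current = {0}
            for pos in range(start, len(tokens)):
                # Advance every live state on the token at this position.
                nxt = set()
                for state in current:
                    for label, dest in self.transitions[state]:
                        if label is ANY or label == tokens[pos]:
                            nxt.add(dest)
                current = nxt
                if not current:
                    break
                if current & self.accept:
                    matches.append((start, pos + 1))
        return matches


def build_quick_fox():
    """Build: "quick", then at most one arbitrary token, then "fox"."""
    a = TokenAutomaton()
    start = a.create_state()
    after_quick = a.create_state()
    gap = a.create_state()
    end = a.create_state()
    a.add_transition(start, after_quick, "quick")
    a.add_transition(after_quick, end, "fox")   # adjacent terms
    a.add_any_transition(after_quick, gap)      # one intervening token
    a.add_transition(gap, end, "fox")
    a.set_accept(end)
    return a


if __name__ == "__main__":
    automaton = build_quick_fox()
    print(automaton.find_matches("the quick brown fox".split()))
```

Because every transition is added by hand, the shape of the gap is fully under the caller's control: allowing two intervening tokens, or only specific ones, is a matter of adding more states and transitions rather than tuning a single slop number.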

Michael walks through the current proximity queries before diving into the new proximity query for Lucene 4.10.

As always, this is a real treat!
