A new proximity query for Lucene, using automatons by Michael McCandless.
From the post:
…
As of Lucene 4.10 there will be a new proximity query to further generalize on MultiPhraseQuery and the span queries: it allows you to directly build an arbitrary automaton expressing how the terms must occur in sequence, including any transitions to handle slop.
This is a very expert query, allowing you fine control over exactly what sequence of tokens constitutes a match. You build the automaton state-by-state and transition-by-transition, including explicitly adding any transitions (sorry, no QueryParser support yet, patches welcome!). Once that’s done, the query determinizes the automaton and then uses the same infrastructure (e.g. CompiledAutomaton) that queries like FuzzyQuery use for fast term matching, but applied to term positions instead of term bytes. The query is naively scored like a phrase query, which may not be ideal in some cases.
…
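To make the "state-by-state and transition-by-transition" idea concrete, here is a minimal, self-contained Java sketch of a token-level automaton. It is a toy, not Lucene's API: the class `TokenAutomatonSketch` and all of its methods are hypothetical names chosen for illustration. It also simulates the automaton directly over sets of states instead of determinizing it first, and it matches against an in-memory token list rather than indexed term positions. The example automaton accepts "quick" followed by "fox" with at most one intervening token, which is roughly a phrase with a slop of 1 expressed through an explicit "any" transition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy token-level automaton; hypothetical names, not Lucene's API. */
public class TokenAutomatonSketch {
    // Label for a transition that accepts any token (used to model slop).
    private static final String ANY = "\u0000ANY";

    // transitions.get(state) maps a token label to its destination states.
    private final List<Map<String, Set<Integer>>> transitions = new ArrayList<>();
    private final Set<Integer> acceptStates = new HashSet<>();

    /** Adds a state and returns its id; the first state created (0) is the start state. */
    public int createState() {
        transitions.add(new HashMap<>());
        return transitions.size() - 1;
    }

    /** Adds a transition taken when the next token equals term. */
    public void addTransition(int from, int to, String term) {
        transitions.get(from).computeIfAbsent(term, k -> new HashSet<>()).add(to);
    }

    /** Adds a transition taken on any token at all. */
    public void addAnyTransition(int from, int to) {
        addTransition(from, to, ANY);
    }

    public void setAccept(int state) {
        acceptStates.add(state);
    }

    /** True if some run of consecutive tokens leads from the start state to an accept state. */
    public boolean matches(List<String> tokens) {
        for (int start = 0; start < tokens.size(); start++) {
            Set<Integer> current = new HashSet<>(Collections.singleton(0));
            for (int pos = start; pos < tokens.size() && !current.isEmpty(); pos++) {
                Set<Integer> next = new HashSet<>();
                for (int state : current) {
                    Map<String, Set<Integer>> out = transitions.get(state);
                    next.addAll(out.getOrDefault(tokens.get(pos), Collections.emptySet()));
                    next.addAll(out.getOrDefault(ANY, Collections.emptySet()));
                }
                current = next;
                for (int state : current) {
                    if (acceptStates.contains(state)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    /** Builds: "quick", then optionally one arbitrary token, then "fox". */
    public static TokenAutomatonSketch quickFoxWithSlop() {
        TokenAutomatonSketch a = new TokenAutomatonSketch();
        int start = a.createState();
        int sawQuick = a.createState();
        int skippedOne = a.createState();
        int done = a.createState();
        a.addTransition(start, sawQuick, "quick");
        a.addTransition(sawQuick, done, "fox");      // adjacent match
        a.addAnyTransition(sawQuick, skippedOne);    // one token of slop
        a.addTransition(skippedOne, done, "fox");
        a.setAccept(done);
        return a;
    }

    public static void main(String[] args) {
        TokenAutomatonSketch a = quickFoxWithSlop();
        System.out.println(a.matches(Arrays.asList("the", "quick", "brown", "fox")));  // true
        System.out.println(a.matches(Arrays.asList("the", "quick", "fox")));           // true
        System.out.println(a.matches(Arrays.asList("quick", "red", "brown", "fox")));  // false
    }
}
```

The third input fails because two tokens sit between "quick" and "fox" while the automaton only has one "any" transition. Allowing more slop means adding more states and transitions explicitly, which is the fine-grained control (and the extra work) the quoted passage describes.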
Michael walks through current proximity queries before diving into the new proximity query for Lucene 4.10.
As always, this is a real treat!