Screaming fast Lucene searches using C++ via JNI by Michael McCandless.
From the post:
At the end of the day, when Lucene executes a query, after the initial setup the true hot-spot is usually rather basic code that decodes sequential blocks of integer docIDs, term frequencies and positions, matches them (e.g. taking union or intersection for BooleanQuery), computes a score for each hit and finally saves the hit if it’s competitive, during collection.
Even apparently complex queries like FuzzyQuery or WildcardQuery go through a rewrite process that reduces them to much simpler forms like BooleanQuery.
Lucene’s hot-spots are so simple that optimizing them by porting them to native C++ (via JNI) was too tempting!
So I did just that, creating the lucene-c-boost github project, and the resulting speedups are exciting:
(…)
Speedups range from 0.7 X to 7.8 X.
Read Michael’s post for explanations, warnings, caveats, etc.
But it is exciting news!