New index statistics in Lucene 4.0

Mike McCandless writes:

In the past, Lucene recorded only the bare minimal aggregate index statistics necessary to support its hard-wired classic vector space scoring model.

Fortunately, this situation is wildly improved in trunk (to be 4.0), where we have a selection of modern scoring models, including Okapi BM25, Language models, Divergence from Randomness models and Information-based models. To support these, we now save a number of commonly used index statistics per index segment, and make them available at search time.

Mike uses a simple example to illustrate the statistics available in Lucene 4.0.
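
To give a feel for the kind of per-field and per-term statistics described above, here is a minimal Python sketch. It is not Mike's example and it does not call Lucene; it computes comparable counts over a tiny in-memory corpus. The function name `collect_stats`, the dictionary keys, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


def collect_stats(docs, term):
    """Compute illustrative collection-level and term-level statistics.

    `docs` is a list of strings standing in for one field of each document.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    doc_count = 0            # documents with at least one token in the field
    sum_total_term_freq = 0  # total number of tokens across all documents
    sum_doc_freq = 0         # total number of distinct (term, document) pairs
    doc_freq = 0             # documents that contain `term`
    total_term_freq = 0      # occurrences of `term` across all documents

    for text in docs:
        counts = Counter(text.lower().split())
        if not counts:
            continue  # an empty field contributes nothing
        doc_count += 1
        sum_total_term_freq += sum(counts.values())
        sum_doc_freq += len(counts)
        if term in counts:
            doc_freq += 1
            total_term_freq += counts[term]

    return {
        "max_doc": len(docs),
        "doc_count": doc_count,
        "sum_total_term_freq": sum_total_term_freq,
        "sum_doc_freq": sum_doc_freq,
        "doc_freq": doc_freq,
        "total_term_freq": total_term_freq,
    }


# Hypothetical sample corpus.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the fox jumps over the lazy dog",
]
stats = collect_stats(docs, "fox")

# A length-normalizing model such as BM25 needs an average field length,
# which can be derived from the aggregate counts.
avg_field_length = stats["sum_total_term_freq"] / stats["doc_count"]
```

With counts like these available at search time, a scoring model can derive quantities such as the average field length or a term's collection-wide frequency without rescanning the documents.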
