Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 30, 2013

Lucene now has an in-memory terms dictionary…

Filed under: Indexing,Lucene — Patrick Durusau @ 7:05 pm

Lucene now has an in-memory terms dictionary, thanks to Google Summer of Code, by Mike McCandless.

From the post:

Last year, Han Jiang’s Google Summer of Code project was a big success: he created a new (now, default) postings format for substantially faster searches, along with smaller indices.

This summer, Han was at it again, with a new Google Summer of Code project with Lucene: he created a new terms dictionary holding all terms and their metadata in memory as an FST.

In fact, he created two new terms dictionary implementations. The first, FSTTermsWriter/Reader, holds all terms and metadata in a single in-memory FST, while the second, FSTOrdTermsWriter/Reader, does the same but also supports retrieving the ordinal for a term (TermsEnum.ord()) and looking up a term given its ordinal (TermsEnum.seekExact(long ord)). The second one also uses this ord internally so that the FST is more compact, while all metadata is stored outside of the FST, referenced by ord.
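To make the ord idea concrete, here is a minimal Python sketch. It is not code from the post and not Lucene's implementation: a sorted list with binary search stands in for the FST, and the names OrdTermsDict, TermMeta, doc_freq and postings_offset are hypothetical, chosen only for illustration. What it tries to show is the layout described above: the lookup structure maps a term to a small integer, and the metadata lives in a separate array indexed by that integer.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TermMeta:
    """Hypothetical per-term metadata; the field names are illustrative only."""
    doc_freq: int         # number of documents containing the term
    postings_offset: int  # where the term's postings would start


class OrdTermsDict:
    """Toy ord-based terms dictionary (assumption: not Lucene's data layout).

    A sorted list plus binary search stands in for the FST. Both map a term
    to its ordinal, but a real FST also shares prefixes and suffixes.
    """

    def __init__(self, terms_to_meta):
        # Ordinals are positions in sorted term order.
        self._terms = sorted(terms_to_meta)
        # Metadata is kept outside the lookup structure, indexed by ord.
        self._meta = [terms_to_meta[t] for t in self._terms]

    def ord(self, term):
        """Return the ordinal of term, or None if the term is absent."""
        i = bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return i
        return None

    def term(self, ord_):
        """Return the term stored at the given ordinal."""
        return self._terms[ord_]

    def meta(self, term):
        """Look up metadata in two steps: term to ord, then ord to metadata."""
        o = self.ord(term)
        return None if o is None else self._meta[o]


terms = OrdTermsDict({
    "lucene": TermMeta(doc_freq=3, postings_offset=120),
    "fst": TermMeta(doc_freq=1, postings_offset=0),
    "index": TermMeta(doc_freq=7, postings_offset=40),
})

print(terms.ord("index"))  # ordinal of "index" in sorted term order
print(terms.term(2))       # term stored at ordinal 2
print(terms.meta("fst"))   # metadata fetched via the ordinal
```

Because the lookup structure only has to produce an integer, it can stay small, which is presumably the intuition behind the "more compact" FST in the second implementation.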

Lucene continues to improve, rapidly!
