Lucene now has an in-memory terms dictionary, thanks to Google Summer of Code by Mike McCandless.
From the post:
Last year, Han Jiang’s Google Summer of Code project was a big success: he created a new (now, default) postings format for substantially faster searches, along with smaller indices.
This summer, Han was at it again, with a new Google Summer of Code project with Lucene: he created a new terms dictionary holding all terms and their metadata in memory as an FST.
In fact, he created two new terms dictionary implementations. The first, FSTTermsWriter/Reader, holds all terms and metadata in a single in-memory FST, while the second, FSTOrdTermsWriter/Reader, does the same but also supports retrieving the ordinal for a term (TermsEnum.ord()) and looking up a term given its ordinal (TermsEnum.seekExact(long ord)). The second one also uses this ord internally so that the FST is more compact, while all metadata is stored outside of the FST, referenced by ord.
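To make the ord idea concrete, here is a toy sketch in plain Java. It is not Lucene code: a sorted array stands in for the FST, the class name OrdTermsSketch and its methods are hypothetical, and document frequency is used as an assumed example of per-term metadata. It only illustrates the layout described above, where the term structure maps a term to its ordinal and the metadata lives in a separate array indexed by that ordinal.

```java
import java.util.Arrays;

/** Toy model of an ord-based terms dictionary. Illustrative only, not Lucene's implementation. */
final class OrdTermsSketch {
    // Assumption: a sorted array stands in for the FST, which here only maps term -> ord.
    private final String[] sortedTerms;
    // Per-term metadata is kept outside the term structure and is addressed by ord.
    private final int[] docFreqByOrd;

    OrdTermsSketch(String[] sortedTerms, int[] docFreqByOrd) {
        if (sortedTerms.length != docFreqByOrd.length) {
            throw new IllegalArgumentException("one metadata entry per term is required");
        }
        this.sortedTerms = sortedTerms;
        this.docFreqByOrd = docFreqByOrd;
    }

    /** Returns the ordinal of the term, or -1 if the term is absent. */
    long ord(String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? idx : -1;
    }

    /** Looks up a term given its ordinal. */
    String termAt(long ord) {
        return sortedTerms[(int) ord];
    }

    /** Resolves metadata in two steps: term -> ord, then ord -> metadata. */
    int docFreq(String term) {
        long ord = ord(term);
        return ord < 0 ? 0 : docFreqByOrd[(int) ord];
    }

    public static void main(String[] args) {
        OrdTermsSketch dict = new OrdTermsSketch(
            new String[] {"apple", "banana", "cherry"},
            new int[] {3, 7, 1});
        System.out.println(dict.ord("banana"));
        System.out.println(dict.termAt(2));
        System.out.println(dict.docFreq("apple"));
    }
}
```

In this sketch the term structure never stores metadata itself, which mirrors why the post says the ord-based variant can keep its FST more compact.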
…
Lucene continues to improve, rapidly!