Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 22, 2011

Finite State Transducers in Lucene

Filed under: Indexing,Software — Patrick Durusau @ 7:42 pm

I found part 1 of this series at DZone, but it made no reference to part 2. Tracing the article back to its original blog post, I saw that part 2 followed it there.

Using Finite State Transducers in Lucene

Finite State Transducers, Part 2

I won’t try to summarize the posts, which are short and heavy on links to more material, but I will quote this comment from the second article:

To test this, I indexed the first 10 million 1KB documents derived from Wikipedia’s English database download. The resulting RAM required for the FST was ~38% – 52% smaller (larger segments see more gains, as the FST “scales up” well). Not only is the RAM required much lower, but term lookups are also faster: the FuzzyQuery united~2 was ~22% faster.

If lower RAM use and faster lookups are of interest to you, these posts should be on your reading list.
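For a rough feel of why a shared-structure term dictionary can need less RAM, here is a minimal sketch in Python. It is not Lucene's FST implementation: it is a plain prefix-sharing trie that maps each term to an output (here a hypothetical term ordinal), whereas a real FST also shares suffixes and stores outputs along its arcs. The names `TermTrie`, `add` and `get` and the sample terms are my own, chosen only for illustration.

```python
class TrieNode:
    """One state in the trie; `output` is set only where a term ends."""
    __slots__ = ("children", "output")

    def __init__(self):
        self.children = {}   # maps a character to the next TrieNode
        self.output = None   # hypothetical payload, e.g. a term ordinal


class TermTrie:
    """Simplified stand-in for an FST: shares prefixes only, not suffixes."""

    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # counts the root

    def add(self, term, output):
        node = self.root
        for ch in term:
            nxt = node.children.get(ch)
            if nxt is None:
                # Only characters not already covered by a shared prefix
                # cost a new node.
                nxt = TrieNode()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.output = output

    def get(self, term):
        node = self.root
        for ch in term:
            node = node.children.get(ch)
            if node is None:
                return None  # no term starts with this prefix
        return node.output   # None if `term` is only a prefix of a term


# Illustrative terms only; not taken from the Wikipedia index in the quote.
terms = ["unit", "unite", "united", "unity", "untied"]
trie = TermTrie()
for ordinal, term in enumerate(terms):
    trie.add(term, ordinal)

total_chars = sum(len(t) for t in terms)
transitions = trie.node_count - 1  # every non-root node has one incoming arc
print(f"{total_chars} characters stored as {transitions} transitions")
print("united ->", trie.get("united"))
```

The more prefixes the terms share, the fewer transitions are needed per term, which is in the same spirit as the quoted remark that larger segments see more gains.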
