Zoie: Real-time search indexing
It is somehow appropriate that following up on Kafka would lead me to Zoie (and to other goodies, to be reported later).
From the website:
Zoie is a real-time search and indexing system built on Apache Lucene.
Donated by LinkedIn.com on July 19, 2008, and has been deployed in a real-time large-scale consumer website: LinkedIn.com handling millions of searches as well as hundreds of thousands of updates daily.
News: Zoie 2.0.0 is released … – Compatible with Lucene 2.9.x.
In a real-time search/indexing system, a document is made available as soon as it is added to the index. This functionality is especially important to time-sensitive information such as news, job openings, tweets etc.
Design Goals:
- Additions of documents must be made available to searchers immediately
- Indexing must not affect search performance
- Additions of documents must not fragment the index (which hurts search performance)
- Deletes and/or updates of documents must not affect search performance.
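To make those goals concrete, here is a toy sketch in Python. It is not Zoie's API or its actual implementation. It only illustrates one common way to make additions searchable immediately without rewriting the main index on every write: a small "fresh" tier absorbs writes, searches consult both tiers, and a periodic flush folds the fresh tier into the main one. The class name `TwoTierIndex` and all of its methods are hypothetical, and document ids are assumed never to be reused.

```python
class TwoTierIndex:
    """Toy inverted index: 'fresh' absorbs writes, 'main' holds the bulk."""

    def __init__(self):
        self.main = {}        # term -> set of doc ids (large, rarely rewritten)
        self.fresh = {}       # term -> set of doc ids (small, takes all writes)
        self.deleted = set()  # doc ids hidden from search until the next flush

    def add(self, doc_id, text):
        # Writes touch only the fresh tier, so the main tier is not fragmented.
        for term in text.lower().split():
            self.fresh.setdefault(term, set()).add(doc_id)

    def delete(self, doc_id):
        # A delete is just a marker; postings are purged later, in flush().
        self.deleted.add(doc_id)

    def search(self, term):
        # Searches see both tiers, so a new document is visible immediately.
        term = term.lower()
        hits = self.main.get(term, set()) | self.fresh.get(term, set())
        return hits - self.deleted

    def flush(self):
        # Fold the fresh tier into the main tier and purge deleted ids.
        for term, ids in self.fresh.items():
            self.main.setdefault(term, set()).update(ids)
        for term in list(self.main):
            self.main[term] -= self.deleted
            if not self.main[term]:
                del self.main[term]
        self.fresh.clear()
        self.deleted.clear()


idx = TwoTierIndex()
idx.add(1, "Java job opening")
print(idx.search("job"))  # document 1 is searchable before any flush
```

The sketch ignores everything that makes the real problem hard (concurrency, persistence, scoring), but it shows why goals 1 and 3 pull against each other: the more often the fresh tier is flushed, the more the main index is rewritten.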
In topic map terms:
- Additions to a topic map must be made available to searchers immediately
- Indexing must not affect search performance
- Additions to a topic map must not fragment the index (which hurts search performance)
- Deletes and/or updates of a topic map must not affect search performance.
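I would say that items 3 and 4 (avoiding index fragmentation, and keeping deletes and updates from affecting search performance) are research questions at this point.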
Additions, updates and deletions in a topic map may have unforeseen (unforeseeable?) consequences, such as causing:
- merging to occur
- merging to be undone
- roles to be played
- roles to not be played
- associations to become valid
- associations to become invalid
to name only a few.
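The first two consequences can be shown with a small Python sketch. It assumes a deliberately simplified model in which a topic is nothing more than a set of subject identifiers, and two topics merge whenever they share one. The function `merged_view` and the example identifiers are hypothetical, not taken from any topic map engine.

```python
def merged_view(topics):
    """Group topic ids that share at least one subject identifier.

    topics: dict mapping topic id -> set of subject identifiers.
    Returns a set of frozensets, one per merged topic.
    """
    groups = []  # list of (topic ids, identifiers); identifier sets stay disjoint
    for tid, idents in topics.items():
        ids, seen = {tid}, set(idents)
        rest = []
        for g_ids, g_idents in groups:
            if g_idents & seen:
                # Shared identifier: absorb the existing group into this one.
                ids |= g_ids
                seen |= g_idents
            else:
                rest.append((g_ids, g_idents))
        groups = rest + [(ids, seen)]
    return {frozenset(ids) for ids, _ in groups}


topics = {
    "t1": {"http://example.org/zoie"},
    "t2": {"http://example.org/realtime-search"},
}
before = merged_view(topics)

# A single addition causes merging to occur...
topics["t2"].add("http://example.org/zoie")
after_add = merged_view(topics)

# ...and deleting it again causes the merge to be undone.
topics["t2"].discard("http://example.org/zoie")
after_delete = merged_view(topics)
```

One small edit to one topic changes what a search for either topic should return, which is why an index over the merged view cannot simply append new entries the way a document index can.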
It may be possible to formally prove the impact that certain events will have, but I am not aware of any definitive analysis on the subject.
Suggestions?