Client-side search by Gene Golovchinsky.
From the post:
One simplifying assumption I could make was that my document collection was static: sorry, the submission deadline for the conference has passed. Thus, I could have a static index that could be made available to each client, and all the client needed to do was match and rank.
Each of my documents had a three character id, and a set of fields. I didn’t bother with the fields, and just lumped everything together in the index. The approach was simple, again due to lots of assumptions. I treated the inverted index as a hash table that maps keywords onto lists of document ids. OK, document ids and term frequencies. Including positional information is an exercise left to the reader.
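The post doesn't include code, so here is a minimal Python sketch of the structure the quote describes, under my own assumptions. It is a hash table (a dict) from keyword to a list of (document id, term frequency) pairs, built once over a static collection, with matching and ranking done at query time. The tokenizer, the tf-idf style scoring, the names `build_index` and `search`, and the sample three-character ids are all hypothetical. The post doesn't say how results were ranked.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical postings type: keyword -> list of (doc_id, term_frequency).
Index = dict[str, list[tuple[str, int]]]

def tokenize(text: str) -> list[str]:
    """Lower-case and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> Index:
    """Build the inverted index once; all fields are lumped together."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term].append((doc_id, tf))
    return dict(index)

def search(index: Index, n_docs: int, query: str) -> list[tuple[str, float]]:
    """Match query terms against the index and rank by a simple tf-idf sum.

    The scoring is an assumption for illustration; the post doesn't specify one.
    """
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings:
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample collection with three-character ids.
DOCS = {
    "a01": "client side search with a static index",
    "b02": "static analysis of search logs",
    "c03": "photo matching and identity",
}

if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, len(DOCS), "client search"))
```

With these sample documents, a query for "client search" ranks `a01` ahead of `b02`, since `client` occurs in only one document and so carries more weight. Because the collection is static, the index is built once and can be shipped to every client as-is. Positional information is left out, as in the quote.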
A refreshing reminder that simplified requirements can lead to successful applications.
Or to put it another way, not every application has to meet every possible use case.
For example, I might want to have a photo matching application that only allows users to pick match/no match for any pair of photos.
Not why, or for what reasons the user chose match/no match, etc.
But it does capture the user's identity in an association saying that photo # and photo # are of the same person.
That doesn’t provide any basis for automated comparison of those judgments, but not every judgment is required to do so.
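To make the photo example concrete, here is a minimal sketch of what such an application might record. `MatchJudgment` and `users_asserting_match` are hypothetical names of my own. Each record holds only who judged which pair and the bare yes/no, so judgments can be tallied per pair, but nothing in a record gives a basis for comparing why two users agreed or disagreed.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchJudgment:
    """One user's bare match/no-match decision; no reasons are recorded."""
    user_id: str
    photo_a: str
    photo_b: str
    same_person: bool

    def pair(self) -> tuple[str, str]:
        # Order-independent key, so (x, y) and (y, x) are the same pair.
        a, b = sorted((self.photo_a, self.photo_b))
        return (a, b)

def users_asserting_match(judgments: list[MatchJudgment]) -> dict[tuple[str, str], set[str]]:
    """Map each photo pair to the set of users who said it shows one person."""
    result: dict[tuple[str, str], set[str]] = defaultdict(set)
    for j in judgments:
        if j.same_person:
            result[j.pair()].add(j.user_id)
    return dict(result)
```

The association between user and pair is preserved, which is the point of the example; anything beyond counting agreement would need more than this record captures.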
I am starting to think of subject identification as a continuum of practices, some of which enable more reuse than others.
Which of those you choose depends upon your requirements, your resources and other factors.