Client-side search by Gene Golovchinsky.
From the post:
When we rolled out the CHI 2013 previews site, we got a couple of requests for being able to search the site with keywords. Of course interfaces for search are one of my core research interests, so that request got me thinking. How could we do search on this site? The problem with the conventional approach to search is that it requires some server-side code to do the searching and to return results to the client. This approach wouldn’t work for our simple web site, because from the server’s perspective, our site was static — just a few HTML files, a little bit of JavaScript, and about 600 videos. Using Google to search the site wouldn’t work either, because most of the searchable content is located on two pages, with hundreds of items on each page. So what to do?
I looked around briefly trying to find some client-side indexing and retrieval code, and struck out. Finally, I decided to take a crack at writing a search engine in JavaScript. Now, before you get your expectations up, I was not trying to re-implement Lucene in JavaScript. All I wanted was some rudimentary keyword search capability. Building that in JavaScript was not so difficult.
One simplifying assumption I could make was that my document collection was static: sorry, the submission deadline for the conference has passed. Thus, I could have a static index that could be made available to each client, and all the client needed to do was match and rank.
Each of my documents had a three character id, and a set of fields. I didn’t bother with the fields, and just lumped everything together in the index. The approach was simple, again due to lots of assumptions. I treated the inverted index as a hash table that maps keywords onto lists of document ids. OK, document ids and term frequencies. Including positional information is an exercise left to the reader.
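To make the quoted description concrete, here is a minimal sketch of such an index in JavaScript. It is not the author's code: the sample documents, the function names (`tokenize`, `buildIndex`, `search`), and the ranking by summed term frequency are all assumptions made for illustration.

```javascript
// Hypothetical sample documents: three-character ids mapped to their text,
// with all fields lumped together as the post describes.
const docs = {
  abc: "client side search in JavaScript",
  xyz: "search interfaces and search research",
  qrs: "video previews for the conference",
};

// Split text into lowercase alphanumeric tokens (a simplifying assumption).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build the inverted index: keyword -> { docId: termFrequency }.
function buildIndex(documents) {
  const index = new Map();
  for (const [id, text] of Object.entries(documents)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, {});
      const postings = index.get(term);
      postings[id] = (postings[id] || 0) + 1;
    }
  }
  return index;
}

// Match and rank: sum the term frequencies of the query terms per document,
// then sort by descending score (ties broken by document id).
function search(index, query) {
  const scores = {};
  for (const term of tokenize(query)) {
    const postings = index.get(term) || {};
    for (const [id, tf] of Object.entries(postings)) {
      scores[id] = (scores[id] || 0) + tf;
    }
  }
  return Object.entries(scores)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id, score]) => ({ id, score }));
}

const index = buildIndex(docs);
const results = search(index, "search");
```

Because the collection is static, the index could be built once and shipped to each client as a file, leaving only the `search` step to run in the browser.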
A refreshing reminder that simplified requirements can lead to successful applications.
Or to put it another way, not every application has to meet every possible use case.
For example, I might want to have a photo matching application that only allows users to pick match/no match for any pair of photos.
It would not record why, or what the reasons were for a match/no match decision.
But it does capture the user's identity in an association saying that photo # and photo # are of the same person.
That doesn’t provide any basis for automated comparison of those judgments, but not every judgment is required to do so.
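A rough sketch of what such a minimal judgment record might look like, in JavaScript. Everything here is hypothetical: the field names, the `recordJudgment` and `whoMatched` helpers, and the sample user and photo ids are invented for illustration only.

```javascript
// Record one match/no-match judgment, tied to the user who made it.
// No reasons are stored, only who said what about which pair.
function recordJudgment(store, userId, photoA, photoB, isMatch) {
  // Order the pair so that (A, B) and (B, A) are stored the same way.
  const [first, second] = [photoA, photoB].sort();
  store.push({ userId, photos: [first, second], isMatch });
  return store;
}

// List the users who asserted that two photos show the same person.
function whoMatched(store, photoA, photoB) {
  const [first, second] = [photoA, photoB].sort();
  return store
    .filter(j => j.isMatch && j.photos[0] === first && j.photos[1] === second)
    .map(j => j.userId);
}

const judgments = [];
recordJudgment(judgments, "user-1", "photo-17", "photo-04", true);
recordJudgment(judgments, "user-2", "photo-04", "photo-17", false);
const matchers = whoMatched(judgments, "photo-17", "photo-04");
```

The record says who asserted the association and nothing about the grounds for it, which is exactly the limited requirement described above.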
I am starting to think of subject identification as a continuum of practices, some of which enable more reuse than others.
Which of those you choose depends upon your requirements, your resources, and other factors.