I was reading the treatment of Jaccard distance in Mining of Massive Datasets by Anand Rajaraman and Jeffrey D. Ullman, and something that keeps nagging at me became clearer.
Is document indexing the wrong level for indexing?
Take a traditional research paper as an example.
You would give me low marks if I handed in a paper with the following as one of my footnotes:
1. Principia Mathematica, Volume 1
But that is a perfectly acceptable result for a search engine. I am pointed to an entire document as relevant to my search.
True enough, but hardly helpful.
Search engines can take me to a document but that still leaves all the hard work to me.
Not that I mind the hard work, but that same hard work is done over and over again as each user encounters the document.
Seems terribly inefficient to have the same work done each time the document is returned.
Say, for example, that I am searching for the proof that 1 + 1 = 2. I should be able to create a representative for that subject that points every searcher to the same location, as opposed to each of them digging out that bit of information for themselves.
I have heard that bit of information assigned various locations in Principia Mathematica. I am acquiring a reprint so I can verify its location for myself and will be posting its location.
Topic maps help because they are about subject indexing, which I take to be different from document indexing.
A document index only tells you that somewhere in a document, one or more terms relevant to your search may be found. Not terribly helpful.
A subject index, on the other hand, particularly if made using a topic map, not only isolates the location of a subject but can also tell you more about it, such as its other names, related subjects, and other places where it is discussed.
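To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Subject` class, its `locators` and `notes` fields, and the handful of words standing in for the text of the document are all hypothetical, and this is not how any particular topic map engine stores its data. The location of the proof is deliberately left as a placeholder, since I have not verified it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Subject:
    """One shared representative for a subject (hypothetical structure)."""
    name: str
    locators: list = field(default_factory=list)  # (document, location) pairs
    notes: dict = field(default_factory=dict)     # anything else known about it


def build_document_index(documents):
    """Document index: map each term to the documents that contain it."""
    index = defaultdict(set)
    for title, text in documents.items():
        for term in text.lower().split():
            index[term].add(title)
    return index


# Placeholder words, not the real text of the book.
documents = {
    "Principia Mathematica, Volume 1": "logic classes relations proof arithmetic",
}
document_index = build_document_index(documents)

# Subject index: one entry per subject, pointing at a location inside the document.
subject_index = {
    "proof that 1 + 1 = 2": Subject(
        name="proof that 1 + 1 = 2",
        locators=[("Principia Mathematica, Volume 1", "LOCATION NOT YET VERIFIED")],
        notes={"status": "location pending verification"},
    ),
}

# The document index can only say "somewhere in this document".
print(document_index["proof"])

# The subject index says where, and carries whatever else is recorded.
entry = subject_index["proof that 1 + 1 = 2"]
print(entry.locators, entry.notes)
```

The point of the sketch is only the shape of the answers: the first lookup returns a whole document, while the second returns a location that every searcher can reuse once someone has done the digging.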