Interesting blog post on indexing by Kyle Banker of MongoDB.
Recommended in part for understanding the limits of traditional indexing.
Ask yourself, what is the index in Kyle’s examples indexing?
Kyle says the examples are indexing recipes, but is that really true?
Or is it the case that the index is indexing the occurrence of a string at a location in the text?
Not exactly the same thing.
That is to say, there is a difference between a token that appears in a text and a subject we think about when we see that token.
That difference is what enables us to say that two or more words that are spelled differently are synonyms.
Something other than the two words as strings is what we are relying on to make the claim they are synonyms.
A traditional indexing engine, of the sort described here, can only index the strings it encounters in the text.
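To make the point concrete, here is a minimal sketch of a word index in Python. It is not Kyle's code or MongoDB's implementation; the function name `build_word_index` and the two sample "recipes" are invented for illustration. It records only where each string occurs.

```python
import re
from collections import defaultdict

def build_word_index(docs):
    """Map each lowercase token to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Tokenize naively on runs of ASCII letters (an assumption of this sketch).
        for position, token in enumerate(re.findall(r"[a-z]+", text.lower())):
            index[token].append((doc_id, position))
    return index

# Hypothetical sample texts, not taken from the blog post.
docs = {
    "r1": "Roast the eggplant with garlic",
    "r2": "Grill the aubergine slices",
}

word_index = build_word_index(docs)
print(word_index["eggplant"])   # [('r1', 2)]
print(word_index["aubergine"])  # [('r2', 2)]
```

A search for "eggplant" never finds the second text, because the index knows strings and positions and nothing else.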
More useful would be an indexing engine that indexes the subjects in a text.
I think we would call such a subject-indexing engine a topic map engine. Yes?
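As a rough sketch of what indexing by subject might look like, the Python below files occurrences under a subject identifier instead of under the string itself. Everything here is an assumption made for illustration: the `TOKEN_TO_SUBJECT` table, the `subject:` identifiers, the function name and the sample texts. It also pretends each token identifies exactly one subject, which real texts do not guarantee, and it is far short of a full topic map engine.

```python
import re
from collections import defaultdict

# Hypothetical, hand-built table: the subject each token is taken to identify.
TOKEN_TO_SUBJECT = {
    "eggplant": "subject:solanum-melongena",
    "aubergine": "subject:solanum-melongena",
    "garlic": "subject:allium-sativum",
}

def build_subject_index(docs, token_to_subject):
    """Map each subject identifier to the (doc_id, position) pairs of its tokens."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in enumerate(re.findall(r"[a-z]+", text.lower())):
            subject = token_to_subject.get(token)
            if subject is not None:  # tokens with no known subject are skipped
                index[subject].append((doc_id, position))
    return index

# Hypothetical sample texts.
docs = {
    "r1": "Roast the eggplant with garlic",
    "r2": "Grill the aubergine slices",
}

subject_index = build_subject_index(docs, TOKEN_TO_SUBJECT)
print(subject_index["subject:solanum-melongena"])  # [('r1', 2), ('r2', 2)]
```

Both spellings now land under one entry, but only because someone supplied the mapping from strings to subjects. Where that mapping comes from is the hard part.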
- Do you agree/disagree that a word indexing engine is not a subject indexing engine? (3-5 pages, no citations)
- What would you change about a word indexing engine (if anything) to make it a subject indexing engine? (3-5 pages, no citations)
- What texts/subjects would you use as test cases for your engine? (3-5 pages, citations of the test documents)