Large Data Sets and Ever Changing Terminology from TaxoDiary.
From the post:
Indexing enables accurate, consistent retrieval to the full depth and breadth of the collection. This does not mean that the statistics-based systems the government loves so much will go away, but they are learning to embrace the addition of taxonomy terms as indexing.
To answer your question, relevant metadata, tagging, normalization of entity references and similar indexing functions just make it easier for a person to locate what’s needed. Easy to say and very hard to do.
Search is like having to stand in a long line waiting to order a cold drink on a hot day. So there will always be dissatisfaction because “search” stands between you and what you want. You want the drink but hate the line. That said, I think the reason controlled indexing (taxonomy or thesaurus) is so popular compared to the free ranging keywords is that they have control. They make moving through the line efficient. You know how long the wait is and what terms you need to achieve the result.
I like the “…cold drink on a hot day” comparison. The post goes on to point out the problems created by “ever changing terminology,” which isn’t something that is going to stop happening. People have been inventing new terminology for probably as long as we have been able to communicate, however poorly we do that.
The post does advocate use of MAI (machine-assisted indexing) from the author’s company, but the advantages of indexing ring true whatever tool you use to achieve that result.
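The quoted bit about “normalization of entity references” is easy to gloss over, so here is a minimal sketch in Python of what it buys you: variant phrasings get mapped to preferred taxonomy terms at indexing time, so a query phrased either way retrieves the same documents. The taxonomy, synonym map, and sample documents are invented for illustration; a production system (MAI included) does far more than this.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: preferred term -> known variants.
TAXONOMY = {
    "myocardial infarction": {"heart attack", "mi", "cardiac infarction"},
    "hypertension": {"high blood pressure", "htn"},
}

# Invert to a lookup table: any variant (or the preferred term itself)
# normalizes to the preferred taxonomy term.
NORMALIZE = {preferred: preferred for preferred in TAXONOMY}
for preferred, variants in TAXONOMY.items():
    for variant in variants:
        NORMALIZE[variant] = preferred

def index_document(doc_id: str, text: str, index: dict[str, set[str]]) -> None:
    """Tag a document with every taxonomy term whose variant appears in it."""
    lowered = text.lower()
    for variant, preferred in NORMALIZE.items():
        # Word-boundary match so "mi" does not fire inside "admitted".
        if re.search(rf"\b{re.escape(variant)}\b", lowered):
            index[preferred].add(doc_id)

if __name__ == "__main__":
    docs = {
        "doc1": "Patient admitted after a heart attack; history of high blood pressure.",
        "doc2": "Long-term HTN management guidelines.",
    }
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        index_document(doc_id, text, index)

    # A query phrased with any variant normalizes to the same taxonomy term,
    # which is the consistency the post credits to controlled indexing.
    for query in ("heart attack", "myocardial infarction", "high blood pressure"):
        term = NORMALIZE.get(query.lower(), query)
        print(query, "->", term, "->", sorted(index.get(term, set())))
```

Nothing here is specific to any one vendor; the point is simply that the normalization table, however it is built or maintained, is where the “control” in controlled indexing lives.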
I do think the author deserves kudos for pointing out that indexing is a hard problem. No magic cures, no syntax to save everyone, and no static solutions. As domains change, so do indexes. It is as simple as that.