It came up in a conversation with Sam Hunting recently that search engines are holding a large end of a long tail of semantics. Well, that is how I would summarize almost 30 minutes of hitting at and around the idea!
Think about it: search engines, by their present construction, are bound to a large end of a long tail of search results. That is the end of the long tail that they report to users, with varying degrees of filtering and enhancement, not to mention paid ads.
Problem: The long tail of semantics hasn’t been established in general, for some particular set of terms, and certainly not for any particular user. Oops. (As Rick Perry would say.)
And each search result represents some unknown position in some long tail of semantics for a particular user. Oops, again.
Search engines do well enough to keep users coming back, so they are hitting some part of the long tail of semantics. They just don’t know what part for any particular user.
I am sure it is easier to count occurrences, queries and the like and trust that the search engine is hitting high enough somewhere on the long tail to justify ad rates.
But what if we could improve that? That is, not be banging around somewhere on a long tail of semantics in general, but on some particular sub-tail of semantics.
For example, we know when terminology is being taken from an English language journal on cardiology. We have three semantic indicators: English as a language, journal as a means of publication and cardiology as a subject area. What is more, we can discover, without too much difficulty, the papers cited by authors of that journal, which more likely than not would be recognized by other readers of that journal. So what if we kept the results from that area segregated from other search results and did the same (virtually) with other recognized areas? (Mathematics, for example, has varying terms even within disciplines, set theory for example, so work would be left to be done.)
Rather than putting search results together and later trying to disambiguate them, start that process at the beginning and preserve as much data as we can that may help divide a long tail into smaller ones.
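To make that a little more concrete, here is a minimal Python sketch of keeping results segregated from the start. It is only an illustration under my own assumptions, not a description of how any search engine works: the names `Context`, `Document`, `build_segregated_index` and `search` are hypothetical, the three indicators are the ones from the cardiology example, and the tokenizer is a naive whitespace split.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical bundle of the three semantic indicators."""
    language: str
    publication_type: str
    subject: str


@dataclass(frozen=True)
class Document:
    doc_id: str
    context: Context
    text: str


def build_segregated_index(documents):
    """Build one small term index per context instead of one merged index."""
    index = defaultdict(lambda: defaultdict(set))
    for doc in documents:
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        for term in doc.text.lower().split():
            index[doc.context][term].add(doc.doc_id)
    return index


def search(index, term, context=None):
    """Return hits grouped by context; optionally restrict to one context."""
    term = term.lower()
    contexts = [context] if context is not None else list(index)
    results = {}
    for ctx in contexts:
        hits = index.get(ctx, {}).get(term, set())
        if hits:
            results[ctx] = sorted(hits)
    return results


cardiology = Context("English", "journal", "cardiology")
rowing = Context("English", "magazine", "rowing")

docs = [
    Document("c1", cardiology, "stroke volume measured after exercise"),
    Document("r1", rowing, "stroke rate for rowing crews"),
]

index = build_segregated_index(docs)
print(search(index, "stroke"))              # hits stay grouped by context
print(search(index, "stroke", cardiology))  # only the cardiology sub-tail
```

The point of the design is that "stroke" in a cardiology journal and "stroke" in a rowing magazine never land in the same bucket, so nothing has to be disambiguated afterwards. Cited papers could be folded into a context the same way, but that is left out here.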
(This sounds like “personalization” to me as I write it but personalization has its own hazards and dangers. Some of which can be avoided by asking a librarian. More on that another time.)