Enhancing search results using machine learning by Emmanuel Espina
From the introduction:
To introduce you to the topic, let’s think about how users are used to working with “information retrieval platforms” (I mean, search engines). The user enters your site, sees a little rectangular box with a button that reads “search” beside it, and figures out that he must think of some keywords to describe what he wants, write them in the search box and hit search. Although we are all very used to this, a deeper analysis of how this procedure works leads to the conclusion that it is quite unintuitive. Before search engines, “mentally extracting keywords” from concepts was not a very common activity.
It is natural to categorize things, to classify ideas or concepts, but extracting keywords is a different intellectual activity. While searching, the user must think like the search engine! The user must think, “well, this machine will give me documents with the words I am going to enter, so which words have the best chance of giving me what I want?” (emphasis added)
Hmmmm, but prior to full-text search, users learned to think like the indexers who created the indexes they were using. Indexers were a first line of defense against unbounded information, since indexes covered particular resources and had mechanisms to account for changing terminology, not to mention domain-specific vocabularies that users could master.
A second line of defense was librarians, who not only mastered domain-specific indexes but could also move from one specialized finding aid to another, collating information as they went. The ability to move from one finding aid to another has yet to be duplicated by automatic means, in part because it depends on the resources available in a particular library.
Do read the article to see how the author proposes to use machine learning to improve search results.
BTW, do you know of any sets of query responses that are publicly available?