Web query disambiguation using PageRank by Christos Makris, Yannis Plegas, and Sofia Stamou. (Makris, C., Plegas, Y. and Stamou, S. (2012), Web query disambiguation using PageRank. J. Am. Soc. Inf. Sci. doi: 10.1002/asi.22685)
Abstract:
In this article, we propose new word sense disambiguation strategies for resolving the senses of polysemous query terms issued to Web search engines, and we explore the application of those strategies when used in a query expansion framework. The novelty of our approach lies in the exploitation of the Web page PageRank values as indicators of the significance the different senses of a term carry when employed in search queries. We also aim at scalable query sense resolution techniques that can be applied without loss of efficiency to large data sets such as those on the Web. Our experimental findings validate that the proposed techniques perform more accurately than do the traditional disambiguation strategies and improve the quality of the search results, when involved in query expansion.
A better summary of the authors’ approach lies within the article:
The intuition behind our method is that we could improve the Web users’ search experience if we could correlate the importance that the sense of a term has when employed in a query (i.e., the importance of the sense as perceived by the information seeker) with the importance the same sense has when contained in a Web page (i.e., the importance of the sense as perceived by the information provider). We rely on the exploitation of PageRank because of its effectiveness in capturing the importance of every page on the Web graph based on their links’ connectivity, and from which we may infer the importance of every page in the “collective mind” of the Web content providers/creators. To account for that, we explore whether the PageRank value of a page may serve as an indicator of how significant the dominant senses of a query-matching term in the page are and, based on that, disambiguate the query.
That reminds me of statistical machine translation, which replaced syntax-based methods years ago.
Perhaps PageRank is summing our linguistic preferences for some word senses.
If that is the case, how would you incorporate that in ranking results to be delivered to a user from a topic map? There are different possible search outcomes. How do we establish the one a user prefers?
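To make the "summing preferences" idea concrete, here is a minimal sketch in Python. It is my own illustration, not the authors' algorithm: I assume each query-matching page has already been assigned a PageRank value and a single dominant sense for the query term, and I simply add up the PageRank mass behind each sense. The function name `rank_senses`, the input format, and the numbers for the query term "jaguar" are all made up for the example.

```python
from collections import defaultdict


def rank_senses(pages):
    """Order candidate senses by their share of total PageRank mass.

    `pages` is a hypothetical input: an iterable of
    (pagerank, dominant_sense) pairs, one per query-matching page.
    Returns a list of (sense, share) pairs, largest share first.
    """
    totals = defaultdict(float)
    for pagerank, sense in pages:
        totals[sense] += pagerank  # each page "votes" with its PageRank

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no evidence for any sense

    shares = [(sense, total / grand_total) for sense, total in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Invented data: pages matching the query term "jaguar".
pages = [
    (0.40, "animal"),
    (0.25, "car"),
    (0.20, "car"),
    (0.15, "animal"),
]

ranked = rank_senses(pages)
for sense, share in ranked:
    print(f"{sense}: {share:.2f}")
```

In a topic map setting, the resulting shares could serve as a default ordering over the topics that share a name, with the user's own choice overriding the default. Whether that ordering matches what a particular user prefers is exactly the open question.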