Archive for the ‘Polysemy’ Category

Web query disambiguation using PageRank

Sunday, July 1st, 2012

Web query disambiguation using PageRank by Christos Makris, Yannis Plegas, and Sofia Stamou. (Makris, C., Plegas, Y. and Stamou, S. (2012), Web query disambiguation using PageRank. J. Am. Soc. Inf. Sci. doi: 10.1002/asi.22685)


In this article, we propose new word sense disambiguation strategies for resolving the senses of polysemous query terms issued to Web search engines, and we explore the application of those strategies when used in a query expansion framework. The novelty of our approach lies in the exploitation of the Web page PageRank values as indicators of the significance the different senses of a term carry when employed in search queries. We also aim at scalable query sense resolution techniques that can be applied without loss of efficiency to large data sets such as those on the Web. Our experimental findings validate that the proposed techniques perform more accurately than do the traditional disambiguation strategies and improve the quality of the search results, when involved in query expansion.

A better summary of the authors’ approach lies within the article:

The intuition behind our method is that we could improve the Web users’ search experience if we could correlate the importance that the sense of a term has when employed in a query (i.e., the importance of the sense as perceived by the information seeker) with the importance the same sense has when contained in a Web page (i.e., the importance of the sense as perceived by the information provider). We rely on the exploitation of PageRank because of its effectiveness in capturing the importance of every page on the Web graph based on their links’ connectivity, and from which we may infer the importance of every page in the “collective mind” of the Web content providers/creators. To account for that, we explore whether the PageRank value of a page may serve as an indicator of how significant the dominant senses of a query-matching term in the page are and, based on that, disambiguate the query.

That reminds me of statistical machine translation, which displaced syntax-based methods years ago.

Perhaps PageRank is summing our linguistic preferences for some word senses.

If that is the case, how would you incorporate that in ranking results to be delivered to a user from a topic map? There are different possible search outcomes; how do we establish the one a user prefers?
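
To make the intuition in the quoted passage a little more concrete, here is a rough sketch of my own. It is not the authors' algorithm. It assumes that each page matching a query term has already been assigned a PageRank value and a single dominant sense for that term (both hypothetical inputs here), and it simply sums PageRank per sense and normalizes the totals. The function name `rank_senses` and the sample data are made up for illustration.

```python
from collections import defaultdict


def rank_senses(pages):
    """Rank the senses of a query term by the PageRank mass behind them.

    `pages` is an iterable of (pagerank, dominant_sense) pairs, one per
    page that matches the query term. Both values are assumed to come
    from earlier steps that are not shown here.
    Returns a list of (sense, share) pairs, highest share first.
    """
    totals = defaultdict(float)
    for pagerank, sense in pages:
        totals[sense] += pagerank

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No pages, or no PageRank mass at all: nothing to rank.
        return []

    shares = [(sense, score / grand_total) for sense, score in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical data for the query term "jaguar".
pages = [(0.5, "animal"), (0.25, "car"), (0.25, "animal")]
ranked = rank_senses(pages)
```

With the made-up numbers above, the "animal" sense ends up with three quarters of the PageRank mass and "car" with one quarter. A topic map could use shares like these to order the subjects it offers for an ambiguous term, although that still says nothing about what a particular user prefers.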

Draft (polysemy and ambiguity)

Sunday, January 22nd, 2012

Draft by Mark Liberman

From the post:

In a series of Language Log posts, Geoff Pullum has called attention to the prevalence of polysemy and ambiguity:

The people who think clarity involves lack of ambiguity, so we have to strive to eliminate all multiple meanings and should never let a word develop a new sense… they simply don’t get it about how language works, do they?

Languages love multiple meanings. They lust after them. They roll around in them like a dog in fresh grass.

The other day, as I was reading a discussion in our comments about whether English draftable does or doesn’t refer to the same concept as Finnish asevelvollisuus (“obligation to serve in the military”), I happened to be sitting in a current of uncomfortably cold air. So of course I wondered how the English word draft came to refer to military conscription as well as air flow. And a few seconds of thought brought to mind several other senses of the noun draft and its associated verb. I figured that this must represent a confusion of several originally separate words. But then I looked it up.

If you like language and have an appreciation for polysemy and ambiguity, you will enjoy this post a lot.