Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 24, 2010

Explicit Semantic Analysis

Filed under: Classification,Data Integration,Information Retrieval,Semantics — Patrick Durusau @ 7:58 am

Explicit Semantic Analysis looks like another tool for the topic maps toolkit.

It is not 100% accurate, but it is close enough to give a topic map project involving a serious amount of text a running start.

Start with “Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis” by Evgeniy Gabrilovich and Shaul Markovitch.

There are 55 citations of this work (as of 2010-04-24), ranging from “Geographic Information Retrieval” and “Beyond the Stars: Exploiting Free-Text User Reviews for Improving the Accuracy of Movie Recommendations” (2009) to “Explicit Versus Latent Concept Models for Cross-Language Information Retrieval.”

I encountered this line of work while reading “Combining Concept Based and Text Based Indexes for CLIR” by Philipp Sorg and Philipp Cimiano (slides) from the 2009 Cross Language Evaluation Forum. (For any search engines, CLIR = Cross-Language Information Retrieval.) The Cross Language Evaluation Forum link is a general one because the site does not expose direct links to resources.

Quibble:

Evgeniy Gabrilovich and Shaul Markovitch say that:

We represent texts as a weighted mixture of a predetermined set of natural concepts, which are defined by humans themselves and can be easily explained. To achieve this aim, we use concepts defined by Wikipedia articles, e.g., COMPUTER SCIENCE, INDIA, or LANGUAGE.

and

The choice of encyclopedia articles as concepts is quite natural, as each article is focused on a single issue, which it discusses in detail.

Their use of “natural,” which I equate in academic writing to “…a miracle occurs…,” drew my attention. There are things we choose to treat as concepts or even subject representatives, but that hardly makes them “natural.” Most academic articles would claim (whether true or not) to be “…focused on a single issue, which it discusses in detail.”

Rather than “natural concepts,” describe them as the headers of Wikipedia texts. That is more accurate and lays the groundwork for investigating the nature and length of headers and their impact on semantic mapping and information retrieval.
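
For readers who have not met ESA before, here is a minimal sketch of the representation quoted above: a text becomes a weighted vector over concepts, and relatedness is the cosine between two such vectors. Everything in it is an assumption for illustration. The three tiny "articles" stand in for Wikipedia, the function names (build_index, interpret, relatedness) are mine, and the TF-IDF weighting is a plain textbook variant, not necessarily the authors' exact scheme. Note that each concept is identified by nothing more than its article header.

```python
import math
from collections import Counter

# Toy stand-ins for Wikipedia articles, keyed by article header.
# (Assumption: a real ESA system would index a full Wikipedia dump.)
CONCEPTS = {
    "COMPUTER SCIENCE": "algorithm computer program software data computation algorithm",
    "INDIA": "india delhi ganges country asia population hindi",
    "LANGUAGE": "language grammar word speech hindi syntax word",
}


def tokenize(text):
    """Lowercase and split on whitespace; deliberately naive."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index: word -> {article header: TF-IDF weight}."""
    counts = {title: Counter(tokenize(body)) for title, body in concepts.items()}
    n_articles = len(concepts)
    # Document frequency: in how many articles does each word occur?
    doc_freq = Counter(word for c in counts.values() for word in c)
    index = {}
    for title, c in counts.items():
        for word, tf in c.items():
            # The +1 keeps a nonzero weight for words found in every article.
            idf = math.log(n_articles / doc_freq[word]) + 1.0
            index.setdefault(word, {})[title] = tf * idf
    return index


def interpret(text, index):
    """Represent a text as a weighted mixture of concepts."""
    vector = {}
    for word, tf in Counter(tokenize(text)).items():
        for title, weight in index.get(word, {}).items():
            vector[title] = vector.get(title, 0.0) + tf * weight
    return vector


def relatedness(text_a, text_b, index):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(text_a, index), interpret(text_b, index)
    dot = sum(w * vb.get(title, 0.0) for title, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # at least one text shares no words with any article
    return dot / (norm_a * norm_b)


index = build_index(CONCEPTS)
```

With the toy data, relatedness("software algorithm", "computer program", index) comes out far higher than relatedness("delhi", "software", index), since the first pair lands on a shared header and the second pair does not.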
