Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 18, 2012

A Language-Independent Approach to Keyphrase Extraction and Evaluation

A Language-Independent Approach to Keyphrase Extraction and Evaluation (2010) by Mari-Sanna Paukkeri, Ilari T. Nieminen, Matti Pöllä and Timo Honkela.

Abstract:

We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very light-weight preprocessing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with 11 European languages. Furthermore, we present an automatic evaluation method based on Wikipedia intra-linking.

A useful approach for developing a rough cut of the keywords in documents, keywords that may indicate a need for topics to represent subjects.

Interesting that:

Phrases occurring only once in the document cannot be selected as keyphrases.

I would have thought unique phrases would automatically qualify as keyphrases. Instead, the ranking of phrases, which is calculated from the reference corpus and the text, excludes phrases that occur only once, apparently because there is no ratio by which to rank them.

That sounds like a bug and not a feature to me.

My reasoning: phrases unique to an author are unique identifications of subjects. Certainly grist for a topic map mill.
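
To make the ranking question concrete, here is a minimal sketch of a rank-ratio scorer. It is not the authors' code. It assumes a phrase's score is its frequency rank in the document divided by its frequency rank in the reference corpus, with lower scores ranking higher. The function names, the `min_count` parameter, and the choice to give phrases missing from the reference corpus a rank one worse than the worst observed rank are all hypothetical choices made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-token phrases in order of appearance."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_ranks(counts):
    """Map each phrase to a dense frequency rank (1 = most frequent, ties share a rank)."""
    ranks = {}
    rank = 0
    previous = None
    for phrase, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count != previous:
            rank += 1
            previous = count
        ranks[phrase] = rank
    return ranks


def rank_ratio_keyphrases(doc_tokens, ref_tokens, n=1, min_count=2):
    """Order document phrases by (document rank / reference rank), lowest first.

    Assumption: phrases absent from the reference corpus get a rank one
    worse than the worst rank seen there. min_count=2 mimics the rule that
    phrases occurring only once cannot be selected.
    """
    doc_counts = Counter(ngrams(doc_tokens, n))
    doc_ranks = frequency_ranks(doc_counts)
    ref_ranks = frequency_ranks(Counter(ngrams(ref_tokens, n)))
    missing_rank = max(ref_ranks.values(), default=0) + 1

    scored = []
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # once-only phrases are filtered out here
        ratio = doc_ranks[phrase] / ref_ranks.get(phrase, missing_rank)
        scored.append((ratio, phrase))
    return [phrase for _, phrase in sorted(scored)]


doc = "the topic map and the topic map and the subject".split()
ref = "the the the the and and and map map cat".split()
print(rank_ratio_keyphrases(doc, ref))
print(rank_ratio_keyphrases(doc, ref, min_count=1))
```

In this toy example "subject" occurs once and drops out under the default `min_count=2`. With `min_count=1` it gets a score like any other phrase, so in this sketch the exclusion comes from an explicit filter and not from the ratio itself. Whether that matches the paper's implementation is not something the sketch can settle.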

Web-based demonstration: http://cog.hut.fi/likeydemo/.

Mari-Sanna Paukkeri: Contact details and publications.

