A Language-Independent Approach to Keyphrase Extraction and Evaluation (2010) by Mari-Sanna Paukkeri, Ilari T. Nieminen, Matti Pöllä and Timo Honkela.
Abstract:
We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very light-weight preprocessing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with 11 European languages. Furthermore, we present an automatic evaluation method based on Wikipedia intra-linking.
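As I read the method, the abstract's "statistical analysis and the use of a reference corpus" comes down to one ratio per candidate phrase p, an n-gram taken from the document. The ratio is p's frequency rank in the document divided by its frequency rank in the reference corpus, where rank 1 is the most frequent phrase. The notation below is mine, not necessarily the paper's:

$$\mathrm{ratio}(p) = \frac{\mathrm{rank}_{\mathrm{doc}}(p)}{\mathrm{rank}_{\mathrm{ref}}(p)}$$

The phrases with the lowest ratios are taken as keyphrases, because they are frequent in the document but comparatively rare in the reference corpus. Check the paper for how it ranks phrases that are missing from the reference corpus.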
A useful approach for developing a rough cut of the keywords in documents, keywords that may indicate a need for topics to represent subjects.
Interesting that:
Phrases occurring only once in the document cannot be selected as keyphrases.
I would have thought unique phrases would automatically qualify as keyphrases. But the ranking of phrases, which is calculated from the text and the reference corpus, excludes phrases that occur only once, so they never receive a ranking ratio at all.
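This is a minimal Python sketch of rank-ratio scoring that shows where the once-only exclusion applies. It rests on my own assumptions, not the authors' code:

* `\w+` tokenization.
* n-grams up to a hypothetical `max_n`.
* Dense frequency ranking, where tied counts share a rank.
* Reference-corpus rank for unseen phrases set to one past the largest rank.

All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on Unicode word characters (no stemming, no stop lists)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def frequency_ranks(counts):
    """Map each phrase to its frequency rank (1 = most frequent).

    Assumption: dense ranking, so phrases with equal counts share a rank.
    """
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of_count = {c: r for r, c in enumerate(distinct, start=1)}
    return {phrase: rank_of_count[c] for phrase, c in counts.items()}


def likey_scores(doc_tokens, ref_tokens, max_n=3):
    """Score document phrases by doc rank / reference rank; lower is better."""
    doc_counts = Counter(ngrams(doc_tokens, max_n))
    # The exclusion the paper describes: phrases seen only once get no score.
    doc_counts = {p: c for p, c in doc_counts.items() if c > 1}
    ref_counts = Counter(ngrams(ref_tokens, max_n))

    doc_ranks = frequency_ranks(doc_counts)
    ref_ranks = frequency_ranks(ref_counts)
    # Assumption: phrases absent from the reference corpus rank one past the last rank.
    unseen_rank = max(ref_ranks.values(), default=0) + 1
    return {p: doc_ranks[p] / ref_ranks.get(p, unseen_rank) for p in doc_ranks}


def top_keyphrases(scores, k):
    """Return the k lowest-scoring phrases, breaking ties alphabetically."""
    return sorted(scores, key=lambda p: (scores[p], p))[:k]


doc = "topic maps merge subjects. topic maps identify subjects. the map is the territory."
ref = "the cat is on the mat. the dog is in the house. a map of the town."
scores = likey_scores(tokenize(doc), tokenize(ref), max_n=2)
print(top_keyphrases(scores, 3))  # → ['maps', 'subjects', 'topic']
print("merge" in scores)          # → False: "merge" occurs once, so it is never scored
```

To let once-only phrases compete, remove the `if c > 1` condition. Under these ranking assumptions, such phrases take the worst document rank. They would still trail repeated phrases that are equally rare in the reference corpus.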
That sounds like a bug and not a feature to me.
My reasoning is that phrases unique to an author uniquely identify subjects. Certainly grist for a topic map mill.
Web-based demonstration: http://cog.hut.fi/likeydemo/.
Mari-Sanna Paukkeri: Contact details and publications.