Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 26, 2012

Using information retrieval technology for a corpus analysis platform

Filed under: Corpora,Corpus Linguistics,Information Retrieval,Lucene,MapReduce — Patrick Durusau @ 3:57 pm

Using information retrieval technology for a corpus analysis platform by Carsten Schnober.

Abstract:

This paper describes a practical approach to use the information retrieval engine Lucene for the corpus analysis platform KorAP, currently being developed at the Institut für Deutsche Sprache (IDS Mannheim). It presents a method to use Lucene’s indexing technique and to exploit it for linguistically annotated data, allowing full flexibility to handle multiple annotation layers. It uses multiple indexes and MapReduce techniques in order to keep KorAP scalable.

The support for multiple annotation layers is of particular interest to me because the “subjects” of interest in a text may vary from one reader to another.

Bear in mind that for topic maps, the annotation layers and the annotations themselves may be subjects for some purposes.
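As a rough illustration of what "multiple annotation layers" over an inverted index can look like, here is a minimal Python sketch. It is not KorAP's code and it does not use Lucene's API. The class `LayeredIndex`, the `layer:value` term convention, and the `count_hits` helper are hypothetical names chosen for illustration. The assumption is simply that every layer (word form, part of speech, lemma) is indexed at the same token position, and that several independent indexes can be queried separately and their results combined, in the spirit of a map and reduce step.

```python
from collections import defaultdict


class LayeredIndex:
    """Toy positional inverted index (hypothetical, not KorAP or Lucene).

    Every annotation layer shares the token positions of the text,
    so annotations from different layers can be matched by position.
    """

    def __init__(self):
        # "layer:value" -> {doc_id: [token positions]}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, tokens):
        # tokens: one dict per token, mapping layer name -> value
        for pos, layers in enumerate(tokens):
            for layer, value in layers.items():
                self.postings[f"{layer}:{value}"][doc_id].append(pos)

    def search(self, *terms):
        """Return {doc_id: [positions]} where all terms share a position."""
        hits = {}
        first, rest = terms[0], terms[1:]
        for doc_id, positions in self.postings.get(first, {}).items():
            common = set(positions)
            for term in rest:
                common &= set(self.postings.get(term, {}).get(doc_id, []))
            if common:
                hits[doc_id] = sorted(common)
        return hits


def count_hits(indexes, *terms):
    # "map": query each index on its own; "reduce": sum the partial counts
    partial = (sum(len(p) for p in idx.search(*terms).values()) for idx in indexes)
    return sum(partial)


# Two small indexes standing in for separately stored parts of a corpus
a = LayeredIndex()
a.add("d1", [
    {"word": "Der", "pos": "ART"},
    {"word": "Hund", "pos": "NN", "lemma": "Hund"},
    {"word": "bellt", "pos": "VVFIN", "lemma": "bellen"},
])
b = LayeredIndex()
b.add("d2", [
    {"word": "Hunde", "pos": "NN", "lemma": "Hund"},
    {"word": "bellen", "pos": "VVFIN", "lemma": "bellen"},
])

print(a.search("lemma:bellen", "pos:VVFIN"))
print(count_hits([a, b], "lemma:bellen", "pos:VVFIN"))
```

Because layers are just prefixes on terms in this sketch, a reader interested in a different set of "subjects" can add another layer without touching the ones already indexed.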

