Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 31, 2012

UIMA Analysis Engine for Keyword Recognition and Transformation

Filed under: Indexing,Lucene,UIMA — Patrick Durusau @ 4:39 pm

UIMA Analysis Engine for Keyword Recognition and Transformation by Sujit Pal.

From the post:

You have probably noticed that I’ve been playing with UIMA lately, perhaps a bit aimlessly. One of my goals with UIMA is to create an Analysis Engine (AE) that I can plug into the front of the Lucene analyzer chain for one of my applications. The AE would detect and mark keywords in the input stream so they would be exempt from stemming by downstream Lucene analyzers.

So couple of weeks ago, I picked up the bits and pieces of UIMA code that I had written and started to refactor them to form a sequence of primitive AEs that detected keywords in text using pattern and dictionary recognition. Each primitive AE places new KeywordAnnotation objects into an annotation index.

The primitive AEs I came up with are pretty basic, but offers a surprising amount of bang for the buck. There are just two annotators – the PatternAnnotator and DictionaryAnnotator – that do the processing for my primitive AEs listed below. Obviously, more can be added (and will, eventually) as required.

  • Pattern based keyword recognition
  • Pattern based keyword recognition and transformation
  • Dictionary based keyword recognition, case sensitive
  • Dictionary based keyword recognition and transformation, case sensitive
  • Dictionary based keyword recognition, case insensitive
  • Dictionary based keyword recognition and transformation, case insensitive

The first of two posts that I missed from last year, recently brought to my attention by Jack Park.

The ability to annotate, implying, among other things, the ability to create synonym annotations for keywords.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress