Apache cTAKES « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 10, 2013

Apache cTAKES

Filed under: Knowledge Discovery,Medical Informatics,Natural Language Processing,Text Analytics — Patrick Durusau @ 5:01 am

Apache cTAKES

From the webpage:

Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text. It processes clinical notes, identifying types of clinical named entities from various dictionaries including the Unified Medical Language System (UMLS) – medications, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, subject (patient, family member, etc.) and context (negated/not negated, conditional, generic, degree of certainty). Some of the attributes are expressed as relations, for example the location of a clinical condition (locationOf relation) or the severity of a clinical condition (degreeOf relation).

Apache cTAKES was built using the Apache UIMA Unstructured Information Management Architecture engineering framework and Apache OpenNLP natural language processing toolkit. Its components are specifically trained for the clinical domain out of diverse manually annotated datasets, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. cTAKES has been used in a variety of use cases in the domain of biomedicine such as phenotype discovery, translational science, pharmacogenomics and pharmacogenetics.

Apache cTAKES employs a number of rule-based and machine learning methods. Apache cTAKES components include:

Sentence boundary detection

Tokenization (rule-based)

Morphologic normalization

POS tagging

Shallow parsing

Named Entity Recognition

Dictionary mapping

Semantic typing is based on these UMLS semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications

Assertion module

Dependency parser

Constituency parser

Semantic Role Labeler

Coreference resolver

Relation extractor

Drug Profile module

Smoking status classifier

The goal of cTAKES is to be a world-class natural language processing system in the healthcare domain. cTAKES can be used in a great variety of retrievals and use cases. It is intended to be modular and expandable at the information model and method level.
The cTAKES community is committed to best practices and R&D (research and development) by using cutting edge technologies and novel research. The idea is to quickly translate the best performing methods into cTAKES code.

Processing a text with cTAKES is a processing of adding semantic information to the text.

As you can imagine, the better the semantics that are added, the better searching and other functions become.

In order to make added semantic information interoperable, well, that’s a topic map question.

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 10, 2013

Apache cTAKES

No Comments