From the webpage:
Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text. It processes clinical notes, identifying types of clinical named entities from various dictionaries including the Unified Medical Language System (UMLS) – medications, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, subject (patient, family member, etc.) and context (negated/not negated, conditional, generic, degree of certainty). Some of the attributes are expressed as relations, for example the location of a clinical condition (locationOf relation) or the severity of a clinical condition (degreeOf relation).
Apache cTAKES was built using the Apache UIMA (Unstructured Information Management Architecture) engineering framework and Apache OpenNLP natural language processing toolkit. Its components are specifically trained for the clinical domain out of diverse manually annotated datasets, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. cTAKES has been used in a variety of use cases in the domain of biomedicine such as phenotype discovery, translational science, pharmacogenomics and pharmacogenetics.
Apache cTAKES employs a number of rule-based and machine learning methods. Apache cTAKES components include:
- Sentence boundary detection
- Tokenization (rule-based)
- Morphologic normalization
- POS tagging
- Shallow parsing
- Named Entity Recognition
- Dictionary mapping
- Semantic typing is based on these UMLS semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
- Assertion module
- Dependency parser
- Constituency parser
- Semantic Role Labeler
- Coreference resolver
- Relation extractor
- Drug Profile module
- Smoking status classifier
The goal of cTAKES is to be a world-class natural language processing system in the healthcare domain. cTAKES can be used in a great variety of retrievals and use cases. It is intended to be modular and expandable at the information model and method level.
The cTAKES community is committed to best practices and R&D (research and development) by using cutting edge technologies and novel research. The idea is to quickly translate the best performing methods into cTAKES code.
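To give a feel for how the first few components in the quoted list chain together, here is a toy Python sketch of sentence boundary detection, tokenization, normalization and dictionary mapping. It is not cTAKES code and does not use the cTAKES or UIMA APIs: the function names, the regular expressions, and the dictionary entries with their `CODE-…` values are all made up for illustration, and the real components rely on trained models and real dictionaries rather than these crude rules.

```python
import re
from typing import Dict, List, Tuple

# Toy dictionary: normalized term -> (placeholder code, semantic type).
# These codes are invented placeholders, not real UMLS identifiers.
TOY_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "aspirin": ("CODE-0001", "medication"),
    "headache": ("CODE-0002", "sign/symptom"),
    "knee": ("CODE-0003", "anatomical site"),
}


def split_sentences(text: str) -> List[str]:
    """Naive sentence boundary detection: split after . ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Rule-based tokenization: runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalize(token: str) -> str:
    """Crude stand-in for morphologic normalization: lowercase, strip a plural 's'."""
    lowered = token.lower()
    if lowered.endswith("s") and len(lowered) > 3:
        return lowered[:-1]
    return lowered


def lookup(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Dictionary mapping: return (surface form, code, semantic type) for each hit."""
    found = []
    for tok in tokens:
        entry = TOY_DICTIONARY.get(normalize(tok))
        if entry is not None:
            found.append((tok, entry[0], entry[1]))
    return found


def run_pipeline(text: str) -> List[Tuple[str, str, str]]:
    """Run each stage in order, feeding the output of one into the next."""
    results: List[Tuple[str, str, str]] = []
    for sentence in split_sentences(text):
        results.extend(lookup(tokenize(sentence)))
    return results


if __name__ == "__main__":
    for hit in run_pipeline("Patient reports headaches. Takes aspirin daily."):
        print(hit)
```

The point of the sketch is only the shape: each stage consumes what the previous one produced, which is why the quoted list reads like an ordered pipeline.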
Processing a text with cTAKES is a process of adding semantic information to the text.
As you can imagine, the better the semantics that are added, the better searching and other functions become.
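A small Python sketch of why that is, using the attributes the webpage describes (text span, ontology code, subject, negation). Everything here is hypothetical: the `Mention` class, the `CODE-FEVER` placeholder and the hand-written annotations stand in for what a system like cTAKES might produce and are not its actual output or type system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mention:
    """Hypothetical annotation record; field names are illustrative only."""
    note_id: int
    begin: int                # start offset of the text span
    end: int                  # end offset (exclusive)
    code: str                 # ontology mapping code (placeholder values here)
    subject: str = "patient"  # patient, family member, etc.
    negated: bool = False


NOTES: Dict[int, str] = {
    1: "Patient has a fever.",
    2: "Patient denies fever.",
    3: "Pyrexia noted overnight.",
    4: "Mother had a fever last week.",
}


def mention(note_id: int, surface: str, code: str = "CODE-FEVER", **attrs) -> Mention:
    """Build a hand-written annotation, computing the span from the note text."""
    start = NOTES[note_id].lower().index(surface.lower())
    return Mention(note_id, start, start + len(surface), code, **attrs)


# Hand-written annotations standing in for the output of an NLP system.
MENTIONS: List[Mention] = [
    mention(1, "fever"),
    mention(2, "fever", negated=True),
    mention(3, "pyrexia"),
    mention(4, "fever", subject="family member"),
]


def keyword_search(term: str) -> List[int]:
    """Plain text search: any note containing the term."""
    return sorted(i for i, text in NOTES.items() if term.lower() in text.lower())


def semantic_search(code: str) -> List[int]:
    """Search on added semantics: asserted mentions about the patient only."""
    return sorted({
        m.note_id
        for m in MENTIONS
        if m.code == code and not m.negated and m.subject == "patient"
    })


if __name__ == "__main__":
    print(keyword_search("fever"))        # notes 1, 2 and 4
    print(semantic_search("CODE-FEVER"))  # notes 1 and 3
```

The keyword search picks up the denied fever and the mother's fever and misses the synonym, while the search over annotations returns only the notes where the patient actually has the condition.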
In order to make added semantic information interoperable, well, that’s a topic map question.
I first saw this in a tweet by Tim O’Reilly.