CRAFT: THE COLORADO RICHLY ANNOTATED FULL TEXT CORPUS
From the Quick Facts:
- 67 full text articles
- >560,000 Tokens
- >21,000 Sentences
- ~100,000 concept annotations to 7 different biomedical ontologies/terminologies:
  - Chemical Entities of Biological Interest (ChEBI)
  - Cell Type Ontology (CL)
  - Entrez Gene
  - Gene Ontology (biological process, cellular component, and molecular function)
  - NCBI Taxonomy
  - Protein Ontology
  - Sequence Ontology
- Penn Treebank markup for each sentence
- Multiple output formats available
Let’s see: 67 articles yielded roughly 100,000 concept annotations, or about 1,493 per article, spread across seven ontologies/terminologies.
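For anyone who wants to check that arithmetic, here is a quick back-of-the-envelope sketch in Python. The inputs are the rounded Quick Facts figures (several are lower bounds or approximations), so the results are rough estimates only. The even split across ontologies is purely an assumption for illustration; the Quick Facts do not say how the annotations are distributed.

```python
# Rounded figures from the Quick Facts; ">" and "~" values are treated as exact
# here, so every result below is only a rough estimate.
articles = 67
tokens = 560_000
sentences = 21_000
annotations = 100_000
ontologies = 7

# Average concept annotations per article.
per_article = annotations / articles

# Assumption: annotations are spread evenly across the seven ontologies.
per_article_per_ontology = per_article / ontologies

# Average sentence length and annotation density.
tokens_per_sentence = tokens / sentences
tokens_per_annotation = tokens / annotations

print(f"annotations per article:              {per_article:.0f}")
print(f"annotations per article per ontology: {per_article_per_ontology:.0f}")
print(f"tokens per sentence:                  {tokens_per_sentence:.1f}")
print(f"tokens per annotation:                {tokens_per_annotation:.1f}")
```

On these figures that works out to roughly one concept annotation for every five or six tokens, which gives a sense of how dense the markup is.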
Ready to test this mapping out in your topic map application?