Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 19, 2012

CRAFT: THE COLORADO RICHLY ANNOTATED FULL TEXT CORPUS

Filed under: Bioinformatics,Biomedical,Corpora,Natural Language Processing — Patrick Durusau @ 3:41 pm

From the Quick Facts:

  • 67 full text articles
  • >560,000 Tokens
  • >21,000 Sentences
  • ~100,000 concept annotations to 7 different biomedical ontologies/terminologies
    • Chemical Entities of Biological Interest (ChEBI)
    • Cell Type Ontology (CL)
    • Entrez Gene
    • Gene Ontology (biological process, cellular component, and molecular function)
    • NCBI Taxonomy
    • Protein Ontology
    • Sequence Ontology
  • Penn Treebank markup for each sentence
  • Multiple output formats available

Let’s see: 67 articles yielded roughly 100,000 concept annotations, or about 1,493 per article, spread across seven (7) ontologies/terminologies.
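For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The inputs are the Quick Facts figures, with the approximate "~100,000" treated as exactly 100,000. The per-ontology number assumes annotations are spread evenly across the seven ontologies/terminologies, which the Quick Facts do not say and which is unlikely to hold in practice. The function and variable names are mine, for illustration only.

```python
# Figures from the Quick Facts list. The annotation count is approximate
# (~100,000), so the results below are rough averages, not exact counts.
ARTICLES = 67
CONCEPT_ANNOTATIONS = 100_000
ONTOLOGIES = 7


def average_annotations(annotations, articles, ontologies):
    """Return (mean annotations per article, mean per article per ontology).

    The second value assumes an even spread across ontologies, which is
    an illustrative assumption and not something the corpus reports.
    """
    per_article = annotations / articles
    per_article_per_ontology = per_article / ontologies
    return per_article, per_article_per_ontology


per_article, per_ontology = average_annotations(
    CONCEPT_ANNOTATIONS, ARTICLES, ONTOLOGIES
)
print(f"{per_article:,.0f} annotations per article")
print(f"{per_ontology:,.0f} per article per ontology/terminology (even spread assumed)")
```

Under those assumptions the per-article figure rounds to 1,493 and the even-spread figure to about 213 per article per ontology/terminology.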

Ready to test this mapping out in your topic map application?
