Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 4, 2012

Manual Gene Ontology annotation workflow

Filed under: Annotation,Bioinformatics,Curation,Ontology — Patrick Durusau @ 9:00 pm

Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database by Harold J. Drabkin and Judith A. Blake, for the Mouse Genome Informatics Database. Database (2012) 2012: bas045. doi: 10.1093/database/bas045.

Abstract:

The Mouse Genome Database, the Gene Expression Database and the Mouse Tumor Biology database are integrated components of the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org). The MGI system presents both a consensus view and an experimental view of the knowledge concerning the genetics and genomics of the laboratory mouse. From genotype to phenotype, this information resource integrates information about genes, sequences, maps, expression analyses, alleles, strains and mutant phenotypes. Comparative mammalian data are also presented particularly in regards to the use of the mouse as a model for the investigation of molecular and genetic components of human diseases. These data are collected from literature curation as well as downloads of large datasets (SwissProt, LocusLink, etc.). MGI is one of the founding members of the Gene Ontology (GO) and uses the GO for functional annotation of genes. Here, we discuss the workflow associated with manual GO annotation at MGI, from literature collection to display of the annotations. Peer-reviewed literature is collected mostly from a set of journals available electronically. Selected articles are entered into a master bibliography and indexed to one of eight areas of interest such as ‘GO’ or ‘homology’ or ‘phenotype’. Each article is then either indexed to a gene already contained in the database or funneled through a separate nomenclature database to add genes. The master bibliography and associated indexing provide information for various curator-reports such as ‘papers selected for GO that refer to genes with NO GO annotation’. Once indexed, curators who have expertise in appropriate disciplines enter pertinent information. MGI makes use of several controlled vocabularies that ensure uniform data encoding, enable robust analysis and support the construction of complex queries. These vocabularies range from pick-lists to structured vocabularies such as the GO. 
All data associations are supported with statements of evidence as well as access to source publications.
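
The curator report named in the abstract, "papers selected for GO that refer to genes with NO GO annotation", can be pictured as a simple query over the master bibliography. What follows is only a minimal sketch, not MGI's actual schema or code: the `Paper` record, its field names, the function name, and all sample identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Paper:
    """Hypothetical master-bibliography entry (not MGI's real schema)."""
    ref_id: str
    areas: Set[str] = field(default_factory=set)  # areas of interest, e.g. {"GO", "phenotype"}
    genes: Set[str] = field(default_factory=set)  # genes the paper is indexed to


def papers_for_unannotated_genes(
    papers: List[Paper], go_annotations: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each GO-selected paper to the genes it mentions that have no GO annotation.

    go_annotations maps a gene symbol to the set of GO term IDs already assigned.
    Papers whose genes are all annotated are left out of the report.
    """
    report = {}
    for paper in papers:
        if "GO" not in paper.areas:
            continue  # paper was not selected for GO curation
        missing = sorted(g for g in paper.genes if not go_annotations.get(g))
        if missing:
            report[paper.ref_id] = missing
    return report


# Made-up sample data, for illustration only.
papers = [
    Paper("REF:1", {"GO"}, {"GeneA", "GeneB"}),
    Paper("REF:2", {"phenotype"}, {"GeneC"}),
    Paper("REF:3", {"GO", "homology"}, {"GeneB"}),
]
go_annotations = {"GeneB": {"GO:0000001"}}  # placeholder annotation

report = papers_for_unannotated_genes(papers, go_annotations)
print(report)  # {'REF:1': ['GeneA']}
```

Only `REF:1` appears in the report: it is selected for GO and mentions `GeneA`, which has no annotation yet. A report like this works only because the indexing step uses a controlled vocabulary for areas and gene names, which is the cost of semantic uniformity discussed below.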

Semantic uniformity is achievable, in a limited enough sphere, provided you are willing to pay the price for it.

That uniformity has a high rate of return compared with less carefully curated content.

The project is producing high-quality results, although it is hampered by a lack of resources.

My question is whether similarly high-quality results could be achieved with less semantically consistent curation by distributed contributors.

That would mean harnessing the community of those interested in such a resource, and then refining their less semantically consistent entries into higher-quality annotations.

Pointers to examples of such projects?
