Archive for the ‘Gene Ontology’ Category

How Stable is Your Ontology?

Tuesday, February 19th, 2013

Assessing identity, redundancy and confounds in Gene Ontology annotations over time by Jesse Gillis and Paul Pavlidis. (Bioinformatics (2013) 29 (4): 476-482. doi: 10.1093/bioinformatics/bts727)

Abstract:

Motivation: The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored.

Results: We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their ‘functional identity’ over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks.

Availability: Data available at http://chibi.ubc.ca/assessGO.

How does your ontology account for changes in identity over time?
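
One rough way to make the abstract's "not most semantically similar to themselves" test concrete is sketched below. It assumes plain Jaccard overlap of GO term sets as a stand-in for the paper's semantic similarity measure, which the abstract does not describe. The gene names, term IDs and function names are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets of term IDs (an assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def loses_identity(gene, old_edition, new_edition):
    """True if some *other* gene in the new edition is strictly more similar
    to this gene's old annotations than the gene's own new annotations are."""
    old_terms = old_edition[gene]
    self_score = jaccard(old_terms, new_edition.get(gene, set()))
    return any(
        jaccard(old_terms, terms) > self_score
        for other, terms in new_edition.items()
        if other != gene
    )


def fraction_unstable(old_edition, new_edition):
    """Fraction of genes in the old edition that no longer best match themselves."""
    unstable = [g for g in old_edition if loses_identity(g, old_edition, new_edition)]
    return len(unstable) / len(old_edition)


# Toy, invented annotations (not real GO data).
old = {
    "geneA": {"GO:1", "GO:2"},
    "geneB": {"GO:3", "GO:4"},
    "geneC": {"GO:5"},
}
new = {
    "geneA": {"GO:1", "GO:2", "GO:6"},  # gained a term, still closest to itself
    "geneB": {"GO:7"},                  # annotations replaced entirely
    "geneC": {"GO:3", "GO:4", "GO:5"},  # now resembles the old geneB
}

print(fraction_unstable(old, new))
```

In this toy data only geneB fails to match itself, because the new geneC overlaps geneB's old annotations more than the new geneB does. A real analysis would use dated GO annotation files and a similarity measure that respects the ontology's structure.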

The “O” Word (Ontology) Isn’t Enough

Tuesday, October 16th, 2012

The Units Ontology makes reference to the Gene Ontology as an example of a successful web ontology effort.

As it should. The Gene Ontology (GO) is the only successful web ontology effort. A universe with one (1) inhabitant.

The GO differs in a number of ways from wannabe successful ontology candidates (see the article below).

The first difference echoes loudly across the semantic engineering universe:

One of the factors that account for GO’s success is that it originated from within the biological community rather than being created and subsequently imposed by external knowledge engineers. Terms were created by those who had expertise in the domain, thus avoiding the huge effort that would have been required for a computer scientist to learn and organize large amounts of biological functional information. This also led to general acceptance of the terminology and its organization within the community. This is not to say that there have been no disagreements among biologists over the conceptualization, and there is of course a protocol for arriving at a consensus when there is such a disagreement. However, a model of a domain is more likely to conform to the shared view of a community if the modelers are within or at least consult to a large degree with members of that community.

Did you catch that first line?

One of the factors that account for GO’s success is that it originated from within the biological community rather than being created and subsequently imposed by external knowledge engineers.

Saying the “O” word (ontology), and promising that it will benefit everyone if they will just listen to you, isn’t enough.

There are other factors to consider:

A Short Study on the Success of the Gene Ontology by Michael Bada, Robert Stevens, Carole Goble, Yolanda Gil, Michael Ashburner, Judith A. Blake, J. Michael Cherry, Midori Harris, Suzanna Lewis.

Abstract:

While most ontologies have been used only by the groups who created them and for their initially defined purposes, the Gene Ontology (GO), an evolving structured controlled vocabulary of nearly 16,000 terms in the domain of biological functionality, has been widely used for annotation of biological-database entries and in biomedical research. As a set of learned lessons offered to other ontology developers, we list and briefly discuss the characteristics of GO that we believe are most responsible for its success: community involvement; clear goals; limited scope; simple, intuitive structure; continuous evolution; active curation; and early use.

Mosaic: making biological sense of complex networks

Thursday, July 5th, 2012

Mosaic: making biological sense of complex networks by Chao Zhang, Kristina Hanspers, Allan Kuchinsky, Nathan Salomonis, Dong Xu, and Alexander R. Pico. (Bioinformatics (2012) 28 (14): 1943-1944. doi: 10.1093/bioinformatics/bts278)

Abstract:

We present a Cytoscape plugin called Mosaic to support interactive network annotation, partitioning, layout and coloring based on gene ontology or other relevant annotations.

From the Introduction:

The increasing throughput and quality of molecular measurements in the domains of genomics, proteomics and metabolomics continue to fuel the understanding of biological processes. Collected per molecule, the scope of these data extends to physical, genetic and biochemical interactions that in turn comprise extensive networks. There are software tools available to visualize and analyze data-derived biological networks (Smoot et al., 2011). One challenge faced by these tools is how to make sense of such networks often represented as massive ‘hairballs’. Many network analysis algorithms filter or partition networks based on topological features, optionally weighted by orthogonal node or edge data (Bader and Hogue, 2003; Royer et al., 2008). Another approach is to mathematically model networks and rely on their statistical properties to make associations with other networks, phenotypes and drug effects, sidestepping the issue of making sense of the network itself altogether (Machado et al., 2011). Acknowledging that there is still great value in engaging the minds of researchers in exploratory data analysis at the level of networks (Kelder et al., 2010), we have produced a Cytoscape plugin called Mosaic to support interactive network annotation and visualization that includes partitioning, layout and coloring based on biologically relevant ontologies (Fig. 1). Mosaic shows slices of a given network in the visual language of biological pathways, which are familiar to any biologist and are ideal frameworks for integrating knowledge.

[Fig. 1 omitted]

Cytoscape is a free and open source network visualization platform that actively supports independent plugin development (Smoot et al., 2011). For annotation, Mosaic relies primarily on the full gene ontology (GO) or simplified ‘slim’ versions (http://www.geneontology.org/GO.slims.shtml). The cellular layout of partitioned subnetworks strictly depends on the cellular component branch of GO, but the other two functions, partitioning and coloring, can be driven by any annotation associated with a major gene or protein identifier system.

For what you will need, see the Mosaic project page.

The Mosaic page offers additional documentation, which will take a while to process. I am particularly interested in how annotations of the network drive partitioning.
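
To make the idea of annotation-driven partitioning a little more concrete, here is a minimal sketch. It is not Mosaic's algorithm or its API. It only assumes that each node carries a set of annotation terms and that an edge belongs to a term's subnetwork when both endpoints share that term. The node names, terms and the function name are hypothetical.

```python
from collections import defaultdict


def partition_by_annotation(edges, annotation):
    """Group edges into subnetworks keyed by a shared annotation term.

    An edge lands in a term's subnetwork only when both endpoints carry
    that term; a node with several terms can appear in several subnetworks.
    """
    subnetworks = defaultdict(list)
    for source, target in edges:
        shared = annotation.get(source, set()) & annotation.get(target, set())
        for term in sorted(shared):
            subnetworks[term].append((source, target))
    return dict(subnetworks)


# Toy network and invented annotations.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p4")]
annotation = {
    "p1": {"transport"},
    "p2": {"transport", "signaling"},
    "p3": {"signaling"},
    "p4": {"signaling", "transport"},
}

for term, members in sorted(partition_by_annotation(edges, annotation).items()):
    print(term, members)
```

Requiring both endpoints to share a term keeps each slice internally consistent, at the cost of dropping edges that cross annotation boundaries. Other rules (for example, assigning an edge by its source node alone) would give different slices.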

Indexing the content of Gene Ontology with apache SOLR

Sunday, April 8th, 2012

Indexing the content of Gene Ontology with apache SOLR by Pierre Lindenbaum.

Pierre walks you through the use of Solr to index GeneOntology. As with all of his work, impressive!

Of course, one awesome post deserves another! So Pierre follows with:

Apache SOLR and GeneOntology: Creating the JQUERY-UI client (with autocompletion)

So you get to learn jQuery UI stuff as well.
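
For readers who want a feel for the indexing step before reading Pierre's posts, the sketch below turns OBO-style term stanzas into a list of JSON documents of the kind a search index could ingest. This is not Pierre's method; his posts describe his own pipeline. The field names (id, name, namespace, is_a), the function name and the sample stanzas with "EX:" identifiers are assumptions made up for illustration, and the fields would have to match whatever schema your index defines.

```python
import json

# Invented stanzas in OBO style; the EX: identifiers are not real GO terms.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: example parent process
namespace: biological_process

[Term]
id: EX:0000002
name: example child process
namespace: biological_process
is_a: EX:0000001 ! example parent process

[Typedef]
id: part_of
name: part of
"""


def obo_terms_to_docs(obo_text):
    """Collect one dict per [Term] stanza, keeping a few assumed fields."""
    docs = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {"is_a": []} if line == "[Term]" else None
            if current is not None:
                docs.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "is_a":
                # Drop the trailing "! human-readable name" comment.
                current["is_a"].append(value.split(" ! ")[0])
            elif key in ("id", "name", "namespace"):
                current[key] = value
    return docs


docs = obo_terms_to_docs(SAMPLE_OBO)
print(json.dumps(docs, indent=2))
```

The sketch stops at producing the JSON payload. How it is posted to the index, and how autocompletion is wired up on the client, is what the two posts cover.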

Inside the Variation Toolkit: Tools for Gene Ontology

Tuesday, January 31st, 2012

Inside the Variation Toolkit: Tools for Gene Ontology by Pierre Lindenbaum.

From the post:

GeneOntologyDbManager is a C++ tool that is part of my experimental Variation Toolkit.

This program is a set of tools for GeneOntology, it is based on the sqlite3 library.

Pierre walks through building and using his GeneOntologyDbManager.
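
As a taste of why sqlite3 is a comfortable fit for ontology data, the sketch below stores terms and is_a links in two tables and uses a recursive query to list everything beneath a term. It uses Python's built-in sqlite3 module rather than Pierre's C++ tool, and the table layout, function name and "EX:" identifiers are hypothetical; GeneOntologyDbManager's actual schema may look nothing like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE is_a (child TEXT NOT NULL, parent TEXT NOT NULL);
""")

# Invented terms; EX:5 is deliberately unrelated to the others.
conn.executemany("INSERT INTO term VALUES (?, ?)", [
    ("EX:1", "root process"),
    ("EX:2", "child process"),
    ("EX:3", "grandchild process"),
    ("EX:4", "second child process"),
    ("EX:5", "unrelated process"),
])
conn.executemany("INSERT INTO is_a VALUES (?, ?)", [
    ("EX:2", "EX:1"),
    ("EX:3", "EX:2"),
    ("EX:4", "EX:1"),
])


def descendants(conn, root_id):
    """Return (id, name) for every term reachable below root_id via is_a."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION
            SELECT is_a.child FROM is_a JOIN sub ON is_a.parent = sub.id
        )
        SELECT term.id, term.name
        FROM sub JOIN term ON term.id = sub.id
        WHERE term.id <> ?
        ORDER BY term.id
    """, (root_id, root_id))
    return rows.fetchall()


for term_id, name in descendants(conn, "EX:1"):
    print(term_id, name)
```

Using UNION rather than UNION ALL in the recursive part means a term reached by more than one path is listed only once, which matters because a GO term can have several parents.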

It is rather appropriate to mention an area (bioinformatics) that is exploding with information on the same day as the GPU and database posts. Plus, I am sure you will find the Gene Ontology useful for topic map purposes.