Archive for the ‘Gene Ontology’ Category

Ten Quick Tips for Using the Gene Ontology

Tuesday, November 26th, 2013

Ten Quick Tips for Using the Gene Ontology by Judith A. Blake.

From the post:

The Gene Ontology (GO) provides core biological knowledge representation for modern biologists, whether computationally or experimentally based. GO resources include biomedical ontologies that cover molecular domains of all life forms as well as extensive compilations of gene product annotations to these ontologies that provide largely species-neutral, comprehensive statements about what gene products do. Although extensively used in data analysis workflows, and widely incorporated into numerous data analysis platforms and applications, the general user of GO resources often misses fundamental distinctions about GO structures, GO annotations, and what can and can not be extrapolated from GO resources. Here are ten quick tips for using the Gene Ontology.

Tip 1: Know the Source of the GO Annotations You Use

Tip 2: Understand the Scope of GO Annotations

Tip 3: Consider Differences in Evidence Codes

Tip 4: Probe Completeness of GO Annotations

Tip 5: Understand the Complexity of the GO Structure

Tip 6: Choose Analysis Tools Carefully

Tip 7: Provide the Version of the Data/Tools Used

Tip 8: Seek Input from the GOC Community and Make Use of GOC Resources

Tip 9: Contribute to the GO

Tip 10: Acknowledge the Work of the GO Consortium

See Judith’s article for her comments and pointers under each tip.

The takeaway here is that an ontology may have the information you are looking for, but understanding what you have found is an entirely different matter.

For GO, follow Judith’s specific suggestions/tips; for any other ontology, take steps to understand the ontology before relying upon it.
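Tips 1, 3 and 7 lend themselves to a quick check before any analysis. What follows is my own minimal sketch, not anything from Judith's article. It assumes your annotations are in the tab-separated GAF 2.x layout, where lines starting with "!" are header/comment lines and the evidence code, date and "assigned by" fields sit in columns 7, 14 and 15. The helper names, the sample rows and the group names are all made up for illustration.

```python
from collections import Counter

# 0-based positions of the GAF 2.x columns used in this sketch.
SYMBOL, GO_ID, EVIDENCE, DATE, ASSIGNED_BY = 2, 4, 6, 13, 14


def parse_gaf(lines):
    """Yield one dict per annotation line, skipping comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # malformed line; a real pipeline should report these
        yield {
            "symbol": cols[SYMBOL],
            "go_id": cols[GO_ID],
            "evidence": cols[EVIDENCE],
            "date": cols[DATE],
            "assigned_by": cols[ASSIGNED_BY],
        }


def summarize(annotations):
    """Count annotations per evidence code and per contributing group."""
    annotations = list(annotations)
    by_evidence = Counter(a["evidence"] for a in annotations)
    by_source = Counter(a["assigned_by"] for a in annotations)
    return by_evidence, by_source


def drop_electronic(annotations):
    """Keep only annotations that are not inferred electronically (IEA)."""
    return [a for a in annotations if a["evidence"] != "IEA"]


def make_row(symbol, go_id, evidence, date, source):
    """Build a made-up 17-column GAF line for the demo (not real data)."""
    cols = ["ExampleDB", "X0001", symbol, "", go_id, "REF:0", evidence,
            "", "P", "", "", "gene", "taxon:0", date, source, "", ""]
    return "\t".join(cols)


# Hypothetical annotations: genes, dates and groups are placeholders.
SAMPLE = [
    "!gaf-version: 2.0",
    make_row("geneA", "GO:0000001", "IDA", "20130101", "GroupX"),
    make_row("geneA", "GO:0000002", "IEA", "20130201", "GroupY"),
    make_row("geneB", "GO:0000002", "IEA", "20130201", "GroupY"),
]

if __name__ == "__main__":
    annotations = list(parse_gaf(SAMPLE))
    by_evidence, by_source = summarize(annotations)
    print(by_evidence)
    print(by_source)
    print([a["symbol"] for a in drop_electronic(annotations)])
```

Whether to drop IEA annotations depends on your question; the point is only that you should know what mix of sources and evidence codes you are working with, and record the file's date or version alongside your results.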

I first saw this in a tweet by Stephen Turner.

The 97% Junk Part of Human DNA

Sunday, August 4th, 2013

Researchers from the Gene and Stem Cell Therapy Program at Sydney’s Centenary Institute have confirmed that, far from being “junk,” the 97 per cent of human DNA that does not encode instructions for making proteins can play a significant role in controlling cell development.

And in doing so, the researchers have unravelled a previously unknown mechanism for regulating the activity of genes, increasing our understanding of the way cells develop and opening the way to new possibilities for therapy.

Using the latest gene sequencing techniques and sophisticated computer analysis, a research group led by Professor John Rasko AO and including Centenary’s Head of Bioinformatics, Dr William Ritchie, has shown how particular white blood cells use non-coding DNA to regulate the activity of a group of genes that determines their shape and function. The work is published today in the scientific journal Cell.*

There’s a poke with a sharp stick to any gene ontology.

Roles in associations of genes have suddenly expanded.

Your call:

  1. Wait until a committee can officially name the new roles and parts of the “junk” that play those roles, or
  2. Create names/roles on the fly and merge those with subsequent identifiers on an ongoing basis as our understanding improves.

Any questions?

*Justin J.-L. Wong, William Ritchie, Olivia A. Ebner, Matthias Selbach, Jason W.H. Wong, Yizhou Huang, Dadi Gao, Natalia Pinello, Maria Gonzalez, Kinsha Baidya, Annora Thoeng, Teh-Liane Khoo, Charles G. Bailey, Jeff Holst, John E.J. Rasko. Orchestrated Intron Retention Regulates Normal Granulocyte Differentiation. Cell, 2013; 154 (3): 583 DOI: 10.1016/j.cell.2013.06.052

How Stable is Your Ontology?

Tuesday, February 19th, 2013

Assessing identity, redundancy and confounds in Gene Ontology annotations over time by Jesse Gillis and Paul Pavlidis. (Bioinformatics (2013) 29 (4): 476-482. doi: 10.1093/bioinformatics/bts727)


Motivation: The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored.

Results: We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their ‘functional identity’ over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks.

Availability: Data available at

How does your ontology account for changes in identity over time?
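To make the "matching to themselves" idea concrete, here is a toy sketch of my own. It is not the authors' method: the paper uses a semantic similarity measure, while I substitute plain Jaccard overlap of each gene's set of annotated terms. The gene names, term IDs and both "editions" are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(gene, old, new):
    """Return the gene in `new` most similar to `gene` as annotated in `old`.

    Ties go to the gene itself, then to the alphabetically first gene.
    """
    terms = old[gene]
    return max(
        sorted(new),
        key=lambda g: (jaccard(terms, new[g]), g == gene),
    )


def unstable_genes(old, new):
    """Genes present in both editions that no longer best-match themselves."""
    shared = sorted(set(old) & set(new))
    return [g for g in shared if best_match(g, old, new) != g]


# Two hypothetical annotation editions: gene -> set of term IDs.
OLD = {
    "geneA": {"T1", "T2"},
    "geneB": {"T3"},
    "geneC": {"T4", "T5"},
}
NEW = {
    "geneA": {"T1", "T2", "T6"},
    "geneB": {"T4", "T5"},
    "geneC": {"T4", "T5", "T7", "T8"},
}

if __name__ == "__main__":
    changed = unstable_genes(OLD, NEW)
    print(changed, len(changed) / len(OLD))
```

In this toy data geneC's old annotations now look more like the new geneB than the new geneC, which is the kind of identity drift the paper measures at scale.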

The “O” Word (Ontology) Isn’t Enough

Tuesday, October 16th, 2012

The Units Ontology makes reference to the Gene Ontology as an example of a successful web ontology effort.

As it should. The Gene Ontology (GO) is the only successful web ontology effort. A universe with one (1) inhabitant.

The GO has a number of differences from wannabe successful ontology candidates. (see the article below)

The first difference echoes loudly across the semantic engineering universe:

One of the factors that account for GO’s success is that it originated from within the biological community rather than being created and subsequently imposed by external knowledge engineers. Terms were created by those who had expertise in the domain, thus avoiding the huge effort that would have been required for a computer scientist to learn and organize large amounts of biological functional information. This also led to general acceptance of the terminology and its organization within the community. This is not to say that there have been no disagreements among biologists over the conceptualization, and there is of course a protocol for arriving at a consensus when there is such a disagreement. However, a model of a domain is more likely to conform to the shared view of a community if the modelers are within or at least consult to a large degree with members of that community.

Did you catch that first line?

One of the factors that account for GO’s success is that it originated from within the biological community rather than being created and subsequently imposed by external knowledge engineers.

Saying the “O” word, ontology, and promising that it will benefit everyone if they will just listen to you, isn’t enough.

There are other factors to consider:

A Short Study on the Success of the Gene Ontology by Michael Bada, Robert Stevens, Carole Goble, Yolanda Gil, Michael Ashburner, Judith A. Blake, J. Michael Cherry, Midori Harris, Suzanna Lewis.


While most ontologies have been used only by the groups who created them and for their initially defined purposes, the Gene Ontology (GO), an evolving structured controlled vocabulary of nearly 16,000 terms in the domain of biological functionality, has been widely used for annotation of biological-database entries and in biomedical research. As a set of learned lessons offered to other ontology developers, we list and briefly discuss the characteristics of GO that we believe are most responsible for its success: community involvement; clear goals; limited scope; simple, intuitive structure; continuous evolution; active curation; and early use.

Mosaic: making biological sense of complex networks

Thursday, July 5th, 2012

Mosaic: making biological sense of complex networks by Chao Zhang, Kristina Hanspers, Allan Kuchinsky, Nathan Salomonis, Dong Xu, and Alexander R. Pico. (Bioinformatics (2012) 28 (14): 1943-1944. doi: 10.1093/bioinformatics/bts278)


We present a Cytoscape plugin called Mosaic to support interactive network annotation, partitioning, layout and coloring based on gene ontology or other relevant annotations.

From the Introduction:

The increasing throughput and quality of molecular measurements in the domains of genomics, proteomics and metabolomics continue to fuel the understanding of biological processes. Collected per molecule, the scope of these data extends to physical, genetic and biochemical interactions that in turn comprise extensive networks. There are software tools available to visualize and analyze data-derived biological networks (Smoot et al., 2011). One challenge faced by these tools is how to make sense of such networks often represented as massive ‘hairballs’. Many network analysis algorithms filter or partition networks based on topological features, optionally weighted by orthogonal node or edge data (Bader and Hogue, 2003; Royer et al., 2008). Another approach is to mathematically model networks and rely on their statistical properties to make associations with other networks, phenotypes and drug effects, sidestepping the issue of making sense of the network itself altogether (Machado et al., 2011). Acknowledging that there is still great value in engaging the minds of researchers in exploratory data analysis at the level of networks (Kelder et al., 2010), we have produced a Cytoscape plugin called Mosaic to support interactive network annotation and visualization that includes partitioning, layout and coloring based on biologically relevant ontologies (Fig. 1). Mosaic shows slices of a given network in the visual language of biological pathways, which are familiar to any biologist and are ideal frameworks for integrating knowledge.

[Fig. 1 omitted]

Cytoscape is a free and open source network visualization platform that actively supports independent plugin development (Smoot et al., 2011). For annotation, Mosaic relies primarily on the full gene ontology (GO) or simplified ‘slim’ versions. The cellular layout of partitioned subnetworks strictly depends on the cellular component branch of GO, but the other two functions, partitioning and coloring, can be driven by any annotation associated with a major gene or protein identifier system.

You will need:

As per the Mosaic project page.

The Mosaic page offers additional documentation, which will take a while to process. I am particularly interested in annotations of the network driving partitioning.
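As a way of thinking about annotation-driven partitioning, here is a small sketch of my own. It is not Mosaic's algorithm and does not use the Cytoscape API; it only assumes an undirected edge list plus a node-to-labels mapping. The function name, the node names and the labels are hypothetical.

```python
from collections import defaultdict


def partition_by_annotation(edges, annotation, unassigned="unassigned"):
    """Split an edge list into per-label subnetworks.

    edges: iterable of (node, node) pairs.
    annotation: dict mapping node -> list of labels (may be missing/empty).
    A node lands in every partition whose label it carries; an edge is kept
    in a partition only if both endpoints carry that label.
    Nodes that appear in no edge are not seen by this sketch.
    """
    nodes = defaultdict(set)
    sub_edges = defaultdict(list)

    def labels(node):
        return set(annotation.get(node) or [unassigned])

    for u, v in edges:
        lu, lv = labels(u), labels(v)
        for lab in lu:
            nodes[lab].add(u)
        for lab in lv:
            nodes[lab].add(v)
        for lab in sorted(lu & lv):
            sub_edges[lab].append((u, v))

    return {lab: {"nodes": nodes[lab], "edges": sub_edges[lab]}
            for lab in nodes}


# Hypothetical network and annotations.
EDGES = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
ANNOTATION = {
    "p1": ["nucleus"],
    "p2": ["nucleus", "cytoplasm"],
    "p3": ["cytoplasm"],
}

if __name__ == "__main__":
    parts = partition_by_annotation(EDGES, ANNOTATION)
    for label in sorted(parts):
        print(label, sorted(parts[label]["nodes"]), parts[label]["edges"])
```

Note the design choice: a node with several labels appears in several partitions, so the partitions overlap. Whether that is desirable depends on which annotation you let drive the split.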

Indexing the content of Gene Ontology with apache SOLR

Sunday, April 8th, 2012

Indexing the content of Gene Ontology with apache SOLR by Pierre Lindenbaum.

Pierre walks you through the use of Solr to index GeneOntology. As with all of his work, impressive!

Of course, one awesome post deserves another! So Pierre follows with:

Apache SOLR and GeneOntology: Creating the JQUERY-UI client (with autocompletion)

So you get to learn jQuery UI as well.
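If you want a feel for what "documents to index" might look like before reading Pierre's posts, here is a rough sketch of my own, not his method. It assumes the ontology is available in the OBO flat-file format and turns each [Term] stanza into a small JSON document. The output field names are my assumption and would have to match whatever schema your index uses. The two stanzas are trimmed to a few tags.

```python
import json


def parse_obo_terms(lines):
    """Yield one dict per [Term] stanza with id, name, namespace and is_a."""
    term = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts: emit the previous term, if there was one.
            if term and "id" in term:
                yield term
            # Only [Term] stanzas are collected; others (e.g. [Typedef]) are skipped.
            term = {"is_a": []} if line == "[Term]" else None
        elif term is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "is_a":
                # "is_a: GO:0008150 ! biological_process" -> keep only the ID
                term["is_a"].append(value.split(" ! ")[0].strip())
            elif key in ("id", "name", "namespace"):
                term[key] = value
    if term and "id" in term:
        yield term


SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0009987
name: cellular process
namespace: biological_process
is_a: GO:0008150 ! biological_process

[Typedef]
id: part_of
name: part of
"""

if __name__ == "__main__":
    docs = list(parse_obo_terms(SAMPLE_OBO.splitlines()))
    print(json.dumps(docs, indent=2))
```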

Inside the Variation Toolkit: Tools for Gene Ontology

Tuesday, January 31st, 2012

Inside the Variation Toolkit: Tools for Gene Ontology by Pierre Lindenbaum.

From the post:

GeneOntologyDbManager is a C++ tool that is part of my experimental Variation Toolkit.

This program is a set of tools for GeneOntology, it is based on the sqlite3 library.

Pierre walks through building and using his GeneOntologyDbManager.
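To see why SQLite is a reasonable fit for an ontology, here is a tiny sketch of my own using Python's built-in sqlite3 module. It is not Pierre's C++ tool and does not reproduce his schema. The table names, column names and toy terms are all assumptions for illustration. It stores is_a links and uses a recursive query to collect every descendant of a term, including a term that has two parents.

```python
import sqlite3


def build_db(terms, is_a):
    """Create an in-memory database with a term table and an is_a table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE term (id TEXT PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE is_a (child TEXT, parent TEXT)")
    con.executemany("INSERT INTO term VALUES (?, ?)", terms)
    con.executemany("INSERT INTO is_a VALUES (?, ?)", is_a)
    return con


def descendants(con, root):
    """Return (id, name) for every term below `root`, ordered by id."""
    rows = con.execute(
        """
        WITH RECURSIVE sub(id) AS (
            SELECT child FROM is_a WHERE parent = ?
            UNION  -- UNION (not UNION ALL) drops repeats from multiple parents
            SELECT is_a.child FROM is_a JOIN sub ON is_a.parent = sub.id
        )
        SELECT term.id, term.name
        FROM sub JOIN term ON term.id = sub.id
        ORDER BY term.id
        """,
        (root,),
    )
    return rows.fetchall()


# Hypothetical toy ontology: T:4 has two parents, so the graph is a DAG.
TERMS = [("T:1", "root"), ("T:2", "left"), ("T:3", "right"), ("T:4", "leaf")]
IS_A = [("T:2", "T:1"), ("T:3", "T:1"), ("T:4", "T:2"), ("T:4", "T:3")]

if __name__ == "__main__":
    con = build_db(TERMS, IS_A)
    print(descendants(con, "T:1"))
    print(descendants(con, "T:2"))
```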

Rather appropriate to mention an area (bioinformatics) that is exploding with information on the same day as GPU and database posts. Plus I am sure you will find the Gene Ontology useful for topic map purposes.