Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies by Walls RL, Deck J, Guralnick R, Baskauf S, Beaman R, et al. (2014). (Walls RL, Deck J, Guralnick R, Baskauf S, Beaman R, et al. (2014) Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies. PLoS ONE 9(3): e89606. doi:10.1371/journal.pone.0089606).
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
I want to call your attention to a great description of the current state of biodiversity data:
Assembling the data sets needed for global biodiversity initiatives remains challenging. Biodiversity data are highly heterogeneous, including information about organisms, their morphology and genetics, life history and habitats, and geographical ranges. These data almost always either contain or are linked to spatial, temporal, and environmental data. Biodiversity science seeks to understand the origin, maintenance, and function of this variation and thus requires integrated data on the spatiotemporal dynamics of organisms, populations, and species, together with information on their ecological and environmental context. Biodiversity knowledge is generated across multiple disciplines, each with its own community practices. As a consequence, biodiversity data are stored in a fragmented network of resource silos, in formats that impede integration. The means to properly describe and interrelate these different data sources and types is essential if such resources are to fulfill their potential for flexible use and re-use in a wide variety of monitoring, scientific, and policy-oriented applications. (From the introduction)
Contrast that with the final claim in the abstract:
We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers. (emphasis added)
I am very confident that both of those statements, from the introduction and from the abstract, are as true as human speakers can achieve.
However, it seems highly unlikely that these ontologies will displace an unknown number of communities of practice, which vary even within disciplines, to say nothing of between disciplines. Not to mention the need to plan for the fate of data from soon-to-be-previous community practices.
Or perhaps I should observe that such a displacement has never happened. True, over time a community of practice may die, only to be replaced by another one, but I take that as different in kind from an artificial construct that is made by one group and urged upon all others.
Think of it this way: what if the top 100 members of the biodiversity community kept their current community practices but used these ontologies as conversion targets? Followers of those various members could use their community leader’s practice as their conversion target, on the reasoning that it is easier to follow someone in your own community.
Rather than having arguments about those ontologies that will outlast the ontologies themselves, treat them as convenient conversion targets. Once a basis for mapping is declared, conversion to any other target becomes immeasurably easier.
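To make the idea of declared mappings concrete, here is a minimal sketch in Python. Every field name and the "target:" prefix are hypothetical, invented for illustration; none come from the paper or from any real ontology. A follower maps to a community leader's practice, the leader maps to a target, and the two declared mappings are chained.

```python
# Minimal sketch: declare mappings once, then compose them.
# All field names and the "target:" prefix are hypothetical placeholders.

def compose(first, second):
    """Chain two declared mappings: local name -> intermediate name -> target name."""
    return {src: second[mid] for src, mid in first.items() if mid in second}

def convert(record, mapping):
    """Rename a record's keys via a mapping; unmapped keys keep their original names."""
    return {mapping.get(key, key): value for key, value in record.items()}

# A follower maps to the practice of a leader in the same community.
follower_to_leader = {
    "spp": "species_name",
    "lat": "latitude",
    "coll_date": "date_collected",
}

# The leader maps that practice to one conversion target.
leader_to_target = {
    "species_name": "target:taxonLabel",
    "latitude": "target:latitude",
    "date_collected": "target:collectionDate",
}

# The follower reaches the target without writing a direct mapping.
follower_to_target = compose(follower_to_leader, leader_to_target)

record = {"spp": "Quercus alba", "lat": 38.9, "coll_date": "2014-03-05", "notes": "roadside"}
converted = convert(record, follower_to_target)
print(converted)
```

If the target is later replaced, only the leader's mapping needs to change; each follower's mapping to the leader's practice stays as it was.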
Reducing the semantic friction inherent in conversion to an ontology or data format is an investment in the future.
Battling semantic friction for a conversion to an ontology or data format is an investment you will make over and over again.