Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 8, 2014

Genomic Encyclopedia of Bacteria…

Filed under: Bioinformatics,Biology,Genomics — Patrick Durusau @ 4:03 pm

Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains by Nikos C. Kyrpides, et al. (Kyrpides NC, Hugenholtz P, Eisen JA, Woyke T, Göker M, et al. (2014) Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains. PLoS Biol 12(8): e1001920. doi:10.1371/journal.pbio.1001920)

Abstract:

Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet’s most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet’s genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.

While I am a standards advocate, I have to disagree with some of the claims for standards:

Accurate estimates of diversity will require not only standards for data but also standard operating procedures for all phases of data generation and collection [33],[34]. Indeed, sequencing all archaeal and bacterial type strains as a unified international effort will provide an ideal opportunity to implement international standards in sequencing, assembly, finishing, annotation, and metadata collection, as well as achieve consistent annotation of the environmental sources of these type strains using a standard such as minimum information about any (X) sequence (MixS) [27],[29]. Methods need to be rigorously challenged and validated to ensure that the results generated are accurate and likely reproducible, without having to reproduce each point. With only a few exceptions [27],[29], such standards do not yet exist, but they are in development under the auspices of the Genomics Standards Consortium (e.g., the M5 initiative) (http://gensc.org/gc_wiki/index.php/M5) [35]. Without the vehicle of a grand-challenge project such as this one, adoption of international standards will be much less likely.

Some standardization will no doubt be beneficial, but for the data that is collected, a topic-map-informed approach, where critical subjects are identified not by surface tokens but by key/value pairs, would be much better.

That is in part because there is always legacy data, and too little time and funding to backfit every change in present terminology to past names. Or should I say it hasn’t happened outside of one specialized chemical index that comes to mind.
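As a rough illustration of identifying subjects by key/value pairs rather than by surface tokens, here is a minimal Python sketch. It is not taken from the paper or from any topic map engine. The identifying keys (`collection`, `strain_id`), the placeholder values (`EXAMPLE`, `0001`) and the helper names are all hypothetical, chosen only to show the idea: records that carry an old name and a current name for the same organism merge because their identifying properties match, and nothing has to be renamed in the legacy data.

```python
from collections import defaultdict

# Hypothetical identifying properties; a real project would choose its own.
IDENTITY_KEYS = ("collection", "strain_id")


def subject_key(record):
    """Return the key/value pairs that identify a subject, or None if any is missing."""
    try:
        return tuple((key, record[key]) for key in IDENTITY_KEYS)
    except KeyError:
        return None


def merge_by_identity(records):
    """Group records by identifying key/value pairs, not by the name string."""
    merged = defaultdict(lambda: {"names": set(), "sources": []})
    for record in records:
        key = subject_key(record)
        if key is None:
            continue  # no identifying properties, so no basis for merging
        merged[key]["names"].add(record["name"])
        merged[key]["sources"].append(record["source"])
    return dict(merged)


# Illustrative records only; the collection and strain identifiers are placeholders.
records = [
    {"name": "Pseudomonas maltophilia", "collection": "EXAMPLE",
     "strain_id": "0001", "source": "legacy dataset"},
    {"name": "Stenotrophomonas maltophilia", "collection": "EXAMPLE",
     "strain_id": "0001", "source": "current dataset"},
    {"name": "Escherichia coli", "collection": "EXAMPLE",
     "strain_id": "0002", "source": "current dataset"},
]

subjects = merge_by_identity(records)
for key, subject in subjects.items():
    print(dict(key), sorted(subject["names"]))
```

The legacy record keeps the name it was published under. The older and newer names end up attached to one subject because the key/value pairs agree, not because anyone normalized the name strings.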

