Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 6, 2014

Finding needles in haystacks:…

Filed under: Bioinformatics,Biology,Names,Taxonomy — Patrick Durusau @ 4:54 pm

Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi by Conrad L. Schoch, et al. (Database (2014) 2014: bau061, doi: 10.1093/database/bau061).

Abstract:

DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing require fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi.

Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353

If you are interested in projects to update and correct existing databases, this is the article for you.

Fungi may not be on your regular reading list but consider one aspect of the problem described:

It is projected that there are ~400 000 fungal names already in existence. Although only 100 000 are accepted taxonomically, it still makes updates to the existing taxonomic structure a continuous task. It is also clear that these named fungi represent only a fraction of the estimated total, 1–6 million fungal species (93–95).
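To put those figures in proportion, here is a rough back-of-the-envelope sketch in Python. The constants come straight from the quoted passage; the variable names and the `share` helper are mine, and I am assuming (the paper does not say so) that each accepted name stands for roughly one species.

```python
# Figures from the quoted passage; all are rough estimates, not exact counts.
NAMES_IN_EXISTENCE = 400_000       # "~400 000 fungal names already in existence"
ACCEPTED_NAMES = 100_000           # "only 100 000 are accepted taxonomically"
ESTIMATED_SPECIES_LOW = 1_000_000  # low end of "1-6 million fungal species"
ESTIMATED_SPECIES_HIGH = 6_000_000 # high end of the same estimate


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# How many of the existing names are accepted?
accepted_share_of_names = share(ACCEPTED_NAMES, NAMES_IN_EXISTENCE)

# Assumption (mine): one accepted name is roughly one described species.
named_share_best = share(ACCEPTED_NAMES, ESTIMATED_SPECIES_LOW)
named_share_worst = share(ACCEPTED_NAMES, ESTIMATED_SPECIES_HIGH)

# Average number of names in circulation per accepted name.
names_per_accepted = NAMES_IN_EXISTENCE / ACCEPTED_NAMES

print(f"Accepted names as a share of all names: {accepted_share_of_names:.1f}%")
print(f"Estimated species with an accepted name: "
      f"{named_share_worst:.1f}% to {named_share_best:.1f}%")
print(f"Names in circulation per accepted name: {names_per_accepted:.1f}")
```

Under those assumptions, about one name in four is accepted, and the accepted names cover somewhere between roughly 2% and 10% of the estimated species.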

I would say that computer science isn’t the only discipline where “naming things” is hard.

You?

PS: The other lesson from this paper (and many others) is that semantic accuracy is neither easy nor cheap. Anyone who says differently is lying.
