Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 5, 2010

SIMCOMP: A Hybrid Soft Clustering of Metagenome Reads

Filed under: Bioinformatics,Biomedical,Subject Identity — Patrick Durusau @ 6:54 pm

SIMCOMP: A Hybrid Soft Clustering of Metagenome Reads. Authors: Shruthi Prabhakara, Raj Acharya

Abstract:

A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. In this paper, we present a two pass semi-supervised algorithm, SimComp, for soft clustering of short metagenome reads, that is a hybrid of comparative and composition based methods. In the first pass, a comparative analysis of the metagenome reads against BLASTx extracts the reference sequences from within the metagenome to form an initial set of seeded clusters. Those reads that have a significant match to the database are clustered by their phylogenetic provenance. In the second pass, the remaining fraction of reads are characterized by their species-specific composition based characteristics. SimComp groups the reads into overlapping clusters, each with its read leader. We make no assumptions about the taxonomic distribution of the dataset. The overlap between the clusters elegantly handles the challenges posed by the nature of the metagenomic data. The resulting cluster leaders can be used as an accurate estimate of the phylogenetic composition of the metagenomic dataset. Our method enriches the dataset into a small number of clusters, while accurately assigning fragments as small as 100 base pairs.

I cite this article for the proposition that subject identity may be a multi-pass thing. 😉

Seriously, as topic maps spread out, we are going to encounter any number of subject identity practices that don’t involve string matching.

Not only do we need a passing familiarity with those practices, but we also need the flexibility to incorporate the user’s expectations about subject identity into our topic maps.
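To make the multi-pass idea a little more concrete, here is a toy Python sketch of a two-pass, overlapping ("soft") grouping. It is not SimComp. The assumptions are mine: a plain substring test stands in for a BLASTx hit, cosine similarity over 2-mer counts stands in for the paper's composition-based characteristics, and the function names and the 0.6 threshold are invented for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def two_pass_cluster(reads, references, threshold=0.6, k=2):
    """Return (clusters, unassigned); a read may sit in several clusters."""
    clusters = {label: [] for label in references}
    leftover = []

    # Pass 1 (comparative): a read found verbatim in a reference seeds that
    # reference's cluster. ASSUMPTION: substring match replaces alignment.
    for read in reads:
        hits = [label for label, ref in references.items() if read in ref]
        for label in hits:
            clusters[label].append(read)
        if not hits:
            leftover.append(read)

    # Pool the k-mer counts of each cluster's seed reads before pass 2 starts.
    profiles = {}
    for label, members in clusters.items():
        pooled = Counter()
        for member in members:
            pooled.update(kmer_profile(member, k))
        profiles[label] = pooled

    # Pass 2 (composition): a leftover read joins every cluster whose seed
    # profile it resembles closely enough, so clusters can overlap.
    unassigned = []
    for read in leftover:
        profile = kmer_profile(read, k)
        matches = [label for label, pooled in profiles.items()
                   if cosine(profile, pooled) >= threshold]
        for label in matches:
            clusters[label].append(read)
        if not matches:
            unassigned.append(read)
    return clusters, unassigned


references = {"A": "ATATATATATAT", "B": "GCGCGCGCGCGC"}
reads = ["ATATAT", "GCGCGC", "ATATTATA", "ATATGCGC", "TTTTTT"]
clusters, unassigned = two_pass_cluster(reads, references)
print(clusters)
print(unassigned)
```

In this toy run the first two reads are identified by direct match, "ATATTATA" is identified only by composition, "ATATGCGC" lands in both clusters, and "TTTTTT" matches nothing. The point for topic maps is the shape of the process: a cheap, strict identity test first, then a looser, similarity-based test for whatever the first pass could not identify.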

Questions:

  1. Search on the phrase “metagenomic analysis software”.
  2. Become familiar with any one of the software packages listed.
  3. Of the techniques used by the software in #2, which one would you use in another context and why? (3-5 pages, no citations)

PS: I realize that some students have little or no interest in bioinformatics. The important lesson is learning to generalize the application of a technique in one area to its application in apparently dissimilar areas.
