Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 29, 2014

VSEARCH

Filed under: Bioinformatics,Clustering,Merging,Topic Maps — Patrick Durusau @ 4:15 pm

VSEARCH: Open and free 64-bit multithreaded tool for processing metagenomic sequences, including searching, clustering, chimera detection, dereplication, sorting, masking and shuffling

From the webpage:

The aim of this project is to create an alternative to the USEARCH tool developed by Robert C. Edgar (2010). The new tool should:

  • have open source code with an appropriate open source license
  • be free of charge, gratis
  • have a 64-bit design that handles very large databases and much more than 4GB of memory
  • be as accurate or more accurate than usearch
  • be as fast or faster than usearch

We have implemented a tool called VSEARCH which supports searching, clustering, chimera detection, dereplication, sorting and masking (commands --usearch_global, --cluster_smallmem, --cluster_fast, --uchime_ref, --uchime_denovo, --derep_fulllength, --sortbysize, --sortbylength and --maskfasta, as well as almost all their options).

VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps.

The same option names as in USEARCH version 7 has been used in order to make VSEARCH an almost drop-in replacement.

The reconciliation of characteristics that are different is the only way that merging in topic maps varies from the clustering found in bioinformatics programs like VSEARCH. The results are a cluster of items deemed “similar” on some basis and with topic maps, subject to further processing.

Scaling isn’t easy in bioinformatics but it hasn’t been found daunting either.

There is much to be learned from projects such as VSEARCH to inform the processing of topic maps.

I first saw this in a tweet by Torbjørn Rognes.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress