Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 25, 2014

…[S]uffix array construction algorithms

Filed under: Bioinformatics,String Matching,Suffix Array,Suffix Tree — Patrick Durusau @ 6:20 pm

A bioinformatician’s guide to the forefront of suffix array construction algorithms by Anish Man Singh Shrestha, Martin C. Frith, and Paul Horton.

Abstract:

The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support ‘spaced seeds’ and ‘subset seeds’ used in many biological applications.

If this doesn’t sound like a real page turner, consider the authors’ concluding paragraph:

The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support ‘spaced seeds’ and ‘subset seeds’ used in many biological applications.

Reminds me that computer science departments need to start offering courses in “string theory,” to capitalize on the popularity of that phrase. 😉

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress