FM-Indexes and Backwards Search
From the post:
Last time (way back in June! I have got to start blogging consistently again) I discussed a gorgeous data structure called the Wavelet Tree. When a Wavelet Tree is stored using RRR sequences, it can answer rank and select operations in O(log A) time, where A is the size of the alphabet. If the size of the alphabet is 2, we could just use RRR by itself, which answers rank and select in O(1) time for binary strings. RRR also compresses the binary strings, and hence compresses a Wavelet Tree which is stored using RRR.
So far so good, but I suspect rank and select queries seem to be of limited use right now (although once you are familiar with the available structures, applications show up often). One of the neatest uses of rank that I’ve seen is in substring search, which is certainly a wide reaching problem (for a very recent application to genome assembly, see Jared Simpson’s paper from 2010 called Efficient construction of an assembly string graph using the FM-index)
Also, my apologies but I am using 1-basing instead of 0-basing, because that is how I did my diagrams a year ago 🙂 (and bear with my lack of nicely typeset math, I am migrating to GitHub pages where I will be allowed to use Mathjax soon)
If you are interested in stretching your indexing muscles a bit, this post will get them limbered up!