Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 16, 2012

Wavelet Trees for All

Filed under: Graphs,Indexing,Wavelet Trees — Patrick Durusau @ 1:51 pm

Wavelet Trees for All (free version) (official publication reference)

Abstract:

The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, full-text indexes, XML indexes, and general numeric sequences.

Good survey article but can be tough sledding depending on your math skills. Fortunately the paper covers enough uses and has references to freely available applications of this technique. I am sure you will find one that trips your understanding of wavelet trees.

January 18, 2012

Compact Binary Relation Representations with Rich Functionality

Filed under: Algorithms,Binary Relations,Graphs,Trees,Wavelet Trees — Patrick Durusau @ 7:57 pm

Compact Binary Relation Representations with Rich Functionality by Jérémy Barbay, Francisco Claude, Gonzalo Navarro.

Abstract:

Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.

Read the introduction (runs about two of the thirty-two pages) and tell me you aren’t interested. Go ahead.

Just the start of what looks like a very interesting line of research.

October 29, 2011

FM-Indexes and Backwards Search

Filed under: FM-Indexes,Indexing,Wavelet Trees — Patrick Durusau @ 7:27 pm

FM-Indexes and Backwards Search

From the post:

Last time (way back in June! I have got to start blogging consistently again) I discussed a gorgeous data structure called the Wavelet Tree. When a Wavelet Tree is stored using RRR sequences, it can answer rank and select operations in O(log A) time, where A is the size of the alphabet. If the size of the alphabet is 2, we could just use RRR by itself, which answers rank and select in O(1) time for binary strings. RRR also compresses the binary strings, and hence compresses a Wavelet Tree which is stored using RRR.

So far so good, but I suspect rank and select queries seem to be of limited use right now (although once you are familiar with the available structures, applications show up often). One of the neatest uses of rank that I’ve seen is in substring search, which is certainly a wide reaching problem (for a very recent application to genome assembly, see Jared Simpson’s paper from 2010 called Efficient construction of an assembly string graph using the FM-index)

Also, my apologies but I am using 1-basing instead of 0-basing, because that is how I did my diagrams a year ago 🙂 (and bear with my lack of nicely typeset math, I am migrating to GitHub pages where I will be allowed to use Mathjax soon)

If you are interested in stretching your indexing muscles a bit, this post will get them limbered up!

Powered by WordPress