Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 25, 2011

A Faster Grammar-Based Self-Index

Filed under: Bioinformatics,Biomedical,Indexing — Patrick Durusau @ 6:06 pm

A Faster Grammar-Based Self-Index by Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, Simon J. Puglisi.

Abstract:

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string (S [1..n]) whose LZ77 parse consists of $z$ phrases, we can add $\Oh{z \log \log z}$ words and obtain a compressed self-index for $S$ such that, given a pattern (P [1..m]), we can list the $\occ$ occurrences of $P$ in $S$ in $\Oh{m^2 + (m + \occ) \log \log n}$ time. All previous self-indexes are either larger or slower in the worst case.

Updated version of the paper I covered at: A Faster LZ77-Based Index.

In a very real sense, indexing is fundamental to information retrieval. That is to say that when information is placed in electronic storage, the only way to retrieve it is via indexing. The index may be one to one with a memory location and hence not terribly efficient, but the fact remains that an index is part of every information retrieval transaction.

1 Comment

  1. […] Update: See the updated version of this paper: A Faster Grammar-Based Self-Index. […]

    Pingback by A Faster LZ77-Based Index « Another Word For It — December 25, 2011 @ 10:21 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress