Archive for the ‘Pattern Compression’ Category

Tying up the loose ends in fully LZW-compressed pattern matching

Tuesday, September 27th, 2011

Tying up the loose ends in fully LZW-compressed pattern matching by Pawel Gawrychowski.

Abstract:

We consider a natural generalization of the classical pattern matching problem: given compressed representations of a pattern p[1..M] and a text t[1..N] of sizes m and n, respectively, does p occur in t? We develop an optimal linear time solution for the case when both p and t are compressed using the LZW method. This improves the previously known O((n+m)log(n+m)) time solution of Gasieniec and Rytter, and essentially closes the line of research devoted to studying LZW-compressed exact pattern matching.

I don’t know of any topic maps that are yet of the size that they and queries against them need to take advantage of compressed queries against compress data but this paper outlines “an optimal linear time solution … for fully LZW-compressed pattern matching.” (page 2)

I suspect it may be more relevant to data mining prior to the construction of a topic map. But in either case, when needed it will be a welcome solution.

Minimum Description Length (MDL)

Monday, November 22nd, 2010

mdl-research.org

From the website:

The purpose of statistical modeling is to discover regularities in observed data. The success in finding such regularities can be measured by the length with which the data can be described. This is the rationale behind the Minimum Description Length (MDL) Principle introduced by Jorma Rissanen (Rissanen, 1978).

” The MDL Principle is a relatively recent method for inductive inference. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally. ” (Grünwald, 1998)

The website offers a reading list on MDL, demonstrations (with links to software), a list of researchers, related topics and upcoming conferences.

Pattern Compression – 7 Magnitudes of Reduction

Monday, November 22nd, 2010

Making Pattern Mining Useful.

Jilles Vreeken’s dissertation was a runner-up for the 2010 ACM SIGKDD Dissertation Award.

Vreeken proposes “compression” of data patterns on the basis of Minimum Description Length (MDL) (see The Minimum Description Length Principle) and KRIMP, “a heuristic parameter-free algorithm for finding the optimal set of frequent itemsets.” (SIGKDD, vol. 12, issue 1, page 76)

Readers should take note that experience indicates that KRIMP achieves 7 magnitudes of reduction in patterns. Let me say that again: KRIMP achieves 7 magnitudes of reduction in patterns. In practice, not theory.

Vreeken’s homepage has other materials of interest on this topic.

Questions:

  1. Application of “minimum description length” in library science? (report for class)
  2. How would you apply “minimum description length” techniques in library science? (3-5 pages, citations)
  3. Introduction to “Minimum Description Length For Librarians (class presentation, examples relevant to librarians)