Monday, November 22nd, 2010

From the website:

The purpose of statistical modeling is to discover regularities in observed data. The success in finding such regularities can be measured by the length with which the data can be described. This is the rationale behind the Minimum Description Length (MDL) Principle introduced by Jorma Rissanen (Rissanen, 1978).

“The MDL Principle is a relatively recent method for inductive inference. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally.” (Grünwald, 1998)
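As a rough illustration of the idea that regularity allows compression, here is a minimal Python sketch. It uses a general-purpose compressor (zlib) as a crude stand-in for description length; that substitution is my assumption for demonstration purposes and is not how MDL is formally defined. The helper name `compressed_size` and the sample data are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of `data` after zlib compression.

    Used here only as a rough proxy for "description length".
    """
    return len(zlib.compress(data, 9))


# 10,000 bytes with an obvious regularity: the pair "ab" repeated.
regular = b"ab" * 5000

# 10,000 pseudo-random bytes with no regularity a compressor can exploit.
rng = random.Random(0)
irregular = bytes(rng.getrandbits(8) for _ in range(10000))

regular_size = compressed_size(regular)
irregular_size = compressed_size(irregular)

print("regular:  ", len(regular), "bytes ->", regular_size, "bytes")
print("irregular:", len(irregular), "bytes ->", irregular_size, "bytes")
```

The regular sequence shrinks to a small fraction of its literal length, while the pseudo-random one barely shrinks at all, if it shrinks. In MDL terms, the first data set contains a regularity that a short description can capture; the second does not.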

The website offers a reading list on MDL, demonstrations (with links to software), a list of researchers, related topics and upcoming conferences.

Pattern Compression – 7 Magnitudes of Reduction

Monday, November 22nd, 2010

Making Pattern Mining Useful.

Jilles Vreeken’s dissertation was a runner-up for the 2010 ACM SIGKDD Dissertation Award.

Vreeken proposes “compression” of data patterns on the basis of Minimum Description Length (MDL) (see The Minimum Description Length Principle) and KRIMP, “a heuristic parameter-free algorithm for finding the optimal set of frequent itemsets.” (SIGKDD, vol. 12, issue 1, page 76)

Readers should take note that, in reported experiments, KRIMP reduces the number of patterns by up to seven orders of magnitude. Let me say that again: up to seven orders of magnitude fewer patterns. In practice, not theory.
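To make the idea of "compressing" a database with patterns a little more concrete, here is a toy Python sketch in the spirit of an MDL code table. It is a simplified illustration under my own assumptions, not Vreeken's implementation: the function names (`cover`, `total_bits`), the ordering of the table, and the tiny database are all hypothetical, and the candidate search and pruning that KRIMP performs are omitted entirely. Each transaction is covered greedily by non-overlapping itemsets, itemsets that are used often get short codes, and the total cost is the encoded data plus the cost of writing down the table itself.

```python
from collections import Counter
from math import log2


def cover(transaction, ordered_table):
    """Greedily cover a transaction with non-overlapping itemsets, in table order."""
    remaining = set(transaction)
    used = []
    for itemset in ordered_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_bits(database, patterns):
    """Two-part description length in bits: encoded data plus the code table."""
    # Baseline per-item code lengths, derived from item frequencies.
    item_counts = Counter(item for t in database for item in t)
    n_items = sum(item_counts.values())
    standard_bits = {i: -log2(c / n_items) for i, c in item_counts.items()}

    # The table always contains every singleton, so any transaction can be covered.
    singletons = [frozenset([i]) for i in item_counts]
    table = sorted(set(patterns) | set(singletons),
                   key=lambda s: (-len(s), sorted(s)))  # assumption: longest first

    # Count how often each itemset is used when covering the database.
    usage = Counter()
    for t in database:
        usage.update(cover(t, table))
    n_codes = sum(usage.values())
    code_bits = {s: -log2(u / n_codes) for s, u in usage.items()}

    # Cost of the data given the table, plus cost of the table entries in use.
    data_bits = sum(usage[s] * code_bits[s] for s in usage)
    model_bits = sum(code_bits[s] + sum(standard_bits[i] for i in s)
                     for s in usage)
    return data_bits + model_bits


# Hypothetical database: {a, b, c} co-occurs often, {a, d} occasionally.
database = [frozenset("abc")] * 8 + [frozenset("ad")] * 2

baseline = total_bits(database, [])
with_pattern = total_bits(database, [frozenset("abc")])
print("singletons only:", round(baseline, 1), "bits")
print("with {a, b, c}: ", round(with_pattern, 1), "bits")
```

Adding the single pattern {a, b, c} lowers the total description length, which is the sense in which a small set of patterns can "describe" the data well. A selection procedure built on this score keeps a pattern only if it pays for itself in bits, which is why no minimum-interest threshold has to be tuned by hand.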

Vreeken’s homepage has other materials of interest on this topic.


