Mining Interesting Subgraphs by Output Space Sampling
Mohammad Al Hasan’s dissertation was the winner of the SIGKDD Ph.D. Dissertation Award.
From the dissertation:
Output space sampling is an entire paradigm shift in frequent pattern mining (FPM) that holds enormous promise. While traditional FPM strives for completeness, OSS targets to obtain a few interesting samples. The definition of interestingness can be very generic, so user can sample patterns from different target distributions by choosing different interestingness functions. This is very beneficial as mined patterns are subject to subsequent use in various knowledge discovery tasks, like classification, clustering, outlier detection, etc. and the interestingness score of a pattern varies for various tasks. OSS can adapt to this requirement just by changing the interestingness function. OSS also solves pattern redundancy problem by finding samples that are very different from each other. Note that, pattern redundancy hurts any knowledge based system that builds metrics based on the structural similarity of the patterns.
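To make the quoted idea concrete, here is a minimal sketch of sampling patterns in proportion to an interestingness score. It is not the dissertation's algorithm: it uses frequent itemsets rather than subgraphs to stay short, and a Metropolis-Hastings random walk is assumed as the sampler. The toy data and the names `TRANSACTIONS`, `MIN_SUPPORT`, `support`, `neighbors`, and `sample_patterns` are all made up for illustration.

```python
import random

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))
MIN_SUPPORT = 2


def support(pattern):
    """Number of transactions that contain every item of `pattern`."""
    return sum(1 for t in TRANSACTIONS if pattern <= t)


def neighbors(pattern):
    """Frequent patterns that differ from `pattern` by exactly one item."""
    # Removing an item from a frequent pattern always leaves a frequent pattern.
    result = [pattern - {item} for item in pattern]
    for item in ITEMS:
        if item not in pattern:
            bigger = pattern | {item}
            if support(bigger) >= MIN_SUPPORT:
                result.append(bigger)
    return result


def sample_patterns(interestingness, steps, seed=0):
    """Random walk over frequent patterns whose stationary distribution is
    proportional to `interestingness` (which must return a positive number)."""
    rng = random.Random(seed)
    current = frozenset()  # the empty pattern is always frequent
    visited = []
    for _ in range(steps):
        nbrs = neighbors(current)
        candidate = rng.choice(nbrs)
        # Metropolis-Hastings ratio for a uniform-over-neighbors proposal.
        ratio = (interestingness(candidate) * len(nbrs)) / (
            interestingness(current) * len(neighbors(candidate))
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        visited.append(current)
    return visited


# Favor larger patterns; swap in a different function to change the target.
samples = sample_patterns(lambda p: len(p) + 1, steps=200, seed=1)
```

Changing only the function passed to `sample_patterns` (for example, `support` instead of pattern size) changes which patterns the walk favors, which is the adaptability the quoted passage describes. No pattern is ever enumerated exhaustively; the walk only inspects the neighbors of the pattern it is currently visiting.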
Nice to see recognition that some data sets don’t require full enumeration of all occurrences.
Something topic map advocates need to remember when proselytizing for topic maps.
The goal is not all the information known about a subject.
The goal is all the information a user wants about a subject.
Not the same thing.
Questions:
- What criteria of “interestingness” would you apply in gathering data for easy access by your patrons? (3-5 pages, no citations)
- How would you use this technique for authoring and/or testing a topic map? (3-5 pages, no citations. Think of “testing” a topic map as checking how well it represents a particular data collection.)
- Bibliography of material citing the paper or applying this technique.