Approximate Methods for Scalable Data Mining by Andrew Clegg.
Slides from a presentation at Data Science London, 24/04/13.
To get your interest, a nice illustration of the HyperLogLog algorithm: “Billions of distinct values in 1.5KB of RAM with 2% relative error.”
The slides also have a number of other useful illustrations and great references.
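For a feel of where the HyperLogLog numbers come from, here is a minimal sketch in Python. It is not taken from the slides. The class name, the choice of SHA-1 as the hash, and the default of 2^11 = 2048 registers are my assumptions for illustration. With 2048 registers the usual standard-error estimate of 1.04/sqrt(m) works out to roughly 2.3%, and 2048 registers packed at 6 bits each would occupy 1,536 bytes, which lines up with the quoted 1.5KB. (The Python list used here is not packed, so this sketch shows the logic, not the memory footprint.)

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog cardinality estimator (not production code)."""

    def __init__(self, p=11):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers, 2048 when p = 11
        self.registers = [0] * self.m
        # Bias-correction constant; this closed form is meant for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash from the item (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        # The first p bits select a register.
        idx = x >> (64 - self.p)
        # The remaining bits give the rank: position of the leftmost 1-bit.
        rest_bits = 64 - self.p
        rest = x & ((1 << rest_bits) - 1)
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Raw estimate from the harmonic mean of 2^register across registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

To try it, create `hll = HyperLogLog()`, call `hll.add(value)` for each item in a stream (duplicates included), and read `hll.count()`. The estimate should typically land within a few percent of the true number of distinct values, while the state stays at a fixed 2048 small counters no matter how long the stream is.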