Scaling Jaccard Distance for Document Deduplication: Shingling, MinHash and Locality-Sensitive Hashing – Post

Scaling Jaccard Distance for Document Deduplication: Shingling, MinHash and Locality-Sensitive Hashing

Bob Carpenter of the LingPipe Blog points out the treatment of Jaccard distance in Mining of Massive Datasets by Anand Rajaraman and Jeffrey D. Ullman.

Worth a close look.
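
For readers who want a feel for the ideas named in the title before following the link, here is a minimal Python sketch. It is not taken from Carpenter's post or from the book. The function names, the word-level shingle size `k=3`, the 64 hash functions, and the use of salted BLAKE2b as a stand-in for a family of random hash functions are all my own assumptions for illustration. It covers shingling, exact Jaccard distance, and a MinHash estimate of similarity. The locality-sensitive hashing step is left out.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of text (k=3 is an arbitrary choice)."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_distance(a, b):
    """Exact Jaccard distance: 1 - |A intersect B| / |A union B|."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def minhash_signature(items, num_hashes=64):
    """MinHash signature of a non-empty set of strings.

    Each salted BLAKE2b hash plays the role of one random hash function
    (an assumption of this sketch). The signature keeps the minimum hash
    value under each function.
    """
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = shingles("the quick brown fox jumps over the lazy dog")
    doc_b = shingles("the quick brown fox leaps over the lazy dog")
    print("exact Jaccard distance:", jaccard_distance(doc_a, doc_b))
    estimate = estimated_similarity(minhash_signature(doc_a), minhash_signature(doc_b))
    print("MinHash estimate of distance:", 1.0 - estimate)
```

The point of the signature is that comparing two fixed-length lists of integers is much cheaper than comparing two large shingle sets, at the cost of an estimate that gets noisier as `num_hashes` shrinks.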
