Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 13, 2011

Scaling Jaccard Distance for Document Deduplication: Shingling, MinHash and Locality-Sensitive Hashing – Post

Filed under: Data Mining, Similarity. Posted by Patrick Durusau @ 5:42 am


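Bob Carpenter of the LingPipe Blog points out how Mining of Massive Datasets by Anand Rajaraman and Jeffrey D. Ullman treats Jaccard distance, and how it scales that distance to large collections with shingling, MinHash and locality-sensitive hashing.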

Worth a close look.
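For readers who want to try the three techniques in the title before reading the chapter, here is a minimal Python sketch. It assumes character 5-shingles, SHA-1 as the base hash, and universal hash functions of the form (a*x + b) mod p standing in for random permutations. The function names and parameters (k=5, 100 hash functions split into 20 bands of 5 rows) are illustrative choices of my own. They are not taken from the book or from the LingPipe post.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Illustrative parameters (assumptions, not from the book or the LingPipe post).
PRIME = (1 << 61) - 1  # Mersenne prime modulus for the universal hash family


def shingles(text, k=5):
    """Set of character k-shingles after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    return 1.0 - jaccard_similarity(a, b)


def _base_hash(shingle):
    # Map a shingle to a 64-bit integer via the first 8 bytes of its SHA-1 digest.
    return int.from_bytes(hashlib.sha1(shingle.encode("utf-8")).digest()[:8], "big")


def make_hash_params(n, seed=0):
    """n (a, b) pairs for h(x) = (a*x + b) mod PRIME, standing in for random permutations."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(shingle_set, params):
    """For each hash function, keep the minimum hash value over the set."""
    if not shingle_set:
        return [PRIME] * len(params)  # sentinel: real hash values are always < PRIME
    xs = [_base_hash(s) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in xs) for a, b in params]


def estimate_similarity(sig1, sig2):
    """Fraction of positions where two signatures agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidate_pairs(signatures, bands, rows):
    """Banding: two documents are candidates if any band of their signatures matches exactly."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            if len(sig) != bands * rows:
                raise ValueError("signature length must equal bands * rows")
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumped over the lazy dog",
        "c": "an entirely different sentence about topic maps",
    }
    params = make_hash_params(100)  # 100 hash functions = 20 bands x 5 rows
    sets = {d: shingles(t) for d, t in docs.items()}
    sigs = {d: minhash_signature(s, params) for d, s in sets.items()}
    print("exact J(a, b):", jaccard_similarity(sets["a"], sets["b"]))
    print("MinHash estimate:", estimate_similarity(sigs["a"], sigs["b"]))
    print("LSH candidates:", lsh_candidate_pairs(sigs, bands=20, rows=5))
```

The banding step trades accuracy for speed. With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. Candidate pairs should therefore still be checked against their exact Jaccard score before being treated as duplicates.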

