Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 9, 2014

…Locality Sensitive Hashing for Unstructured Data

Filed under: Hashing,Jaccard Similarity,Similarity,Subject Identity — Patrick Durusau @ 6:51 pm

Practical Applications of Locality Sensitive Hashing for Unstructured Data by Jake Drew.

From the post:

The purpose of this article is to demonstrate how the practical Data Scientist can implement a Locality Sensitive Hashing system from start to finish in order to drastically reduce the search time typically required in high dimensional spaces when finding similar items. Locality Sensitive Hashing accomplishes this efficiency by exponentially reducing the amount of data required for storage when collecting features for comparison between similar item sets. In other words, Locality Sensitive Hashing successfully reduces a high dimensional feature space while still retaining a random permutation of relevant features which research has shown can be used between data sets to determine an accurate approximation of Jaccard similarity [2,3].

Complete with code and references no less!

How “similar” do two items need to be to count as the same item?

If two libraries own a physical copy of the same book, for some purposes they are distinct items but for annotations/reviews, you could treat them as one item.

If that sounds like a topic map-like question, your right!

What measures of similarity are you applying to what subjects?

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress