Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 3, 2011

Processing Tweets with LingPipe #3: Near duplicate detection and evaluation – Post

Filed under: Duplicates,Natural Language Processing,Similarity,String Matching — Patrick Durusau @ 3:03 pm

Processing Tweets with LingPipe #3: Near duplicate detection and evaluation

Good coverage of tweet tokenization and of using the Jaccard distance to measure how similar two tweets are.
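For readers who have not met the measure, here is a minimal sketch of Jaccard distance over token sets, written in plain Python. It is not LingPipe's API or the code from the linked post. The whitespace tokenizer, the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tweet tokenizer)."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for the token sets of the two texts."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # assumption: two empty texts count as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def near_duplicates(tweets, threshold=0.5):
    """Return index pairs (i, j), i < j, whose distance is at most threshold."""
    pairs = []
    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            if jaccard_distance(tweets[i], tweets[j]) <= threshold:
                pairs.append((i, j))
    return pairs
```

The sketch reports near-duplicate pairs rather than deleting anything, so the caller decides what happens to them.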

Of course, for a topic map, finding that two items are similar need not lead to one of them being discarded; it may trigger other operations instead.
