Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 3, 2010

Detecting “Duplicates” (same subject?)

Filed under: Authoring Topic Maps,Duplicates,String Matching,Subject Identity — Patrick Durusau @ 4:43 pm

A couple of interesting posts from the LingPipe blog:

Processing Tweets with LingPipe #1: Search and CSV Data Structures

Processing Tweets with LingPipe #2: Finding Duplicates with Hashing and Normalization

The second one on duplicates being the one that caught my eye.

After all, what are merging conditions the in TMDM other than the detection of duplicates?

Of course, I am interested in TMDM merging but also in the detection of fuzzy subject identity.

Whether than is then represented by an IRI or kept as a native merging condition being an implementation type issue.

This could be very important for some future leak of diplomatic tweets. 😉

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress