Another Word For It
Patrick Durusau on Topic Maps and Semantic Diversity

November 14, 2013

DeleteDuplicates based on crawlDB only [Nutch-656]

Filed under: Nutch, Search Engines - Patrick Durusau @ 5:37 pm

As of today, Nutch (well, the nightly build after tonight) will be able to delete duplicate URLs.

A step in the right direction!

Now, if only duplicates could be declared on more than duplicate URLs, and relationships maintained across deletions. 😉
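To make both ideas concrete, here is a minimal Python sketch of deduplication driven only by crawl-database records. It is not Nutch's code or API. The `CrawlRecord` type, its fields (`url`, `signature`, `score`, `fetch_time`), the `dedup` function, and the survivor rule (highest score, then most recent fetch, then shortest URL) are all hypothetical, chosen for illustration. Records whose content signatures match are grouped, one survivor per group is kept, and instead of simply discarding the rest, the sketch records which surviving URL each removed URL was a duplicate of, so that relationship outlives the deletion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical stand-in for one crawl-DB entry (not Nutch's CrawlDatum)."""
    url: str
    signature: str   # content digest: identical content -> identical signature
    score: float
    fetch_time: int  # e.g. seconds since the epoch


def dedup(records):
    """Keep one record per signature; remember what each removed URL duplicated.

    Returns (survivors, duplicate_of), where duplicate_of maps every
    removed URL to the URL of the record kept in its place.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.signature].append(rec)

    survivors = []
    duplicate_of = {}
    for group in groups.values():
        # Assumed preference: highest score, then most recent fetch,
        # then shortest URL; the URL itself breaks any remaining tie.
        keep = min(group, key=lambda r: (-r.score, -r.fetch_time, len(r.url), r.url))
        survivors.append(keep)
        for rec in group:
            if rec is not keep:
                # Record the relationship instead of silently dropping it.
                duplicate_of[rec.url] = keep.url
    return survivors, duplicate_of


# Usage: two URLs serving identical content, plus one distinct page.
records = [
    CrawlRecord("http://example.com/a", "sig1", 1.0, 100),
    CrawlRecord("http://example.com/a?ref=x", "sig1", 1.0, 100),
    CrawlRecord("http://example.com/b", "sig2", 0.5, 200),
]
survivors, duplicate_of = dedup(records)
print([r.url for r in survivors])
print(duplicate_of)
```

Keeping `duplicate_of` alongside the survivors is the design choice that matters here: a plain delete answers "which page do we keep?", while the map also answers "what did this missing URL turn into?"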
