Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 13, 2012

Duke 0.4

Filed under: Deduplication,Entity Resolution,Record Linkage — Patrick Durusau @ 8:17 pm

Duke 0.4

New release of deduplication software written in Java on top of Lucene by Lars Marius Garshol.

From the release notes:

This version of Duke introduces:

  • Added JNDI data source for connecting to databases via JNDI (thanks to FMitzlaff).
  • In-memory data source added (thanks to FMitzlaff).
  • Record linkage mode now more flexible: can implement different strategies for choosing optimal links (with FMitzlaff).
  • Record linkage API refactored slightly to be more flexible (with FMitzlaff).
  • Added utilities for building equivalence classes from Duke output.
  • Made the XML config loader more robust.
  • Added a special cleaner for English person names.
  • Fixed bug in NumericComparator ( issue 66 )
  • Uses own Lucene query parser to avoid issues with search strings.
  • Upgraded to Lucene 3.5.0.
  • Added many more tests.
  • Many small bug fixes to core, NTriples reader, ec.

BTW, the documentation is online only: http://code.google.com/p/duke/wiki/GettingStarted.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress