Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 6, 2014

Jellyfish

Filed under: Duke,Levenshtein Distance,String Matching — Patrick Durusau @ 7:06 pm

Jellyfish by James Turk and Michael Stephens.

From the webpage:

Jellyfish is a python library for doing approximate and phonetic matching of strings.

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex

You might want to consider the string matching offered by Duke (written on top of Lucene):

String comparators

  • Levenshtein
  • WeightedLevenshtein
  • JaroWinkler
  • QGramComparator

Simple comparators

  • ExactComparator
  • DifferentComparator

Specialized comparators

  • GeopositionComparator
  • NumericComparator
  • PersonNameComparator

Phonetic comparators

  • SoundexComparator
  • MetaphoneComparator
  • NorphoneComparator

Token set comparators

  • DiceCoefficientComparator
  • JaccardIndexComparator

Enjoy!

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress