SecondString is a Java library of string matching techniques.
The Levenshtein distance test mentioned in the LIMES post is an example of a string matching technique.
The results are not normalized, so compare results from different techniques cautiously.
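To make the normalization caveat concrete, here is a minimal plain-Java sketch of Levenshtein distance together with one way of scaling it into the range 0 to 1. It does not use the SecondString API. The class name `EditDistanceSketch`, the method names, and the choice to divide by the longer string's length are all assumptions made for illustration; other normalizations exist and give different numbers.

```java
// Illustrative sketch only: hypothetical class and method names, not SecondString's API.
final class EditDistanceSketch {

    // Levenshtein distance by dynamic programming, keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev holds the row just computed
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    // 1.0 means identical strings, 0.0 means nothing in common position-wise.
    static double normalizedSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));          // raw edit count
        System.out.println(normalizedSimilarity("kitten", "sitting")); // scaled to [0, 1]
    }
}
```

The raw distance is an edit count that grows with string length, while the scaled value is a similarity where higher means closer. A score from another technique may use yet another scale or direction, which is why unnormalized results should not be compared directly.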
Questions:
- Suggest one or two survey articles on string matching for the class. (The Navarro article cited in the Wikipedia entry on Levenshtein distance is almost ten years old and, despite numerous exclusions, still runs 58 pages. It is an excellent article but needs updating with more recent material.)
- What one technique would you use in constructing your topic map? Why? (2-3 pages, citing examples of why it would be the best for your data)