Damn Cool Algorithms: Levenshtein Automata by Nick Johnson.
From the post:
In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance – or any other metric that obeys the triangle inequality. Today, I’m going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata.
Introduction
The basic insight behind Levenshtein automata is that it’s possible to construct a Finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton. Further, due to the nature of FSAs, it will do so in O(n) time with the length of the string being tested. Compare this to the standard Dynamic Programming Levenshtein algorithm, which takes O(mn) time, where m and n are the lengths of the two input words! It’s thus immediately apparrent that Levenshtein automaton provide, at a minimum, a faster way for us to check many words against a single target word and maximum distance – not a bad improvement to start with!
Of course, if that were the only benefit of Levenshtein automata, this would be a short article. There’s much more to come, but first let’s see what a Levenshtein automaton looks like, and how we can build one.
Not recent but I think you will enjoy the post anyway.
I first saw this at DZone.
[…] Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity « Damn Cool Algorithms: Levenshtein Automata […]
Pingback by Damn Cool Algorithms: Cardinality Estimation « Another Word For It — September 22, 2012 @ 3:12 pm