Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 23, 2010

Speller Challenge II

Filed under: Marketing — Patrick Durusau @ 1:56 pm

After posting Speller Challenge, it occurred to me that the contest name is misleading.

It really isn’t a speller contest as much as it is a spelling-check contest.

That is, a speller implies being able to correctly spell words in some language, which is a semester-long NLP or AI type project.

What is needed for this contest is a spelling checker that recognizes likely completions and reports, for any given completion, all the variant spellings.

Deriving that solution would have two parts:

First, data mining to determine all the variants (and their frequency) for any given completion. With search logs, it should be possible to keep track of variants by locale, but the contest did not ask for that. Save that for a future refinement.
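As a rough sketch of that first step, assume the search log has already been reduced to pairs of (what was typed, the completion it resolved to). That log format, the function name mine_variants and the sample entries are all made up for illustration; the contest data may look nothing like this.

```python
from collections import Counter, defaultdict

def mine_variants(log):
    """Count how often each typed variant led to each completion.

    `log` is an iterable of (typed, completion) pairs. This pair format
    is an assumption for illustration, not the contest's actual format.
    """
    variants = defaultdict(Counter)
    for typed, completion in log:
        # Normalize case and surrounding whitespace before counting.
        variants[completion][typed.strip().lower()] += 1
    return variants

# Invented sample data, only to show the shape of the result.
sample_log = [
    ("recieve", "receive"),
    ("receive", "receive"),
    ("Recieve", "receive"),
    ("receve", "receive"),
    ("definately", "definitely"),
]

variant_counts = mine_variants(sample_log)
```

Each completion ends up with a Counter of its variants, so variant_counts["receive"].most_common() lists them by frequency. Tracking locale would mean adding it to the key, which is the refinement left for later.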

Second, the topic map part would be to represent all the variants of a completion as an association, such that the retrieval of any one completion includes pointers to all of its variant completions.

I would treat all the completions as variants with incidence/frequency values. That is, they play the role of variant in a variant-of association.

Since we are talking about internal representation, I would not represent either the association type or the roles in the data structure. There isn’t any merging or interchange going on, so the structure can be optimized for speed of response.

Will still need to contend with completions that are members of different associations. That is, the same completion string can stand for a different subject in each association.

In any given variant-of association, all the variants represent the same subject.
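One way to picture the internal representation is sketched below. Each variant-of association is just a dictionary of variant to frequency, with no explicit association type or role objects, and an index maps every variant string to the associations that contain it. The class name VariantIndex, its methods and the sample frequencies are hypothetical, invented for this sketch.

```python
from collections import defaultdict

class VariantIndex:
    """Implicit variant-of associations, tuned for lookup speed.

    Hypothetical structure for illustration: association type and roles
    are not stored; membership in a group implies the variant role.
    """

    def __init__(self):
        self._groups = []                      # group id -> {variant: frequency}
        self._by_variant = defaultdict(list)   # variant -> [group ids]

    def add_group(self, frequencies):
        """Register one association; all its variants share one subject."""
        group_id = len(self._groups)
        self._groups.append(dict(frequencies))
        for variant in frequencies:
            self._by_variant[variant].append(group_id)
        return group_id

    def lookup(self, completion):
        """Return every association (variant -> frequency) containing it."""
        return [self._groups[g] for g in self._by_variant.get(completion, [])]

# Invented frequencies, only to show the behavior.
index = VariantIndex()
index.add_group({"color": 120, "colour": 80, "colr": 3})
index.add_group({"polish": 40, "pollish": 2})   # as in furniture polish
index.add_group({"polish": 25, "polsh": 1})     # as in the nationality

colour_groups = index.lookup("colour")
polish_groups = index.lookup("polish")
```

Looking up "colour" returns one association with all three variants, while "polish" returns two associations. The sketch only reports that the string belongs to more than one association; it does nothing to tell the two subjects apart.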

Will have to give some thought to how to distinguish identical completions that are members of different associations.

(more to follow)
