Evaluating Text Extraction Algorithms
From the post:
Lately I’ve been working on evaluating and comparing algorithms capable of extracting useful content from arbitrary HTML documents. Before continuing, I encourage you to read through some of my previous posts, just to get a better feel for what we’re dealing with: I’ve written a short overview, compiled a list of resources if you want to dig deeper, and made a feature-wise comparison of related software and APIs.
If you’re not simply creating topic map content, you are mining content from other sources, such as texts, to point to or include in a topic map. This is a good set of posts on the tools and issues surrounding that task.