GENIA Project: Mining literature for knowledge in molecular biology.
From the webpage:
The GENIA project seeks to automatically extract useful information from texts written by scientists to help overcome the problems caused by information overload. We intend that while the methods are customized for application in the micro-biology domain, the basic methods should be generalisable to knowledge acquisition in other scientific and engineering domains.
We are currently working on the key task of extracting event information about protein interactions. This type of information extraction requires the joint effort of many sources of knowledge, which we are now developing. These include a parser, ontology, thesaurus and domain dictionaries as well as supervised learning models.
Be aware that the project uses the acronym “TM” for “text mining.” Anyone can clearly see that “TM” should be expanded to “topic map.” 😉 Just teasing.
GENIA has a corpus of texts and a number of tools for mining texts.