Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity by David Nadeau, Peter D. Turney and Stan Matwin.
Abstract:
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).
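The abstract describes the system only at a high level. To illustrate what a gazetteer-based tagger does with its lists, here is a minimal Python sketch. It is not the authors' method: the toy gazetteers, the `build_index` and `tag` helpers, and the capitalization rule are all hypothetical. The sketch shows two kinds of ambiguity. Boundary ambiguity ("Paris" inside "Paris Hilton") is handled by longest match. Word-versus-entity ambiguity ("ford" used as a verb) is handled by requiring a capital letter. Entity-type ambiguity ("Washington" as a person or a location) is left unresolved and returned as a list of candidate types.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy gazetteers for illustration only. The paper's system
# generates its gazetteers automatically; these are hand-written.
GAZETTEERS: Dict[str, Set[str]] = {
    "person": {"Paris Hilton", "Washington"},
    "location": {"Paris", "Washington", "New York"},
    "car_brand": {"Ford", "Honda"},
}

Span = Tuple[int, int, str, List[str]]


def build_index(gazetteers: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], Set[str]]:
    """Map each lowercased token sequence to the set of types it may denote."""
    index: Dict[Tuple[str, ...], Set[str]] = {}
    for etype, names in gazetteers.items():
        for name in names:
            key = tuple(name.lower().split())
            index.setdefault(key, set()).add(etype)
    return index


def tag(tokens: List[str], index: Dict[Tuple[str, ...], Set[str]], max_len: int = 3) -> List[Span]:
    """Greedy longest-match gazetteer lookup.

    Returns (start, end, phrase, candidate_types) spans. Assumed heuristic
    (not necessarily the authors'): a match counts only if its first token
    is capitalized, so common words such as the verb "ford" are skipped.
    """
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first to resolve boundary ambiguity.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in index and tokens[i][:1].isupper():
                match = (i, i + n, " ".join(tokens[i:i + n]), sorted(index[key]))
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched phrase
        else:
            i += 1
    return spans


if __name__ == "__main__":
    idx = build_index(GAZETTEERS)
    print(tag("Paris Hilton flew from Paris to Washington in a Ford".split(), idx))
```

On that sentence, "Washington" comes back with both `location` and `person` as candidates. A real system would need a further disambiguation step, which this sketch deliberately does not attempt.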
The authors report successfully applying their techniques to more than 50 named-entity types.
They also describe the heuristics they apply to texts during the mining process.
Is there a common repository of observations or heuristics for mining texts? Just curious.
Source code for the project: http://balie.sourceforge.net.
Is this the answer to the question I just posed?