Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 3, 2012

Semi-Supervised Named Entity Recognition:… [Marketing?]

Filed under: Entities,Entity Extraction,Entity Resolution,Marketing — Patrick Durusau @ 3:40 pm

Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision by David Nadeau (PhD Thesis, University of Ottawa, 2007).

Abstract:

Named Entity Recognition (NER) aims to extract and to classify rigid designators in text such as proper names, biological species, and temporal expressions. There has been growing interest in this field of research since the early 1990s. In this thesis, we document a trend moving away from handcrafted rules, and towards machine learning approaches. Still, recent machine learning approaches have a problem with annotated data availability, which is a serious shortcoming in building and maintaining large-scale NER systems. In this thesis, we present an NER system built with very little supervision. Human supervision is indeed limited to listing a few examples of each named entity (NE) type. First, we introduce a proof-of-concept semi-supervised system that can recognize four NE types. Then, we expand its capacities by improving key technologies, and we apply the system to an entire hierarchy comprised of 100 NE types. Our work makes the following contributions: the creation of a proof-of-concept semi-supervised NER system; the demonstration of an innovative noise filtering technique for generating NE lists; the validation of a strategy for learning disambiguation rules using automatically identified, unambiguous NEs; and finally, the development of an acronym detection algorithm, thus solving a rare but very difficult problem in alias resolution. We believe semi-supervised learning techniques are about to break new ground in the machine learning community. In this thesis, we show that limited supervision can build complete NER systems. On standard evaluation corpora, we report performances that compare to baseline supervised systems in the task of annotating NEs in texts.

Nadeau demonstrates that a Named Entity Recognition (NER) system can be built successfully from only a few supplied examples of each entity type.
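To make "a few supplied examples" concrete, here is a minimal sketch of seed-driven list building. It is not Nadeau's system, which also covers noise filtering, disambiguation rules, and acronym detection. It is a toy under stated assumptions: the `bootstrap` function, the four sentences, and the two seed lists are all hypothetical, and "context" is reduced to the single word that precedes a known entity.

```python
from collections import defaultdict


def bootstrap(sentences, seeds, rounds=1):
    """Grow per-type entity lists from a few seed examples (toy sketch)."""
    lexicon = {etype: set(names) for etype, names in seeds.items()}
    for _ in range(rounds):
        # Step 1: collect the words that immediately precede known entities.
        contexts = defaultdict(set)
        for etype, names in lexicon.items():
            for sent in sentences:
                tokens = sent.split()
                for i in range(1, len(tokens)):
                    if tokens[i].strip(".,") in names:
                        contexts[etype].add(tokens[i - 1].lower())
        # Step 2: treat any capitalized word that follows a learned context
        # as a new candidate entity of the same type.
        for etype, ctxs in contexts.items():
            for sent in sentences:
                tokens = sent.split()
                for i in range(1, len(tokens)):
                    word = tokens[i].strip(".,")
                    if tokens[i - 1].lower() in ctxs and word[:1].isupper():
                        lexicon[etype].add(word)
    return lexicon


# Hypothetical corpus and seeds, invented for illustration only.
sentences = [
    "She flew to Ottawa yesterday.",
    "He flew to Toronto last week.",
    "The report was written by Nadeau in 2007.",
    "The review was written by Durusau in 2012.",
]
seeds = {"CITY": ["Ottawa"], "PERSON": ["Nadeau"]}

result = bootstrap(sentences, seeds)
print(sorted(result["CITY"]))    # ['Ottawa', 'Toronto']
print(sorted(result["PERSON"]))  # ['Durusau', 'Nadeau']
```

One seed per type is enough to pull in a second entity here, but only because the toy corpus is tidy. On real text a context as weak as "to" or "by" would admit plenty of noise, which is presumably why the thesis counts noise filtering among its contributions.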

But what explains the lack of annotation where the entities are well known? Take the King James Bible. Search for “Joseph.” We know that not all of the occurrences of “Joseph” represent the same entity.

Looking at the client list for Infoglutton, is there a lack of interest in named entity recognition?

Have we focused on techniques and issues that interest us, and then, as an afterthought, tried to market the results to consumers?
