Jumping NLP Curves: A Review of Natural Language Processing Research by Erik Cambria and Bebo White.
From the post:
Natural language processing (NLP) is a theory-motivated range of computational techniques for the automatic analysis and representation of human language. NLP research has evolved from the era of punch cards and batch processing (in which the analysis of a sentence could take up to 7 minutes) to the era of Google and the likes of it (in which millions of webpages can be processed in less than a second). This review paper draws on recent developments in NLP research to look at the past, present, and future of NLP technology in a new light. Borrowing the paradigm of "jumping curves" from the field of business management and marketing prediction, this survey article reinterprets the evolution of NLP research as the intersection of three overlapping curves (namely, the Syntactics, Semantics, and Pragmatics Curves) which will eventually lead NLP research to evolve into natural language understanding.
This is not your average review of the literature as the authors point out:
…this review paper focuses on the evolution of NLP research according to three different paradigms, namely: the bag-of-words, bag-of-concepts, and bag-of-narratives models.
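To make the first of those paradigms concrete, here is a minimal bag-of-words sketch in Python. It illustrates the general idea only, not code from the paper; the tokenization (lowercasing plus whitespace splitting) is an assumption of mine.

    from collections import Counter

    def bag_of_words(text):
        # Lowercase and split on whitespace; word order and meaning
        # are discarded, leaving only token counts.
        return Counter(text.lower().split())

    print(bag_of_words("The cat sat on the mat"))
    # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

Everything downstream of such a model works on those counts alone, which is precisely the limitation the later paradigms (bag-of-concepts, bag-of-narratives) are meant to overcome.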
But what caught my eye was:
All such capabilities are required to shift from mere NLP to what is usually referred to as natural language understanding (Allen, 1987). Today, most of the existing approaches are still based on the syntactic representation of text, a method which mainly relies on word co-occurrence frequencies. Such algorithms are limited by the fact that they can process only the information that they can "see". As human text processors, we do not have such limitations as every word we see activates a cascade of semantically related concepts, relevant episodes, and sensory experiences, all of which enable the completion of complex NLP tasks (such as word-sense disambiguation, textual entailment, and semantic role labeling) in a quick and effortless way. (emphasis added)
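The "word co-occurrence frequencies" the authors mention can be sketched in a few lines. The window size and tokenization below are illustrative assumptions, not the method of any particular system:

    from collections import defaultdict

    def cooccurrences(tokens, window=2):
        # Count unordered word pairs that appear within `window`
        # positions of each other.
        counts = defaultdict(int)
        for i, word in enumerate(tokens):
            for neighbor in tokens[i + 1 : i + 1 + window]:
                counts[tuple(sorted((word, neighbor)))] += 1
        return counts

    tokens = "the cat sat on the mat".split()
    for pair, n in cooccurrences(tokens).items():
        print(pair, n)

Counting pairs within a window is all such an algorithm can "see"; nothing in the counts encodes what the words mean.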
The phrase "only the information that they can 'see'" captures the essence of the problem that topic maps address. A program can only see the surface of a text, nothing more.
The next phrase summarizes the promise of topic maps: to capture "…a cascade of semantically related concepts, relevant episodes, and sensory experiences…" related to a particular subject.
No topic map can capture the full extent of the information related to a subject, but it can capture as much of it as is plausible and useful.
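For readers new to topic maps, a hedged sketch of the idea follows: a subject is represented as a topic linked to related concepts through named associations. The class and association names here are hypothetical illustrations, not the ISO/IEC 13250 topic map data model:

    from dataclasses import dataclass, field

    @dataclass
    class Topic:
        name: str
        # association type -> list of related topic names
        associations: dict = field(default_factory=dict)

        def relate(self, assoc_type, other):
            # Record a named association from this topic to another.
            self.associations.setdefault(assoc_type, []).append(other)

    word = Topic("bank")
    word.relate("sense", "financial institution")
    word.relate("sense", "river edge")
    word.relate("episode", "2008 financial crisis")
    print(word)

The point of the sketch is the shape of the structure: unlike co-occurrence counts, the associations attach explicit, typed relationships to a subject, which is the "cascade" the quoted passage describes.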
I first saw this in a tweet by Marin Dimitrov.