In The Geometry of Constrained Structured Prediction: Applications to Inference and Learning of Natural Language Syntax André Martins proposes advances in inferencing and learning for NLP processing. And it is important work for that reason.
But in his introduction to recent (and rapid) progress in language technologies, the following text caught my eye:
So, what is the driving force behind the aforementioned progress? Essentially, it is the alliance of two important factors: the massive amount of data that became available with the advent of the Web, and the success of machine learning techniques to extract statistical models from the data (Mitchell, 1997; Manning and Schötze, 1999; Schölkopf and Smola, 2002; Bishop, 2006; Smith, 2011). As a consequence, a new paradigm has emerged in the last couple of decades, which directs attention to the data itself, as opposed to the explicit representation of knowledge (Abney, 1996; Pereira, 2000; Halevy et al., 2009). This data-centric paradigm has been extremely fruitful in natural language processing (NLP), and came to replace the classic knowledge representation methodology which was prevalent until the 1980s, based on symbolic rules written by experts. (emphasis added)
Are RDF, Linked Data, topic maps, and other semantic technologies caught in a 1980’s “symbolic rules” paradigm?
Are we ready to make the same break that NLP did, what, thirty (30) years ago now?
To get started on the literature, consider André’s sources:
Abney, S. (1996). Statistical methods and linguistics. In The balancing act: Combining symbolic and statistical approaches to language, pages 1–26. MIT Press, Cambridge, MA.
A more complete citation: Steven Abney. Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA. 1996. (Link is to PDF of Abney’s paper.)
Pereira, F. (2000). Formal grammar and information theory: together again? Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769):1239–1253.
I added a pointer to the Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences abstract for the article. You can see it at: Formal grammar and information theory: together again? (PDF file).
Halevy, A., Norvig, P., and Pereira, F. (2009). The unreasonable effectiveness of data. Intelligent Systems, IEEE, 24(2):8–12.
I added a pointer to the Intelligent Systems, IEEE abstract for the article. You can see it at: The unreasonable effectiveness of data (PDF file).
The Halevy article doesn’t have an abstract per se but the ACM reports one as:
Problems that involve interacting with humans, such as natural language understanding, have not proven to be solvable by concise, neat formulas like F = ma. Instead, the best approach appears to be to embrace the complexity of the domain and address it by harnessing the power of data: if other humans engage in the tasks and generate large amounts of unlabeled, noisy data, new algorithms can be used to build high-quality models from the data. [ACM]
That sounds like a challenge to me. You?
PS: I saw the pointer to this thesis at Christophe Lalanne’s A bag of tweets / September 2012