Text Mining Methods Applied to Mathematical Texts (slides) by Yannis Haralambous, Département Informatique, Télécom Bretagne.
Abstract:
Up to now, flexiform mathematical text has mainly been processed with the intention of formalizing mathematical knowledge so that proof engines can be applied to it. This approach can be compared with the symbolic approach to natural language processing, where methods of logic and knowledge representation are used to analyze linguistic phenomena. In the last two decades, a new approach to natural language processing has emerged, based on statistical methods and, in particular, data mining. This method, called text mining, aims to process large text corpora in order to detect tendencies, extract information, classify documents, etc. In this talk I will present math mining, namely the potential applications of text mining to mathematical texts. After reviewing some existing works heading in that direction, I will formulate and describe several roadmap suggestions for the use and applications of statistical methods to mathematical text processing: (1) using terms instead of words as the basic unit of text processing, (2) using topics instead of subjects (“topics” in the sense of “topic models” in natural language processing, and “subjects” in the sense of various mathematical subject classifications), (3) using and correlating various graphs extracted from mathematical corpora, (4) using paraphrastic redundancy, etc. The purpose of this talk is to give a glimpse of potential applications of the math mining approach to large mathematical corpora, such as arXiv.org.
An invited presentation at CICM 2012.
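A toy Python sketch, not taken from the slides, of what roadmap item (1) amounts to in practice: count candidate multi-word terms rather than single words. The three sentences, the stopword list, and the functions `word_counts` and `term_counts` are all hypothetical, and treating runs of adjacent non-stopwords as "terms" is only a crude stand-in for real terminology extraction.

```python
from collections import Counter
import re

# Hypothetical toy corpus; not taken from the slides or from arXiv.
SENTENCES = [
    "Every compact metric space is complete.",
    "A closed subset of a compact metric space is compact.",
    "Every Cauchy sequence in a complete metric space converges.",
]

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "every"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_counts(sentences):
    """Count single content words: words as the basic unit."""
    return Counter(
        w for s in sentences for w in tokenize(s) if w not in STOPWORDS
    )


def term_counts(sentences, n=2):
    """Count candidate terms: runs of n adjacent content words.

    Real term extractors would also filter by part-of-speech patterns
    and statistical association measures; this one does neither.
    """
    counts = Counter()
    for s in sentences:
        tokens = tokenize(s)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if not any(w in STOPWORDS for w in window):
                counts[" ".join(window)] += 1
    return counts


if __name__ == "__main__":
    print(word_counts(SENTENCES).most_common(3))
    print(term_counts(SENTENCES).most_common(2))
```

At the word level, "compact", "metric" and "space" tie at three occurrences each; at the term level, "metric space" comes out on top as a single unit. The spurious candidate "space converges" shows why pairing adjacent words alone is not enough.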
I know Yannis from a completely different context and may comment on that in another post.
No paper, but 50+ slides showing that existing text mining tools can deliver useful search results while we wait for a unified and correct index to all of mathematics. 😉
Varying semantics, as in all human enterprises, is an opportunity for topic-map-based assistance.