Automatic transcription of 17th century English text in Contemporary English with NooJ: Method and Evaluation by Odile Piton (SAMM), Slim Mesfar (RIADI), and Hélène Pignot (SAMM).
Abstract:
Since 2006 we have undertaken to describe the differences between 17th century English and contemporary English thanks to NLP software. Studying a corpus spanning the whole century (tales of English travellers in the Ottoman Empire in the 17th century, Mary Astell’s essay A Serious Proposal to the Ladies and other literary texts) has enabled us to highlight various lexical, morphological or grammatical singularities. Thanks to the NooJ linguistic platform, we created dictionaries indexing the lexical variants and their transcription in CE. The latter is often the result of the validation of forms recognized dynamically by morphological graphs. We also built syntactical graphs aimed at transcribing certain archaic forms in contemporary English. Our previous research implied a succession of elementary steps alternating textual analysis and result validation. We managed to provide examples of transcriptions, but we have not created a global tool for automatic transcription. Therefore we need to focus on the results we have obtained so far, study the conditions for creating such a tool, and analyze possible difficulties. In this paper, we will be discussing the technical and linguistic aspects we have not yet covered in our previous work. We are using the results of previous research and proposing a transcription method for words or sequences identified as archaic.
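As a rough illustration of the approach the abstract describes (a dictionary of lexical variants, plus dynamically recognized forms that must be validated before use), here is a minimal Python sketch. It is not NooJ and not the authors' implementation: the names `VARIANTS`, `RULES`, `MODERN_WORDS`, `transcribe_word` and `transcribe` are hypothetical, the word lists are invented toy data, and the regular-expression rules only stand in for what the paper does with morphological graphs.

```python
import re

# Hypothetical variant dictionary: archaic form -> contemporary form (toy data).
VARIANTS = {
    "hath": "has",
    "doth": "does",
    "musick": "music",
}

# Hypothetical suffix rules standing in for morphological graphs.
# Each rule proposes a candidate; it is accepted only if validated below.
RULES = [
    (re.compile(r"ick$"), "ic"),   # publick -> public
    (re.compile(r"eth$"), "s"),    # knoweth -> knows
    (re.compile(r"eth$"), "es"),   # maketh  -> makes
]

# Toy contemporary lexicon used to validate rule-generated candidates.
MODERN_WORDS = {"public", "music", "knows", "makes"}


def transcribe_word(word: str) -> str:
    """Return a contemporary form for one word, or the word unchanged."""
    lower = word.lower()
    result = VARIANTS.get(lower)
    if result is None:
        for pattern, replacement in RULES:
            if pattern.search(lower):
                candidate = pattern.sub(replacement, lower)
                if candidate in MODERN_WORDS:  # validation step
                    result = candidate
                    break
    if result is None:
        return word  # unknown form: leave it for manual review
    # Preserve an initial capital from the source word.
    return result.capitalize() if word[0].isupper() else result


def transcribe(text: str) -> str:
    """Transcribe every alphabetic token, leaving punctuation untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: transcribe_word(m.group(0)), text)


if __name__ == "__main__":
    print(transcribe("He hath publick musick, and maketh it well."))
```

The validation step matters: a rule such as "-eth to -s" over-generates on its own, so a candidate is kept only when it appears in the contemporary lexicon, and anything unrecognized is passed through unchanged rather than guessed at. Note that this handles spelling and simple morphology only; it does nothing about the shifts in meaning discussed below.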
Everyone working on search engines needs to print a copy of this article and read it at least once a month.
Seriously, the senses of both words and grammar evolve over centuries, and sometimes far more quickly. What seem like correct search results from as recently as the 1950s may be quite incorrect.
For example (I don’t have the episode reference; perhaps someone can supply it), there was an “I Love Lucy” episode where Lucy says on the phone to Ricky that some visitor (at home) is “making love to her,” which meant nothing more than sweet talk, not sexual intercourse.
I leave it to your imagination how large the semantic gap may be between English texts and originals composed in another language, culture, and historical context between 2,000 and 6,000 years ago. Flattening the complexities of ancient texts to bumper sticker snippets does a disservice to them and to ourselves.
Dear Patrick,
Thank you for your very useful comments. Indeed the semantic gap between 17th century English and contemporary English is huge and we are not denying this.
I agree with you that this article will not revolutionize linguistics. It was posted on the Net by my colleague without my approval, and my remarks and criticisms were not taken into account (a long story that I will not tell you here).
What I just wanted to say here is that it would be nice for non-native speakers if there was a free resource on the Net such as a dictionary of 17th century English (the OED is quite expensive).
The aim of my colleague’s work was very modest: she wanted to see whether software like NooJ could identify archaic spellings and suggest a transcription. However, she came up against many technical difficulties and spent months trying to solve them (with us). I think the best tool for that is the software VARD 2, designed by our colleagues at Lancaster University.
Hoping these remarks will help!
All the best,
Hélène
Comment by Helene Pignot — September 29, 2013 @ 12:55 am
Hélène,
I saw the paper as an antidote to the usual flat reading of all texts as having the same context (ours). As you and I both know, that’s not true but it is a common mistake.
Hope your projects are going well!
Patrick
Comment by Patrick Durusau — October 2, 2013 @ 2:05 pm