Archive for the ‘Literature’ Category

Data Curation in the Networked Humanities [Semantic Curation?]

Tuesday, October 16th, 2012

Data Curation in the Networked Humanities by Michael Ullyot.

From the post:

These talks are the first phase of Encoding Shakespeare: my SSHRC-funded project for the next three years. Between now and 2015, I’m working to improve the automated encoding of early modern English texts, to enable text analysis.

This post’s three parts are brought to you by the letter p. First I outline the potential of algorithmic text analysis; then the problem of messy data; and finally the protocols for a networked-humanities data curation system.

This third part is the most tentative, as of this writing; Fall 2012 is about defining my protocols and identifying which tags the most text-analysis engines require for the best results — whatever that entails. (So I welcome your comments and resource links.)

A project that promises to touch on many of the issues in modern digital humanities. Do review and contribute if possible.

I have a lingering uneasiness with the notion of “data curation.” My unease is with the “data” part, not the “curation” part.

To say “data curation” implies we can identify the “data” that merits curation.

I don’t doubt we can identify some data that needs curation. The question is whether it is the only data that merits curation.

We know from the early textual history of the Bible that the text was curated and that, in the process, variant traditions and entire works were lost.

Just my take on it, but rather than “data curation,” with its implication of a “correct” text, we need semantic curation.

Semantic curation attempts to preserve the semantics we see in a text, without attempting to find the correct semantics.

Wolfram Plays In Streets of Shakespeare’s London

Monday, April 23rd, 2012

I should have been glad to read: To Compute or Not to Compute—Wolfram|Alpha Analyzes Shakespeare’s Plays. Promoting Shakespeare has to be a first for Wolfram.

But the post presents word counts, unique words, and similar measures as master strokes of engineering, although all of them have been familiar since SNOBOL and before. It then makes this “bold” suggestion:

Asking Wolfram|Alpha for information about specific characters is where things really begin to get interesting. We took the dialog from each play and organized them into dialog timelines that show when each character talks within a specific play. For example, if you look at the dialog timeline of Julius Caesar, you’ll notice that Brutus and Cassius have steady dialog throughout the whole play, but Caesar’s dialog stops about halfway through. I wonder why that is?

That sort of analysis was old hat in the 1980s.
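For a sense of how little machinery these measures take, here is a minimal Python sketch of word counts, unique words, and a dialog timeline. The speeches are invented placeholders, not text from the play, and the names `SPEECHES`, `tokenize`, `word_stats`, and `dialog_timeline` are hypothetical. Nothing here reflects how Wolfram|Alpha actually computes its results.

```python
import re
from collections import Counter

# Invented placeholder speeches in play order: (speaker, text).
# These are NOT quotations from Julius Caesar; they only mimic its shape.
SPEECHES = [
    ("CAESAR", "I speak early in the play"),
    ("BRUTUS", "I speak early too"),
    ("CASSIUS", "and so do I"),
    ("CAESAR", "I speak once more"),
    ("BRUTUS", "I speak in the middle"),
    ("CASSIUS", "I speak late"),
    ("BRUTUS", "I speak last"),
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_stats(speeches):
    """Return (total words, unique words, per-word counts) over all speeches."""
    words = [w for _, text in speeches for w in tokenize(text)]
    return len(words), len(set(words)), Counter(words)


def dialog_timeline(speeches):
    """Map each speaker to the positions (speech indexes) at which they talk."""
    timeline = {}
    for position, (speaker, _) in enumerate(speeches):
        timeline.setdefault(speaker, []).append(position)
    return timeline


if __name__ == "__main__":
    total, unique, counts = word_stats(SPEECHES)
    print(total, unique, counts.most_common(2))
    for speaker, positions in dialog_timeline(SPEECHES).items():
        print(speaker, positions)
```

In this toy data the hypothetical CAESAR entries stop at position 3 while the other two speakers continue to the end, which is the whole of the “interesting” observation in the quoted passage.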

Wolfram needs to catch up on the history of literary and linguistic computing rather than repeating it.

The back issues of Computational Linguistics or Literary and Linguistic Computing should help in that regard. To say nothing of Shakespeare, Computers, and the Mystery of Authorship and similar works.

On digital humanities projects in general, see: Digital Humanities Spotlight: 7 Important Digitization Projects by Maria Popova, for a small sample.