Archive for the ‘Humanities’ Category

Data Curation in the Networked Humanities [Semantic Curation?]

Tuesday, October 16th, 2012

Data Curation in the Networked Humanities by Michael Ullyot.

From the post:

These talks are the first phase of Encoding Shakespeare: my SSHRC-funded project for the next three years. Between now and 2015, I’m working to improve the automated encoding of early modern English texts, to enable text analysis.

This post’s three parts are brought to you by the letter p. First I outline the potential of algorithmic text analysis; then the problem of messy data; and finally the protocols for a networked-humanities data curation system.

This third part is the most tentative, as of this writing; Fall 2012 is about defining my protocols and identifying which tags the most text-analysis engines require for the best results — whatever that entails. (So I welcome your comments and resource links.)

A project that promises to touch on many of the issues in modern digital humanities. Do review and contribute if possible.

I have a lingering uneasiness with the notion of “data curation”: with the “data” part, not the “curation” part.

To say “data curation” implies we can identify the “data” that merits curation.

I don’t doubt we can identify some data that needs curation. The question is whether it is the only data that merits curation.

We know from the early textual history of the Bible that the text was curated and in that process, variant traditions and entire works were lost.

Just my take, but rather than “data curation,” with its implication of a “correct” text, we need semantic curation.

Semantic curation attempts to preserve the semantics we see in a text, without attempting to find the correct semantics.

Perseus Gives Big Humanities Data Wings

Saturday, October 6th, 2012

Perseus Gives Big Humanities Data Wings by Ian Armas Foster.

From the post:

“How do we think about the human record when our brains are not capable of processing all the data in isolation?” asked Professor Gregory Crane of students in a lecture hall at the University of Kansas.

But when he posed this question, Crane wasn’t referencing modern big data to a bunch of computer science majors. Rather, he was discussing data from ancient texts with a group of those studying the humanities (and one computer science major).

Crane, a professor of classics, adjunct professor of computer science, and chair of Technology and Entrepreneurship at Tufts University, spoke about the efforts of the Perseus Project, a project whose goals include storing and analyzing ancient texts with an eye toward building a global humanities model.

(video omitted)

The next step in humanities is to create what Crane calls “a dialogue among civilizations.” With regard to the study of humanities, it is to connect those studying classical Greek with those studying classical Latin, Arabic, and even Chinese. Like physicists want to model the universe, Crane wants to model the progression of intelligence and art on a global scale throughout human history.

… (a bit later)

Surprisingly, the biggest barrier is not actually the amount of space occupied by the data of the ancient texts, but rather the language barriers. Currently, the Perseus Project covers over a trillion words, but those words are split up into 400 languages. To give a specific example, Crane presented a 12th century Arabic document. It was pristine and easily readable—to anyone who can read ancient Arabic.

Substitute “semantic” for “language” in “language barriers” and I think the comment is right on the mark.

Assuming that you could read the “12th century Arabic document” and understand its semantics, where would you record your reading to pass it along to others?

Say you spot the name of a well-known 12th century figure. Must every reader duplicate your feat of reading and understanding the document to make that same discovery?

Or can we preserve your “discovery” for other readers?

Topic maps anyone?

NEH Institute Working With Text In a Digital Age

Saturday, September 1st, 2012

NEH Institute Working With Text In a Digital Age

From the webpage:

The goal of this demo/sample code is to provide a platform which institute participants can use to complete an exercise to create a miniature digital edition. We will use these editions as concrete examples for discussion of decisions and issues to consider when creating digital editions from TEI XML, annotations and other related resources.

Some specific items for consideration and discussion through this exercise:

  • Creating identifiers for your texts.
  • Establishing markup guidelines and best practices.
  • Use of inline annotations versus standoff markup.
  • Dealing with overlapping hierarchies.
  • OAC (Open Annotation Collaboration)
  • Leveraging annotation tools.
  • Applying Linked Data concepts.
  • Distribution formats: optimizing for display vs for enabling data reuse.
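To make the standoff markup and overlapping hierarchies items concrete, here is a minimal sketch in Python. It is not taken from the institute's sample code: the sample sentence, the `Annotation` class, the layer names, and the helper functions are all hypothetical, chosen only to show how annotations kept apart from the text (as character offsets) can describe two hierarchies that cross each other, which a single inline XML tree cannot nest.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical sample text: two "lines" and two "sentences" that do not nest.
TEXT = "The fox jumps over the dog. It then sleeps."


@dataclass(frozen=True)
class Annotation:
    start: int  # inclusive character offset into TEXT
    end: int    # exclusive character offset into TEXT
    layer: str  # hypothetical layer name, e.g. "line" or "sentence"
    label: str


def annotate(substring: str, layer: str, label: str) -> Annotation:
    """Build a standoff annotation from the first occurrence of substring."""
    start = TEXT.index(substring)
    return Annotation(start, start + len(substring), layer, label)


def crosses(a: Annotation, b: Annotation) -> bool:
    """True when two spans partially overlap, with neither containing the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def crossing_pairs(annotations):
    """List the label pairs that could not be nested in one inline hierarchy."""
    return [(a.label, b.label)
            for a, b in combinations(annotations, 2)
            if crosses(a, b)]


# The text itself is never modified; every layer points into it by offset.
ANNOTATIONS = [
    annotate("The fox jumps over", "line", "line-1"),
    annotate("the dog. It then sleeps.", "line", "line-2"),
    annotate("The fox jumps over the dog.", "sentence", "sentence-1"),
    annotate("It then sleeps.", "sentence", "sentence-2"),
]

if __name__ == "__main__":
    print(crossing_pairs(ANNOTATIONS))
```

In this sketch the second line and the first sentence cross, so an inline encoding would have to break one of them apart, while the standoff layers simply coexist.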

Excellent resource!

Offers a way to learn/test digital edition skills.

You can use it as a template to produce similar materials with texts of greater interest to you.

The act of encoding asks: what subjects are you going to recognize, and under what conditions? Good practice for topic map construction.

Not to mention that historical editions of a text have made similar, possibly differing decisions on the same text.

Topic maps are a natural way to present such choices on their own merits, as well as being able to compare and contrast those choices.

I first saw this at The banquet of the digital scholars.

The banquet of the digital scholars

Saturday, September 1st, 2012

The banquet of the digital scholars

The actual workshop title: Humanities Hackathon on editing Athenaeus and on the Reinvention of the Edition in a Digital Space


September 30, 2012 Registration Deadline

October 10-12, 2012
Universität Leipzig (ULEI) & Deutsches Archäologisches Institut (DAI) Berlin

Abstract:

The University of Leipzig will host a hackathon that addresses two basic tasks. On the one hand, we will focus upon the challenges of creating a digital edition for the Greek author Athenaeus, whose work cites more than a thousand earlier sources and is one of the major sources for lost works of Greek poetry and prose. At the same time, we use the case of Athenaeus to develop our understanding of how to organize a truly born-digital edition, one that not only includes machine actionable citations and variant readings but also collations of multiple print editions, metrical analyses, named entity identification, linguistic features such as morphology, syntax, word sense, and co-reference analysis, and alignment between the Greek original and one or more later translations.

After some details:

Overview:
The Deipnosophists (Δειπνοσοφισταί, or “Banquet of the Sophists”) by Athenaeus of Naucratis is a 3rd century AD fictitious account of several banquet conversations on food, literature, and arts held in Rome by twenty-two learned men. This complex and fascinating work is not only an erudite and literary encyclopedia of a myriad of curiosities about classical antiquity, but also an invaluable collection of quotations and text re-uses of ancient authors, ranging from Homer to tragic and comic poets and lost historians. Since the large majority of the works cited by Athenaeus is nowadays lost, this compilation is a sort of reference tool for every scholar of Greek theater, poetry, historiography, botany, zoology, and many other topics.

Athenaeus’ work is a mine of thousands of quotations, but we still lack a comprehensive survey of its sources. The aim of this “humanities hackathon” is to provide a case study for drawing a spectrum of quoting habits of classical authors and their attitude to text reuse. Athenaeus, in fact, shapes a library of forgotten authors, which goes beyond the limits of a physical building and becomes an intellectual space of human knowledge. By doing so, he is both a witness of the Hellenistic bibliographical methods and a forerunner of the modern concept of hypertext, where sequential reading is substituted by hierarchical and logical connections among words and fragments of texts. Quantity, variety, and precision of Athenaeus’ citations make the Deipnosophists an excellent training ground for the development of a digital system of reference linking for primary sources. Athenaeus’ standard citation includes (a) the name of the author with additional information like ethnic origin and literary category, (b) the title of the work, and (c) the book number (e.g., Deipn. 2.71b). He often remembers the amount of papyrus scrolls of huge works (e.g., 6.229d-e; 6.249a), while distinguishing various editions of the same comedy (e.g., 1.29a; 4.171c; 6.247c; 7.299b; 9.367f) and different titles of the same work (e.g., 1.4e).

He also adds biographical information to identify homonymous authors and classify them according to literary genres, intellectual disciplines and schools (e.g., 1.13b; 6.234f; 9.387b). He provides chronological and historical indications to date authors (e.g., 10.453c; 13.599c), and he often copies the first lines of a work following a method that probably goes back to the Pinakes of Callimachus (e.g., 1.4e; 3.85f; 8.342d; 5.209f; 13.573f-574a).

Last but not least, the study of Athenaeus’ “citation system” is also a great methodological contribution to the domain of “fragmentary literature”, since one of the main concerns of this field is the relation between the fragment (quotation) and its context of transmission. Having this goal in mind, the textual analysis of the Deipnosophists will make it possible to enumerate a series of recurring patterns, which include a wide typology of textual reproductions and linguistic features helpful to identify and classify hidden quotations of lost authors.
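The references quoted in the overview (e.g., Deipn. 2.71b; 6.229d-e; 13.573f-574a) hint at what a "machine actionable citation" could look like. The following is a rough sketch, not the hackathon's method: it assumes each reference has the shape book, page, and a section letter from a to f, optionally followed by a range. The `Reference` class, the regular expression, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed shape: optional "Deipn." prefix, book.page + section letter,
# then an optional range such as "-e" or "-574a".
_REF = re.compile(r"(?:Deipn\.\s*)?(\d+)\.(\d+)([a-f])(?:-(\d+)?([a-f]))?")


@dataclass(frozen=True)
class Reference:
    book: int
    start_page: int
    start_section: str
    end_page: int
    end_section: str


def parse_reference(raw: str) -> Optional[Reference]:
    """Parse one reference string; return None if it does not fit the assumed shape."""
    match = _REF.fullmatch(raw.strip())
    if match is None:
        return None
    book, page, section, end_page, end_section = match.groups()
    return Reference(
        book=int(book),
        start_page=int(page),
        start_section=section,
        # A bare reference or a letter-only range stays on the starting page.
        end_page=int(end_page) if end_page else int(page),
        end_section=end_section or section,
    )


def parse_reference_list(raw: str) -> List[Optional[Reference]]:
    """Parse a semicolon-separated run of references, as printed in the overview."""
    return [parse_reference(part) for part in raw.split(";") if part.strip()]


if __name__ == "__main__":
    for ref in parse_reference_list("1.29a; 4.171c; 6.247c; 7.299b; 9.367f"):
        print(ref)
    print(parse_reference("13.573f-574a"))
```

Once references are structured like this, they can be sorted, compared, or resolved against an edition, which is the kind of reference linking the abstract has in mind. Anything that does not fit the assumed pattern comes back as `None` rather than being guessed at.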

The 21st century has “big data” in the form of sensor streams and Twitter feeds, but “complex data” in the humanities pre-dates “big data” by a considerable margin.

If you are interested in being challenged by complexity and not simply the size of your data, take a closer look at this project.

Greek is a little late to be of interest to me but there are older texts that could benefit from a similar treatment.

BTW, while you are thinking about this project/text, consider how you would merge prior scholarship, digital and otherwise, with what originates here and what follows it in the decades to come.

One Culture. Computationally Intensive Research in the Humanities and Social Sciences…

Monday, July 2nd, 2012

One Culture. Computationally Intensive Research in the Humanities and Social Sciences, A Report on the Experiences of First Respondents to the Digging Into Data Challenge by Christa Williford and Charles Henry. Research Design by Amy Friedlander.

From the webpage:

This report culminates two years of work by CLIR staff involving extensive interviews and site visits with scholars engaged in international research collaborations involving computational analysis of large data corpora. These scholars were the first recipients of grants through the Digging into Data program, led by the NEH, who partnered with JISC in the UK, SSHRC in Canada, and the NSF to fund the first eight initiatives. The report introduces the eight projects and discusses the importance of these cases as models for the future of research in the academy. Additional information about the projects is provided in the individual case studies below (this additional material is not included in the print or PDF versions of the published report).

Main Report Online or PDF file.

Case Studies:

Humanists played an important role in the development of digital computers. That role has diminished over time to the disadvantage of both humanists and computer scientists. Perhaps efforts such as this one will rekindle what was once a rich relationship.

Evolutionary Subject Tagging in the Humanities…

Saturday, December 3rd, 2011

Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes by Jack Ammerman, Vika Zafrin, Dan Benedetti, Garth W. Green.

Abstract:

In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.

Truly remarkable piece of work!

Just to entice you into reading the entire paper, the authors challenge the assumption that knowledge is analogue. Successfully, in my view, but I already held that position, so I was an easy sell.

BTW, if you are in my topic maps class, this paper is required reading. Summarize what you think are the strong/weak points of the paper in 2 to 3 pages.