Announcing Digital Pedagogy in the Humanities: Concepts, Models, and Experiments

Tuesday, December 23rd, 2014

Announcing Digital Pedagogy in the Humanities: Concepts, Models, and Experiments by Rebecca Frost Davis.



I’m elated today to announce, along with my fellow editors, Matt Gold, Katherine D. Harris, and Jentery Sayers, and in conjunction with the Modern Language Association, Digital Pedagogy in the Humanities: Concepts, Models, and Experiments, an open-access, curated collection of downloadable, reusable, and remixable pedagogical resources for humanities scholars interested in the intersections of digital technologies with teaching and learning. This is a book in a new form. Taken as a whole, this collection will document the richly-textured culture of teaching and learning that responds to new digital learning environments, research tools, and socio-cultural contexts, ultimately defining the heterogeneous nature of digital pedagogy. You can see the full announcement here:

Many of you may have heard of this born-digital project under some other names (Digital Pedagogy Keywords) and hashtags (#digipedkit). Since it was born at the MLA convention in 2012 it has been continually evolving. You can trace that evolution, in part, through my earlier presentations:

For the future, please follow Digital Pedagogy in the Humanities on Twitter through the hashtag #curateteaching and visit our news page for updates. And if you know of a great pedagogical artifact to share, please help us curate teaching by tweeting it to the hashtag #curateteaching. We’ll be building an archive of those tweets, as well.

After looking at the list of keywords: Draft List of Keywords for Digital Pedagogy in the Humanities: Concepts, Models, and Experiments, I am hopeful those of you with a humanities background can suggest additional terms.

I didn’t see “topic maps” listed. 😉 Maybe that should be under Annotation? In any event, this looks like an exciting project.


Digital Humanities in the Southeast 2014

Tuesday, December 9th, 2014

Digital Humanities in the Southeast 2014

Big data is challenging because of the three or four V’s, depending on who you believe. (Originally, volume, variety, and velocity. At some later point, veracity was added.) When big data fully realizes the need for semantics, they will need to add a capital S.

If you want to prepare for that eventuality, the humanities have projects where the data sets are small compared to big data but suffer from the big S, as in semantics.

A number of workshop presentations are listed, most with both audio and slides, ranging from Latin and history to war and Eliot.

A great opportunity to see problems that are not difficult in the four Vs sense but are difficult nonetheless.

I first saw this in a tweet by Brian Croxall.

JudaicaLink released

Wednesday, July 30th, 2014

JudaicaLink released



Data extractions from two encyclopediae from the domain of Jewish culture and history have been released as Linked Open Data within our JudaicaLink project.

JudaicaLink now provides access to 22,808 concepts in English (~ 10%) and Russian (~ 90%), mostly locations and persons.

See here for further information:

Next steps in this project include “…the creation of links between the two encyclopedias and links to external sources like DBpedia or Geonames.”

In case you are interested, the two encyclopedias are:

  • The YIVO Encyclopedia of Jews in Eastern Europe, courtesy of the YIVO Institute of Jewish Research, NY.
  • An Internet version of the Encyclopedia of Russian Jewry, which has been published in Moscow since 1994, giving a comprehensive, objective picture of the life and activity of the Jews of Russia, the Soviet Union and the CIS.

For more details: Encyclopediae

If you are looking to contribute content or time to a humanities project, this should be on your short list.

Digital Humanities and Computer Science

Sunday, July 27th, 2014

Chicago Colloquium on Digital Humanities and Computer Science


1 August 2014, abstracts of ~ 750 words and a minimal bio sent to

31 August 2014, Deadline for Early Registration Discount.

19 September 2014, Deadline for group rate reservations at the Orrington Hotel.

23-24 October, 2014 Colloquium.



The ninth annual meeting of the Chicago Colloquium on Digital Humanities and Computer Science (DHCS) will be hosted by Northwestern University on October 23-24, 2014.

The DHCS Colloquium has been a lively regional conference (with non-trivial bi-coastal and overseas sprinkling), rotating since 2006 among the University of Chicago (where it began), DePaul, IIT, Loyola, and Northwestern. At the first Colloquium Greg Crane asked his memorable question “What to do with a million books?” Here are some highlights that I remember across the years:

  • An NLP programmer at Los Alamos talking about the ways security clearances prevented CIA analysts and technical folks from talking to each other.
  • A demonstration that if you replaced all content words in Arabic texts and focused just on stop words you could determine with a high degree of certainty the geographical origin of a given piece of writing.
  • A visualization of phrases like “the king’s daughter” in a sizable corpus, telling you much about who owned what.
  • A social network analysis of Alexander the Great and his entourage.
  • An amazingly successful extraction of verbal parallels from very noisy data.
  • Did you know that Jane Austen was a game theorist before her time and that her characters were either skillful or clueless practitioners of this art?

And so forth. Given my own interests, I tend to remember “Text as Data” stuff, but there was much else about archaeology, art, music, history, and social or political life. You can browse through some of the older programs at


One of the weather sites promises that October is between 42 F for the low and 62 F for the high (on average). Sounds like a nice time to visit Northwestern University!

To say nothing of an exciting conference!

I first saw this in a tweet by David Bamman.

SAMUELS [English Historical Semantic Tagger]

Wednesday, July 9th, 2014

SAMUELS (Semantic Annotation and Mark-Up for Enhancing Lexical Searches)



The SAMUELS project (Semantic Annotation and Mark-Up for Enhancing Lexical Searches) is funded by the Arts and Humanities Research Council in conjunction with the Economic and Social Research Council (grant reference AH/L010062/1) from January 2014 to April 2015. It will deliver a system for automatically annotating words in texts with their precise meanings, disambiguating between possible meanings of the same word, ultimately enabling a step-change in the way we deal with large textual data. It uses the Historical Thesaurus of English as its core dataset, and will provide for each word in a text the Historical Thesaurus reference code for that concept. Textual data tagged in this way can then be accurately searched and precisely investigated, producing results which can be automatically aggregated at a range of levels of precision. The project also draws on a series of research sub-projects which will employ the software thus developed, testing and validating the utility of the SAMUELS tagger as a tool for wide-ranging further research.

To really appreciate this project, visit SAMUELS English Semantic Tagger Test Site.

There you can enter up to 2000 English words and select lower/upper year boundaries!

Just picking a text at random, ;-), I chose:

Greenpeace flew its 135-foot-long thermal airship over the Bluffdale, UT, data center early Friday morning, carrying the message: “NSA Illegal Spying Below” along with a link steering people to a new web site, which the three groups launched with the support of a separate, diverse coalition of over 20 grassroots advocacy groups and Internet companies. The site grades members of Congress on what they have done, or often not done, to rein in the NSA.

Some terms and Semtag3 by time period:


  • congress: C09d01 [Sexual intercourse]; E07e16 [Inclination]; E08e12 [Movement towards a thing/person/position]
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];


  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; E07e16 [Inclination];
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];


  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; O07 [Conversation];
  • data: H55a [Attestation, witness, evidence];
  • thermal: A04b02 [Spring]; C09a [Sexual desire]; D03c02 [Heat];
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B06d01 [Deformities of specific parts]; B25d [Tools and implements];


  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; O07 [Conversation];
  • data: F04v04 [Data]; H55a [Attestation, witness, evidence]; W05 [Information];
  • thermal: A04b02 [Spring]; B28b [Types/styles of clothing]; D03c02 [Heat];
  • UT: 04.10[Unrecognised]
  • web: B06d01 [Deformities of specific parts]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];


  • congress: 04.10[Unrecognised]
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: 04.10[Unrecognised]

I am assuming that the “04.10[Unrecognised]” for all terms in 2000-2014 means there is no usage data for that time period.

I have never heard anyone deny that meanings of words change over time and domain.

What remains a mystery is why the value-add of documenting the meanings of words isn’t obvious.

I say “words,” but I should be saying “data.” Remember the loss of the $125 million Mars Climate Orbiter: one system read a value as “pounds of force” and another read the same data as “newtons.” In that scenario, ET doesn’t get to call home.

So let’s rephrase the question to: Why isn’t the value-add of documenting the semantics of data obvious?
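A minimal sketch of what documenting the semantics of data can look like in code. The `Force` type and its unit strings are made up for illustration and have nothing to do with the actual orbiter software. The point is only that the unit travels with the value instead of being assumed by each system:

```python
from dataclasses import dataclass

# Exact by definition: 1 pound-force = 4.4482216152605 newtons.
NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Force:
    """A force reading whose unit is recorded rather than assumed."""
    value: float
    unit: str  # hypothetical unit labels: "N" or "lbf"

    def to_newtons(self) -> float:
        # Convert explicitly, and refuse to guess at an undocumented unit.
        if self.unit == "N":
            return self.value
        if self.unit == "lbf":
            return self.value * NEWTONS_PER_POUND_FORCE
        raise ValueError(f"undocumented unit: {self.unit!r}")


reading = Force(10.0, "lbf")
print(reading.to_newtons())
```

A consumer that calls `to_newtons()` either gets a value in the unit it expects or an error, never a silently misread number.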



Wednesday, July 2nd, 2014

Palladio – Humanities thinking about data visualization



Palladio is a web-based platform for the visualization of complex, multi-dimensional data. It is a product of the "Networks in History" project that has its roots in another humanities research project based at Stanford: Mapping the Republic of Letters (MRofL). MRofL produced a number of unique visualizations tied to individual case studies and specific research questions. You can see the tools on this site and read about the case studies at

With "Networks in History" we are taking the insights gained and lessons learned from MRofL and applying them to a set of visualizations that reflect humanistic thinking about data. Palladio is our first step toward opening data visualization to any researcher by making it possible to upload data and visualize within the browser without any barriers. There is no need to create an account and we do not store the data. On the visualization side, we have emphasized tools for filtering. There is a timeline filter that allows for filtering on discontinuous time periods. There is a facet filter based on Moritz Stefaner's Elastic Lists that is particularly useful when exploring multidimensional data sets.

The correspondence networks in the Mapping the Republic of Letters (MRofL) project will be of particular interest to humanists.

Quite challenging on their own but imagine the utility of exploding every letter into different subjects and statements about subjects, which automatically map to other identified subjects and statements about subjects in other correspondence.

Scholars already know about many such relationships in intellectual history, but those associations are captured in journals and monographs, identified in various ways, and in many cases lack explicit labeling of roles. To say nothing of having to retread the path of an author to discover their recording of such associations in full-text form.

If such paths were easy to follow, the next generation of scholars would develop new paths, as opposed to making known ones well-worn.

Digital Mapping + Geospatial Humanities

Monday, June 16th, 2014

Digital Mapping + Geospatial Humanities by Fred Gibbs.



We are in the midst of a major paradigm shift in human consciousness and society caused by our ubiquitous connectedness via the internet and smartphones. These globalizing forces have telescoped space and time to an unprecedented degree, while paradoxically heightening the importance of local places.

The course explores the technologies, tools, and workflows that can help collect, connect, and present online interpretations of the spaces around us. Throughout the week, we’ll discuss the theoretical and practical challenges of deep mapping (producing rich, interactive maps with multiple layers of information). Woven into our discussions will be numerous technical tutorials that will allow us to tell map-based stories about Albuquerque’s fascinating past.

This course combines cartography, geography, GIS, history, sociology, ethnography, computer science, and graphic design. While we cover some of the basics of each of these, the course eschews developing deep expertise in any of these in favor of exploring their intersections with each other, and formulating critical questions that span these normally disconnected disciplines. By the end, you should be able to think more critically about maps, place, and our online experiences with them.

We’ll move from creating simple maps with Google Maps/Earth to creating your own custom, interactive online maps with various open source tools like QGIS, Open Street Map, and D3 that leverage the power of open data from local and national repositories to provide new perspectives on the built environment. We’ll also use various mobile apps for data collection, online exhibit software, (physical and digital) historical archives at the Center for Southwest Research. Along the way we’ll cover the various data formats (KML, XML, GeoJSON, TopoJSON) used by different tools and how to move between them, allowing you to craft the most efficient workflow for your mapping purposes.

Course readings that aren’t freely available online (and even some that are) can be accessed via the course Zotero Library. You’ll need to be invited to join the group since we use it to distribute course readings. If you are not familiar with Zotero, here are some instructions.

All of that in a week! This week as a matter of fact.

One of the things I miss about academia is the occasions when you can concentrate on one subject to the exclusion of all else. Of course, being unmarried at that age, unemployed, etc. may have contributed to the ability to focus. 😉

Just sampled some of the readings and this appears to be a really rocking course!
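Moving between the data formats the course mentions can be sketched in a few lines. This is not taken from the course materials. It is a minimal illustration, assuming a KML file that contains only simple point placemarks, and the sample placemark is made up:

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up sample data, for illustration only.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample point</name>
      <Point><coordinates>-106.65,35.08,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def kml_points_to_geojson(kml_text: str) -> dict:
    """Convert KML point placemarks into a GeoJSON FeatureCollection."""
    root = ET.fromstring(kml_text)
    features = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # this sketch skips lines, polygons, and other geometries
        # KML stores "lon,lat[,alt]"; GeoJSON also expects longitude first.
        lon, lat = (float(part) for part in coords.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(kml_points_to_geojson(SAMPLE_KML), indent=2))
```

Real KML files carry much more (styles, folders, polygons), so a dedicated conversion tool is the safer choice for anything beyond a quick experiment.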

Avoid Philosophy?

Thursday, May 8th, 2014

Why Neil deGrasse Tyson is a philistine by Damon Linker.



Neil deGrasse Tyson may be a gifted popularizer of science, but when it comes to humanistic learning more generally, he is a philistine. Some of us suspected this on the basis of the historically and theologically inept portrayal of Giordano Bruno in the opening episode of Tyson’s reboot of Carl Sagan’s Cosmos.

But now it’s been definitively demonstrated by a recent interview in which Tyson sweepingly dismisses the entire history of philosophy. Actually, he doesn’t just dismiss it. He goes much further — to argue that undergraduates should actively avoid studying philosophy at all. Because, apparently, asking too many questions “can really mess you up.”

Yes, he really did say that. Go ahead, listen for yourself, beginning at 20:19 — and behold the spectacle of an otherwise intelligent man and gifted teacher sounding every bit as anti-intellectual as a corporate middle manager or used-car salesman. He proudly proclaims his irritation with “asking deep questions” that lead to a “pointless delay in your progress” in tackling “this whole big world of unknowns out there.” When a scientist encounters someone inclined to think philosophically, his response should be to say, “I’m moving on, I’m leaving you behind, and you can’t even cross the street because you’re distracted by deep questions you’ve asked of yourself. I don’t have time for that.”

“I don’t have time for that.”

With these words, Tyson shows he’s very much a 21st-century American, living in a perpetual state of irritated impatience and anxious agitation. Don’t waste your time with philosophy! (And, one presumes, literature, history, the arts, or religion.) Only science will get you where you want to go! It gets results! Go for it! Hurry up! Don’t be left behind! Progress awaits!

There are many ways to respond to this indictment. One is to make the case for progress in philosophical knowledge. This would show that Tyson is wrong because he fails to recognize the real advances that happen in the discipline of philosophy over time.


I remember thinking the first episode of Tyson’s Cosmos was rather careless with its handling of Bruno and the Enlightenment. But at the time I thought that was due to it being a “popular” presentation and not meant to be precise in every detail.

Damon has an excellent defense of philosophy and for that you should read his post.

I have a more pragmatic reason for recommending both philosophy in particular and the humanities in general to CS majors. You will save more time in programming than you will ever lose to “deep questions.”

For example, why have CS types, intelligent to the point of being gifted, tried repeatedly to solve the issues of naming by proposing universal naming systems?

You don’t have to be very aware to know that naming systems are like standards. If you don’t like this one, make up another one.

That being the case, what makes anyone think their naming system will displace all others for any significant period of time? Considering there has never been a successful one.

Oh, I forgot: if you don’t know any philosophy (one place this issue gets discussed) or the humanities in general, you won’t be exposed to the long history of language and naming discussions. And the failures recorded there.

I would urge CS types to read and study both philosophy and the humanities for purely pragmatic reasons. CS pioneers were able to write the first FORTRAN compiler not because they had taken a compiler MOOC but because they had studied mathematics, linguistics, language, philosophy, history, etc.

Are you a designer (CS pioneers were) or are you a mechanic?

PS: If you are seriously interested in naming issues, my first suggestion would be to read The Search for the Perfect Language by Umberto Eco. It’s not all that needs to be read in this area but it is easily accessible.

I first saw this in a tweet by Christopher Phipps.

DH Tools for Beginners

Wednesday, April 2nd, 2014

DH Tools for Beginners by Quinn Warnick.

A short collection of tutorials for “digital humanities novices.”

It is a good start and if you know of other resources or want to author such tutorials, please do.

I don’t know that I was ever entirely comfortable with the phrase “digital humanities.”

In part because it creates an odd division between humanists and humanists who use digital tools.

We don’t call literature scholars who use concordances “concordance humanists.”

Any more than we call scholars who use bibliographic materials “bibliographic humanists.”

Mostly because concordances and bibliographic materials are tools by which one does humanities research and scholarship.

Shouldn’t that be the same for “digital” humanities?

That digital tools are simply more tools for doing humanities research and scholarship?

Given the recent and ongoing assaults on the humanities in general, standing closer together and not further apart as humanists sounds like a good idea.

Citizen Science and the Modern Web…

Friday, March 21st, 2014

Citizen Science and the Modern Web – Talk by Amit Kapadia by Bruce Berriman.



Amit Kapadia gave this excellent talk at CERN on Citizen Science and The Modern Web. From Amit’s abstract: “Beginning as a research project to help scientists communicate, the Web has transformed into a ubiquitous medium. As the sciences continue to transform, new techniques are needed to analyze the vast amounts of data being produced by large experiments. The advent of the Sloan Digital Sky Survey increased throughput of astronomical data, giving rise to Citizen Science projects such as Galaxy Zoo. The Web is no longer exclusively used by researchers, but rather, a place where anyone can share information, or even, partake in citizen science projects.

As the Web continues to evolve, new and open technologies enable web applications to become more sophisticated. Scientific toolsets may now target the Web as a platform, opening an application to a wider audience, and potentially citizen scientists. With the latest browser technologies, scientific data may be consumed and visualized, opening the browser as a new platform for scientific analysis.”

Bruce points to the original presentation here.

The emphasis is on astronomy but many good points on citizen science.

Curious if citizen involvement in the sciences and humanities could lead to greater awareness and support for them?

Web Scraping: working with APIs

Tuesday, March 18th, 2014

Web Scraping: working with APIs by Rolf Fredheim.



APIs present researchers with a diverse set of data sources through a standardised access mechanism: send a pasted together HTTP request, receive JSON or XML in return. Today we tap into a range of APIs to get comfortable sending queries and processing responses.

These are the slides from the final class in Web Scraping through R: Web scraping for the humanities and social sciences

This week we explore how to use APIs in R, focusing on the Google Maps API. We then attempt to transfer this approach to query the Yandex Maps API. Finally, the practice section includes examples of working with the YouTube V2 API, a few ‘social’ APIs such as LinkedIn and Twitter, as well as APIs less off the beaten track (Cricket scores, anyone?).

The final installment of Rolf’s course for humanists. He promises to repeat it next year. Should be interesting to see how techniques and resources evolve over the next year.

Forward the course link to humanities and social science majors.
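The “pasted together HTTP request, JSON in return” pattern from the slides can be sketched as follows. Rolf’s course uses R; this is a Python analogue. The endpoint and the response body are both invented for illustration, so nothing here describes a real API:

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show how the request URL is assembled.
BASE_URL = "https://api.example.org/geocode/json"

# Hand-written stand-in for a response body; a real API will differ.
SAMPLE_RESPONSE = '{"results": [{"location": {"lat": 52.2, "lng": 0.12}}]}'


def build_request_url(address: str) -> str:
    """Paste the base URL and an encoded query string together."""
    return BASE_URL + "?" + urlencode({"address": address})


def first_location(response_text: str):
    """Pull the first (lat, lng) pair out of a JSON response, if any."""
    data = json.loads(response_text)
    results = data.get("results", [])
    if not results:
        return None
    location = results[0]["location"]
    return (location["lat"], location["lng"])


print(build_request_url("Cambridge, UK"))
print(first_location(SAMPLE_RESPONSE))
```

Against a live service the only extra step is fetching the URL and passing the response body to `first_location`.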

Web Scraping part2: Digging deeper

Tuesday, February 25th, 2014

Web Scraping part2: Digging deeper by Rolf Fredheim.



Slides from the second web scraping through R session: Web scraping for the humanities and social sciences.

In which we make sure we are comfortable with functions, before looking at XPath queries to download data from newspaper articles. Examples including BBC news and Guardian comments.

Download the .Rpres file to use in Rstudio here.

A regular R script with the code only can be accessed here.

A great part 2 on web scrapers!
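For a taste of what an XPath query does, here is a minimal Python sketch (the course itself works in R). The page markup is made up and deliberately well formed. The standard library supports only a subset of XPath, so real newspaper pages would need a proper HTML parser:

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for a news page with comments.
SAMPLE_PAGE = """
<html><body>
  <div class="article"><h1>Headline</h1><p>Body text.</p></div>
  <div class="comment"><p>First comment.</p></div>
  <div class="comment"><p>Second comment.</p></div>
</body></html>
"""


def extract_comments(page: str) -> list:
    """Return the text of every <p> directly inside a div with class 'comment'."""
    root = ET.fromstring(page)
    # ".//" searches all descendants; "[@class='comment']" filters on the attribute.
    return [p.text for p in root.findall(".//div[@class='comment']/p")]


print(extract_comments(SAMPLE_PAGE))
```

The query skips the article body because its `div` carries a different class, which is the whole appeal of XPath for scraping: you describe where the data lives instead of slicing strings.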

Web-Scraping: the Basics

Friday, February 21st, 2014

Web-Scraping: the Basics by Rolf Fredheim.



Slides from the first session of my course about web scraping through R: Web scraping for the humanities and social sciences

Includes an introduction to the paste function, working with URLs, functions and loops.

Putting it all together we fetch data in JSON format about Wikipedia page views from

Solutions here:

Download the .Rpres file to use in Rstudio here

Hard to say how soon but eventually data in machine readable formats is going to be the default and web scraping will be a historical footnote.

But it hasn’t happened yet so pass this on to newbies who need advice.

Research Opportunities and …

Tuesday, February 18th, 2014

Research Opportunities and Themes in Digital Scholarship by Professor Andrew Prescott.

Unlike death-by-powerpoint-slides, only four or five of these slides have much text at all.

Which makes them more difficult to interpret, absent the presentation. (So there is a downside to low-text slides.)

But the slides reference such a wide range and depth of humanities projects that you are likely to find them very useful.

Either as pointers to present projects or as inspiration for variations or entirely new projects.


Digital Humanities?

Monday, January 20th, 2014

xkcd on digital humanities

I saw this mentioned by Ted Underwood in a tweet saying:

An xkcd that could just as well be titled “Digital Humanities.”

Not to be too harsh on the digital humanists, they have bad role models in programming projects, the maintenance of which is called “job security.”

More digital than thou

Thursday, January 9th, 2014

More digital than thou by Michael Sperberg-McQueen.



An odd thing has started happening in reviews for the Digital Humanities conference: reviewers are objecting to papers if the reviewer thinks it has relevance beyond the field of DH, apparently on the grounds that the topic is then insufficiently digital. It doesn’t matter how relevant the topic is to work in DH, or how deeply embedded the topic is in a core DH topic like text encoding — if some reviewers don’t see a computer in the proposal, they want to exclude it from the conference.

Michael’s focus on the TEI (Text Encoding Initiative), XML Schema at the W3C, and other projects, kept him from seeing the ramparts being thrown up around digital humanities.

Well, and Michael is just Michael. Whether you are a long time XML hacker or a newcomer, Michael is just Michael. When you are really good, you don’t need to cloak yourself in disciplinary robes, boundaries and secret handshakes.

You don’t have to look far in the “digital humanities” to find forums where hand wringing over the discipline of digital humanities is a regular feature. As opposed to concern over what digital technologies have contributed, can contribute, and will contribute to the humanities.

Digital technologies should be as much a part of each humanities discipline as the more traditional periodical indexes, concordances, dictionaries and monographs.

After all, I thought there was general agreement that “separate but equal” was a poor policy.

Interesting times for literary theory

Sunday, August 4th, 2013

Interesting times for literary theory by Ted Underwood.



This could be the beginning of a beautiful friendship. I realize a marriage between machine learning and literary theory sounds implausible: people who enjoy one of these things are pretty likely to believe the other is fraudulent and evil.** But after reading through a couple of ML textbooks,*** I’m convinced that literary theorists and computer scientists wrestle with similar problems, in ways that are at least loosely congruent. Neither field is interested in the mere accumulation of data; both are interested in understanding the way we think and the kinds of patterns we recognize in language. Both fields are interested in problems that lack a single correct answer, and have to be mapped in shades of gray (ML calls these shades “probability”). Both disciplines are preoccupied with the danger of overgeneralization (literary theorists call this “essentialism”; computer scientists call it “overfitting”). Instead of saying “every interpretation is based on some previous assumption,” computer scientists say “every model depends on some prior probability,” but there’s really a similar kind of self-scrutiny involved.

Computer science and the humanities could enrich each other greatly.

This could be a starting place for that enrichment.

Data Curation in the Networked Humanities [Semantic Curation?]

Tuesday, October 16th, 2012

Data Curation in the Networked Humanities by Michael Ullyot.



These talks are the first phase of Encoding Shakespeare: my SSHRC-funded project for the next three years. Between now and 2015, I’m working to improve the automated encoding of early modern English texts, to enable text analysis.

This post’s three parts are brought to you by the letter p. First I outline the potential of algorithmic text analysis; then the problem of messy data; and finally the protocols for a networked-humanities data curation system.

This third part is the most tentative, as of this writing; Fall 2012 is about defining my protocols and identifying which tags the most text-analysis engines require for the best results — whatever that entails. (So I welcome your comments and resource links.)

A project that promises to touch on many of the issues in modern digital humanities. Do review and contribute if possible.

I have a lingering uneasiness with the notion of “data curation.” With the “data” part, not the “curation” part.

To say “data curation” implies we can identify the “data” that merits curation.

I don’t doubt we can identify some data that needs curation. The question is whether it is the only data that merits curation.

We know from the early textual history of the Bible that the text was curated and in that process, variant traditions and entire works were lost.

Just my take on it but rather than “data curation,” with the implication of a “correct” text, we need semantic curation.

Semantic curation attempts to preserve the semantics we see in a text, without attempting to find the correct semantics.

Perseus Gives Big Humanities Data Wings

Saturday, October 6th, 2012

Perseus Gives Big Humanities Data Wings by Ian Armas Foster.



“How do we think about the human record when our brains are not capable of processing all the data in isolation?” asked Professor Gregory Crane of students in a lecture hall at the University of Kansas.

But when he posed this question, Crane wasn’t referencing modern big data to a bunch of computer science majors. Rather, he was discussing data from ancient texts with a group of those studying the humanities (and one computer science major).

Crane, a professor of classics, adjunct professor of computer science, and chair of Technology and Entrepreneurship at Tufts University, spoke about the efforts of the Perseus Project, a project whose goals include storing and analyzing ancient texts with an eye toward building a global humanities model.

(video omitted)

The next step in humanities is to create what Crane calls “a dialogue among civilizations.” With regard to the study of humanities, it is to connect those studying classical Greek with those studying classical Latin, Arabic, and even Chinese. Like physicists want to model the universe, Crane wants to model the progression of intelligence and art on a global scale throughout human history.

… (a bit later)

Surprisingly, the biggest barrier is not actually the amount of space occupied by the data of the ancient texts, but rather the language barriers. Currently, the Perseus Project covers over a trillion words, but those words are split up into 400 languages. To give a specific example, Crane presented a 12th century Arabic document. It was pristine and easily readable—to anyone who can read ancient Arabic.

Substitute “semantic” for “language” in “language barriers” and I think the comment is right on the mark.

Assuming that you could read the “12th century Arabic document” and understand its semantics, where would you record your reading to pass it along to others?

Say you spot the name of a well-known 12th century figure. Must every reader duplicate your feat of reading and understanding the document to make that same discovery?

Or can we preserve your “discovery” for other readers?

Topic maps anyone?

NEH Institute Working With Text In a Digital Age

Saturday, September 1st, 2012

NEH Institute Working With Text In a Digital Age



The goal of this demo/sample code is to provide a platform which institute participants can use to complete an exercise to create a miniature digital edition. We will use these editions as concrete examples for discussion of decisions and issues to consider when creating digital editions from TEI XML, annotations and other related resources.

Some specific items for consideration and discussion through this exercise :

  • Creating identifiers for your texts.
  • Establishing markup guidelines and best practices.
  • Use of inline annotations versus standoff markup.
  • Dealing with overlapping hierarchies.
  • OAC (Open Annotation Collaboration)
  • Leveraging annotation tools.
  • Applying Linked Data concepts.
  • Distribution formats: optimizing for display vs for enabling data reuse.
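
On the inline versus standoff item in the list above, a minimal Python sketch of standoff markup may help: the text stays untouched and annotations point into it by character offsets. The `Annotation` record, the helper functions, the labels, and the sample sentence are all hypothetical; none of it is drawn from the institute's sample code, TEI, or OAC. Because the annotations live outside the text, two of them can overlap without either having to nest inside the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a label attached to a character range (hypothetical)."""
    start: int   # offset of the first character covered
    end: int     # offset one past the last character covered
    label: str


def annotate(text, fragment, label):
    """Build an annotation for the first occurrence of fragment in text."""
    start = text.index(fragment)  # raises ValueError if fragment is absent
    return Annotation(start, start + len(fragment), label)


def covered(text, ann):
    """Return the stretch of text an annotation points at."""
    return text[ann.start:ann.end]


def overlaps(a, b):
    """True when the two ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def nested(a, b):
    """True when one range lies entirely inside the other."""
    return (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end
    )


# Illustrative sample text and labels only.
text = "Sing, goddess, the wrath of Achilles"
unit = annotate(text, "goddess, the wrath", "verse-unit")
theme = annotate(text, "the wrath of Achilles", "theme")
```

Here `unit` and `theme` overlap but neither contains the other, which is exactly the case a single inline hierarchy cannot express directly.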

Excellent resource!

Offers a way to learn/test digital edition skills.

You can use it as a template to produce similar materials with texts of greater interest to you.

The act of encoding asks what subjects you are going to recognize, and under what conditions. Good practice for topic map construction.

Not to mention that historical editions of a text have made similar, possibly differing decisions on the same text.

Topic maps are a natural way to present such choices on their own merits, as well as being able to compare and contrast those choices.

I first saw this at The banquet of the digital scholars.

The banquet of the digital scholars

Saturday, September 1st, 2012

The banquet of the digital scholars

The actual workshop title: Humanities Hackathon on editing Athenaeus and on the Reinvention of the Edition in a Digital Space

September 30, 2012 Registration Deadline

October 10-12, 2012
Universität Leipzig (ULEI) & Deutsches Archäologisches Institut (DAI) Berlin


The University of Leipzig will host a hackathon that addresses two basic tasks. On the one hand, we will focus upon the challenges of creating a digital edition for the Greek author Athenaeus, whose work cites more than a thousand earlier sources and is one of the major sources for lost works of Greek poetry and prose. At the same time, we use the case of Athenaeus to develop our understanding of how to organize a truly born-digital edition, one that not only includes machine actionable citations and variant readings but also collations of multiple print editions, metrical analyses, named entity identification, linguistic features such as morphology, syntax, word sense, and co-reference analysis, and alignment between the Greek original and one or more later translations.

After some details:

The Deipnosophists (Δειπνοσοφισταί, or “Banquet of the Sophists”) by Athenaeus of Naucratis is a 3rd century AD fictitious account of several banquet conversations on food, literature, and arts held in Rome by twenty-two learned men. This complex and fascinating work is not only an erudite and literary encyclopedia of a myriad of curiosities about classical antiquity, but also an invaluable collection of quotations and text re-uses of ancient authors, ranging from Homer to tragic and comic poets and lost historians. Since the large majority of the works cited by Athenaeus is nowadays lost, this compilation is a sort of reference tool for every scholar of Greek theater, poetry, historiography, botany, zoology, and many other topics.

Athenaeus’ work is a mine of thousands of quotations, but we still lack a comprehensive survey of its sources. The aim of this “humanities hackathon” is to provide a case study for drawing a spectrum of quoting habits of classical authors and their attitude to text reuse. Athenaeus, in fact, shapes a library of forgotten authors, which goes beyond the limits of a physical building and becomes an intellectual space of human knowledge. By doing so, he is both a witness of the Hellenistic bibliographical methods and a forerunner of the modern concept of hypertext, where sequential reading is substituted by hierarchical and logical connections among words and fragments of texts. Quantity, variety, and precision of Athenaeus’ citations make the Deipnosophists an excellent training ground for the development of a digital system of reference linking for primary sources. Athenaeus’ standard citation includes (a) the name of the author with additional information like ethnic origin and literary category, (b) the title of the work, and (c) the book number (e.g., Deipn. 2.71b). He often remembers the amount of papyrus scrolls of huge works (e.g., 6.229d-e; 6.249a), while distinguishing various editions of the same comedy (e.g., 1.29a; 4.171c; 6.247c; 7.299b; 9.367f) and different titles of the same work (e.g., 1.4e).

He also adds biographical information to identify homonymous authors and classify them according to literary genres, intellectual disciplines and schools (e.g., 1.13b; 6.234f; 9.387b). He provides chronological and historical indications to date authors (e.g., 10.453c; 13.599c), and he often copies the first lines of a work following a method that probably goes back to the Pinakes of Callimachus (e.g., 1.4e; 3.85f; 8.342d; 5.209f; 13.573f-574a).

Last but not least, the study of Athenaeus’ “citation system” is also a great methodological contribution to the domain of “fragmentary literature”, since one of the main concerns of this field is the relation between the fragment (quotation) and its context of transmission. Having this goal in mind, the textual analysis of the Deipnosophists will make it possible to enumerate a series of recurring patterns, which include a wide typology of textual reproductions and linguistic features helpful to identify and classify hidden quotations of lost authors.
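
To make the reference-linking idea a little more concrete, here is a minimal Python sketch that turns references of the kind quoted above (2.71b, 6.229d-e, 13.573f-574a) into structured records. The `Ref` tuple and `parse_ref` function are hypothetical, not part of the hackathon's tooling, and the pattern assumes section letters a through f, which is all that appears in the examples quoted here.

```python
import re
from typing import NamedTuple


class Ref(NamedTuple):
    """A parsed reference into the Deipnosophists (hypothetical record)."""
    book: int
    start_page: int
    start_section: str
    end_page: int
    end_section: str


# Optional "Deipn." prefix, then book.page+section, then an optional range end
# that may repeat the page ("-574a") or give only a section letter ("-e").
_REF = re.compile(r"(?:Deipn\.\s*)?(\d+)\.(\d+)([a-f])(?:-(\d+)?([a-f]))?")


def parse_ref(text):
    """Parse one reference string; raise ValueError if it does not fit the pattern."""
    match = _REF.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {text!r}")
    book, page, section, end_page, end_section = match.groups()
    return Ref(
        book=int(book),
        start_page=int(page),
        start_section=section,
        # A missing end page means the range stays on the starting page.
        end_page=int(end_page) if end_page else int(page),
        # A missing end section means the reference is a single section.
        end_section=end_section if end_section else section,
    )


single = parse_ref("Deipn. 2.71b")
same_page = parse_ref("6.229d-e")
cross_page = parse_ref("13.573f-574a")
```

Once references are records and not strings, they can serve as stable link targets and be compared or sorted, which is the first step toward linking a quotation to its context of transmission.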

The 21st century has “big data” in the form of sensor streams and Twitter feeds, but “complex data” in the humanities pre-dates “big data” by a considerable margin.

If you are interested in being challenged by complexity and not simply the size of your data, take a closer look at this project.

Greek is a little late to be of interest to me but there are older texts that could benefit from a similar treatment.

BTW, while you are thinking about this project/text, consider how you would merge prior scholarship, digital and otherwise, with what originates here and what follows it in the decades to come.

One Culture. Computationally Intensive Research in the Humanities and Social Sciences…

Monday, July 2nd, 2012

One Culture. Computationally Intensive Research in the Humanities and Social Sciences, A Report on the Experiences of First Respondents to the Digging Into Data Challenge by Christa Williford and Charles Henry. Research Design by Amy Friedlander.



This report culminates two years of work by CLIR staff involving extensive interviews and site visits with scholars engaged in international research collaborations involving computational analysis of large data corpora. These scholars were the first recipients of grants through the Digging into Data program, led by the NEH, who partnered with JISC in the UK, SSHRC in Canada, and the NSF to fund the first eight initiatives. The report introduces the eight projects and discusses the importance of these cases as models for the future of research in the academy. Additional information about the projects is provided in the individual case studies below (this additional material is not included in the print or PDF versions of the published report).

Main Report Online


PDF file.

Case Studies:

Humanists played an important role in the development of digital computers. That role has diminished over time to the disadvantage of both humanists and computer scientists. Perhaps efforts such as this one will rekindle what was once a rich relationship.

Evolutionary Subject Tagging in the Humanities…

Saturday, December 3rd, 2011

Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes by Jack Ammerman, Vika Zafrin, Dan Benedetti, Garth W. Green.


In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.

Truly remarkable piece of work!

Just to entice you into reading the entire paper, the authors challenge the assumption that knowledge is analogue. Successfully in my view but I already held that position so I was an easy sell.

BTW, if you are in my topic maps class, this paper is required reading. Summarize what you think are the strong/weak points of the paper in 2 to 3 pages.