Archive for the ‘Digital Research’ Category

Getting your hands dirty with the Digital Manuscripts Toolkit

Thursday, February 16th, 2017

Getting your hands dirty with the Digital Manuscripts Toolkit by Emma Stanford. (3 March 2017, 3.00pm – 5.00pm. Venue: Centre for Digital Scholarship, Weston Library.)

From the webpage:

In this workshop offered jointly by Bodleian Digital Library Systems and Services and the Centre for Digital Scholarship, you’ll learn how to make the most of the digitized resources at the Bodleian, the BnF, the Vatican Library and a host of other institutions, using software tools built around the International Image Interoperability Framework (IIIF). After a brief introduction to the main concepts of IIIF, you’ll learn how to use Mirador and the Digital Manuscripts Toolkit to gather images from different institutions into a single viewer; rearrange, remix and enhance image sequences and add new descriptive metadata; add transcriptions and annotations to digitized images; and embed zoomable images or whole manuscripts into your own website or blog. You’ll leave with your own virtual workspace, stocked with the images you’re using.

This event is open to all. No technological or scholarly expertise is necessary. The workshop will be most useful if you already have a few digitized books or manuscripts in mind that you’d like to work with, but if you don’t, we can help you find some. In addition to manuscripts, the tools can be applied to digitized printed books, maps, paintings and ephemera.

To participate in the workshop, you will need your own laptop, with internet access via eduroam or the Bodleian Libraries network.

If you are planning on being at the Bodleian on 3 March 2017, call ahead to reserve a seat for this free event!

If not, explore Mirador and the Digital Manuscripts Toolkit on your own.

Digital Humanities / Studies: U.Pitt.Greenberg

Wednesday, February 1st, 2017

Digital Humanities / Studies: U.Pitt.Greenberg maintained by Elisa E. Beshero-Bondar.

I discovered this syllabus and course materials by accident when one of its modules on XQuery turned up in a search. Backing out of that module I discovered this gem of a digital humanities course.

The course description:

Our course in “digital humanities” and “digital studies” is designed to be interdisciplinary and practical, with an emphasis on learning through “hands-on” experience. It is a computer course, but not a course in which you learn programming for the sake of learning a programming language. It’s a course that will involve programming, and working with coding languages, and “putting things online,” but it’s not a course designed to make you, in fifteen weeks, a professional website designer. Instead, this is a course in which we prioritize what we can investigate in the Humanities and related Social Sciences fields about cultural, historical, and literary research questions through applications in computer coding and programming, which you will be learning and applying as you go in order to make new discoveries and transform cultural objects—what we call “texts” in their complex and multiple dimensions. We think of “texts” as the transmittable, sharable forms of human creativity (mainly through language), and we interface with a particular text in multiple ways through print and electronic “documents.” When we refer to a “document,” we mean a specific instance of a text, and much of our work will be in experimenting with the structures of texts in digital document formats, accessing them through scripts we write in computer code—scripts that in themselves are a kind of text, readable both by humans and machines.

Your professors are scholars and teachers of humanities, not computer programmers by trade, and we teach this course from our backgrounds (in literature and anthropology, respectively). We teach this course to share coding methods that are highly useful to us in our fields, with an emphasis on working with texts as artifacts of human culture shaped primarily with words and letters—the forms of “written” language transferable to many media (including image and sound) that we can study with computer modelling tools that we design for ourselves based on the questions we ask. We work with computers in this course as precision instruments that help us to read and process great quantities of information, and that lead us to make significant connections, ask new kinds of questions, and build models and interfaces to change our reading and thinking experience as people curious about human history, culture, and creativity.

Our focus in this course is primarily analytical: to apply computer technologies to represent and investigate cultural materials. As we design projects together, you will gain practical experience in editing and you will certainly fine-tune your precision in writing and thinking. We will be working primarily with eXtensible Markup Language (XML) because it is a powerful tool for modelling texts that we can adapt creatively to our interests and questions. XML represents a standard in adaptability and human-readability in digital code, and it works together with related technologies with which you will gain working experience: You’ll learn how to write XPath expressions: a formal language for searching and extracting information from XML code which serves as the basis for transforming XML into many publishable forms, using XSLT and XQuery. You’ll learn to write XSLT: a programming “stylesheet” transforming language designed to convert XML to publishable formats, as well as XQuery, a query (or search) language for extracting information from XML files bundled collectively. You will learn how to design your own systematic coding methods to work on projects, and how to write your own rules in schema languages (like Schematron and Relax-NG) to keep your projects organized and prevent errors. You’ll gain experience with an international XML language called TEI (after the Text Encoding Initiative) which serves as the international standard for coding digital archives of cultural materials. Since one of the best and most widely accessible ways to publish XML is on the worldwide web, you’ll gain working experience with HTML code (a markup language that is a kind of XML) and styling HTML with Cascading Stylesheets (CSS). 
We will do all of this with an eye to your understanding how coding works—and no longer relying without question on expensive commercial software as the “only” available solution, because such software is usually not designed with our research questions in mind.

We think you’ll gain enough experience at least to become a little dangerous, and at the very least more independent as investigators and makers who wield computers as fit instruments for your own tasks. Your success will require patience, dedication, and regular communication and interaction with us, working through assignments on a daily basis. Your success will NOT require perfection, but rather your regular efforts throughout the course, your documenting of problems when your coding doesn’t yield the results you want. Homework exercises are a back-and-forth, intensive dialogue between you and your instructors, and we plan to spend a great deal of time with you individually over these as we work together. Our guiding principle in developing assignments and working with you is that the best way for you to learn and succeed is through regular practice as you hone your skills. Our goal is not to make you expert programmers (as we are far from that ourselves)! Rather, we want you to learn how to manipulate coding technologies for your own purposes, how to track down answers to questions, how to think your way algorithmically through problems and find good solutions.

Skimming the syllabus rekindles an awareness of the distinction between the “hard” sciences and the “difficult” ones.



After yesterday’s post, Elisa Beshero-Bondar tweeted that this one course is now two:

At a new homepage: newtFire {dh|ds}!


Challenges of Electronic Dictionary Publication

Wednesday, February 17th, 2016

Challenges of Electronic Dictionary Publication

From the webpage:

April 8-9th, 2016

Venue: University of Leipzig, GWZ, Beethovenstr. 15; H1.5.16

This April we will be hosting our first Dictionary Journal workshop. At this workshop we will give an introduction to our vision of „Dictionaria“, introduce our data model and current workflow and will discuss (among others) the following topics:

  • Methodology and concept: How are dictionaries of „small“ languages different from those of „big“ languages and what does this mean for our endeavour? (documentary dictionaries vs. standard dictionaries)
  • Reviewing process and guidelines: How to review and evaluate a dictionary database of minor languages?
  • User-friendliness: What are the different audiences and their needs?
  • Submission process and guidelines: reports from us and our first authors on how to submit and what to expect
  • Citation: How to cite dictionaries?

If you are interested in attending this event, please send an e-mail to dictionary.journal[AT]

Workshop program

Our workshop program can now be downloaded here.

See the webpage for a list of confirmed participants, some with submitted abstracts.

Any number of topic map related questions arise in a discussion of dictionaries.

  • How to represent dictionary models?
  • What properties should be used to identify the subjects that represent dictionary models?
  • On what basis, if any, should dictionary models be considered the same or different? And for what purposes?
  • What data should be captured by dictionaries and how should it be identified?
  • etc.

Those are only a few of the questions that could be refined into dozens, if not hundreds, more when you reach the details of constructing a dictionary.

I won’t be attending but await with great anticipation the output from this workshop!

Digital Approaches to Hebrew Manuscripts

Friday, May 8th, 2015

Digital Approaches to Hebrew Manuscripts

Monday 18th – Tuesday 19th of May 2015

From the webpage:

We are delighted to announce the programme for On the Same Page: Digital Approaches to Hebrew Manuscripts at King’s College London. This two-day conference will explore the potential for the computer-assisted study of Hebrew manuscripts; discuss the intersection of Jewish Studies and Digital Humanities; and share methodologies. Amongst the topics covered will be Hebrew palaeography and codicology, the encoding and transcription of Hebrew texts, the practical and theoretical consequences of the use of digital surrogates and the visualisation of manuscript evidence and data. For the full programme and our Call for Posters, please see below.

Organised by the Departments of Digital Humanities and Theology & Religious Studies (Jewish Studies)
Co-sponsor: Centre for Late Antique & Medieval Studies (CLAMS), King’s College London

I saw this at the blog for DigiPal: Digital Resource and Database of Palaeography, Manuscript Studies and Diplomatic. Confession: I have never understood how the English derive acronyms, and this one confounds me as much as it does you. 😉

Be sure to look around at the DigiPal site. There are numerous manuscript images, annotation techniques, and other resources for those who foster scholarship by contributing to it.

The Past, Present and Future of Scholarly Publishing

Saturday, January 3rd, 2015

The Past, Present and Future of Scholarly Publishing By Michael Eisen.

Michael made this presentation to the Commonwealth Club of California on March 12, 2013. This post is from the written text for the presentation and you can catch the audio here.

Michael does a great job tracing the history of academic publishing, the rise of open access and what is holding us back from a more productive publishing environment for everyone.

I disagree with his assessment of classification:

And as for classification, does anyone really think that assigning every paper to one of 10,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is really the best way to help people browse the literature? This is a pure relic of a bygone era – an artifact of the historical accident that Gutenberg invented the printing press before Al Gore invented the Internet.

but will pass over that to address the more serious issue of open access publishing in the humanities.

Michael notes:

But the battle is by no means won. Open access collectively represents only around 10% of biomedical publishing, has less penetration in other sciences, and is almost non-existent in the humanities. And most scientists still send their best papers to “high impact” subscription-based journals.

There are open access journals in the humanities but it is fair to say they are few and far between. If prestige is one of the drivers in scientific publishing, where large grant programs abound for some types of research, prestige is about the only driver for humanities publishing.

There are grant programs for the humanities but nothing on the scale of funding in the sciences. Salaries in the humanities are for the most part nothing to write home about. Humanities publishing really comes down to prestige.

Prestige from publication may be a dry, hard bone but it is the only bone that most humanities scholars will ever have. Try to take that away and you are likely to get bitten.

For instance, have you ever wondered about the proliferation of new translations of the Bible? Have we discovered new texts? New discoveries about biblical languages? Discovery of major mistakes in a prior edition? What if I said none of the above? To what would you assign the publication of new translations of the Bible?

If you compare the various translations you will find different “editors,” unless you are looking at a common source for bibles. Some sources do that as well. They create different “versions” for different target audiences.

With the exception of new versions like the New Revised Standard Version, which was undertaken to account for new information from the Dead Sea Scrolls, new editions of the Bible are primarily scholarly churn.

The humanities aren’t going to move any closer to open access publishing until their employers (universities) and funders insist on open access publishing as a condition for tenure and funding.

I will address Michael’s mis-impressions about the value of classification another time. 😉

The Machines in the Valley Digital History Project

Friday, January 2nd, 2015

The Machines in the Valley Digital History Project by Jason Heppler.

From the post:

I am excited to finally release the digital component of my dissertation, Machines in the Valley.

My dissertation, Machines in the Valley, examines the environmental, economic, and cultural conflicts over suburbanization and industrialization in California’s Santa Clara Valley–today known as Silicon Valley–between 1945 and 1990. The high technology sector emerged as a key component of economic and urban development in the postwar era, particularly in western states seeking to diversify their economic activities. Industrialization produced thousands of new jobs, but development proved problematic when faced with competing views about land use. The natural allure that accompanied the thousands coming West gave rise to a modern environmental movement calling for strict limitations on urban growth, the preservation of open spaces, and the reduction of pollution. Silicon Valley stood at the center of these conflicts as residents and activists criticized the environmental impact of suburbs and industry in the valley. Debates over the Santa Clara Valley’s landscape tells the story not only of Silicon Valley’s development, but Americans’ changing understanding of nature and the environmental costs of urban and industrial development.

A great example of a digital project in the humanities!

How does Jason’s dissertation differ from a collection of resources on the same topic?

A collection of resources requires each of us to duplicate Jason’s work to extract the same information. Jason has curated the data: he has separated out the useful from the not so useful, eliminated duplicate sources that don’t contribute to the story, and provided his own analysis as a value-add to the existing data that he has organized. That means we don’t have to duplicate Jason’s work, for which we are all thankful.

How does Jason’s dissertation differ from a topic map on the same topic?

Take one of the coming soon topics for comparison:

“The Stanford Land Machine has Gone Berserk!” Stanford University and the Stanford Industrial Park (Coming Soon)

Stanford University is the largest landholder on the San Francisco Peninsula, controlling nearly 9,000 acres. In the 1950s, Stanford started acting as a real estate developer, first with the establishment of the Stanford Industrial Park in 1953 and later through several additional land development programs. These programs, however, ran into conflict with surrounding neighborhoods whose ideas for the land did not include industrialization.

Universities are never short on staff and alumni whom they would prefer were staff and/or alumni of some other university. Jason will be writing about one or more such individuals under this topic. In the process of curation, he will select such known details about these individuals as are appropriate for his discussion. It isn’t possible to include every known detail about any person, location, event, artifact, etc. No one would have time to read the argument being made in the dissertation.

In addition to the curation/editing process, there will be facts that Jason doesn’t uncover and/or that are unknown to anyone at present. If the governor of California can conceal an illegitimate child for ten years, it won’t be surprising to find other details about the people Jason discusses in his dissertation.

When such new information comes out, how do we put that together with the information already collected in Jason’s dissertation?

Unless you are expecting a second edition of Jason’s dissertation, the quick answer is we’re not. Not today, not tomorrow, not ever.

The current publishing paradigm is designed for republication, not incremental updating of publications. If new facts do appear and, more likely, enough time has passed that Jason’s dissertation is no longer “new,” some new PhD candidate will add new data, dig out the same data as Jason, and fashion a new dissertation.

If, instead of imprisoning his data in prose, Jason had his prose presentation for the dissertation and topics (as in topic maps) for the individuals, deeds, events, etc., then as more information is discovered, it could be fitted into his existing topic map of that data. Unlike the prose, a topic map doesn’t require re-publication in order to add new information.

In twenty or thirty years when Jason is advising some graduate student who wants to extend his dissertation, Jason can give them the topic map that has up to date data (or to be updated), making the next round of scholarship on this issue cumulative and not episodic.
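To make "fitted into his existing topic map" a little more concrete, here is a minimal Python sketch of incremental merging. It is not a real topic map engine and has nothing to do with how Jason's project is actually built. It only assumes the usual topic map rule that two topics sharing a subject identifier represent the same subject and are merged. The class names, the example.org identifier, and the placeholder "new fact" are all hypothetical; the 1953 date comes from the project description quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One subject (person, place, event) and what is known about it."""
    identifiers: set[str]                                # subject identifiers (hypothetical URIs)
    names: set[str] = field(default_factory=set)
    occurrences: set[str] = field(default_factory=set)   # facts, notes, source pointers


class TopicMap:
    """Illustrative container: topics that share an identifier are merged on add."""

    def __init__(self) -> None:
        self.topics: list[Topic] = []

    def add(self, new: Topic) -> Topic:
        # Start from a copy so the caller's object is left untouched.
        merged = Topic(set(new.identifiers), set(new.names), set(new.occurrences))
        remaining = []
        for topic in self.topics:
            if topic.identifiers & merged.identifiers:
                # Same subject: fold the existing topic into the merged one.
                merged.identifiers |= topic.identifiers
                merged.names |= topic.names
                merged.occurrences |= topic.occurrences
            else:
                remaining.append(topic)
        remaining.append(merged)
        self.topics = remaining
        return merged


# Hypothetical identifier, used only for illustration.
PARK = "http://example.org/id/stanford-industrial-park"

tm = TopicMap()
# What the original research recorded.
tm.add(Topic({PARK}, {"Stanford Industrial Park"}, {"Established in 1953"}))
# Years later, someone contributes a new detail about the same subject.
merged = tm.add(Topic({PARK}, occurrences={"placeholder: a fact discovered after publication"}))

print(len(tm.topics))              # still one topic for the park
print(sorted(merged.occurrences))  # old and new facts side by side
```

The point of the sketch is that the later contribution needs no "second edition": because both entries carry the same identifier, the new fact lands on the existing topic rather than in a separate document.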

DH Tools for Beginners

Wednesday, April 2nd, 2014

DH Tools for Beginners by Quinn Warnick.

A short collection of tutorials for “digital humanities novices.”

It is a good start and if you know of other resources or want to author such tutorials, please do.

I don’t know that I was ever entirely comfortable with the phrase “digital humanities.”

In part because it creates an odd division between humanists and humanists who use digital tools.

We don’t call literature scholars who use concordances “concordance humanists.”

Any more than we call scholars who use bibliographic materials “bibliographic humanists.”

Mostly because concordances and bibliographic materials are tools by which one does humanities research and scholarship.

Shouldn’t that be the same for “digital” humanities?

That digital tools are simply more tools for doing humanities research and scholarship?

Given the recent and ongoing assaults on the humanities in general, standing closer together and not further apart as humanists sounds like a good idea.

Research Opportunities and …

Tuesday, February 18th, 2014

Research Opportunities and Themes in Digital Scholarship by Professor Andrew Prescott.

Unlike death-by-powerpoint-slides, only four or five of these slides have much text at all.

Which makes them more difficult to interpret, absent the presentation. (So there is a downside to low-text slides.)

But the slides reference such a wide range and depth of humanities projects that you are likely to find them very useful.

Either as pointers to present projects or as inspiration for variations or entirely new projects.


More digital than thou

Thursday, January 9th, 2014

More digital than thou by Michael Sperberg-McQueen.

From the post:

An odd thing has started happening in reviews for the Digital Humanities conference: reviewers are objecting to papers if the reviewer thinks it has relevance beyond the field of DH, apparently on the grounds that the topic is then insufficiently digital. It doesn’t matter how relevant the topic is to work in DH, or how deeply embedded the topic is in a core DH topic like text encoding — if some reviewers don’t see a computer in the proposal, they want to exclude it from the conference.

Michael’s focus on the TEI (Text Encoding Initiative), XML Schema at the W3C, and other projects kept him from seeing the ramparts being thrown up around digital humanities.

Well, and Michael is just Michael. Whether you are a longtime XML hacker or a newcomer, Michael is just Michael. When you are really good, you don’t need to cloak yourself in disciplinary robes, boundaries and secret handshakes.

You don’t have to look far in the “digital humanities” to find forums where hand wringing over the discipline of digital humanities is a regular feature. As opposed to concern over what digital technologies have contributed, can contribute, and will contribute to the humanities.

Digital technologies should be as much a part of each humanities discipline as the more traditional periodical indexes, concordances, dictionaries and monographs.

After all, I thought there was general agreement that “separate but equal” was a poor policy.

9th International Digital Curation Conference

Tuesday, October 15th, 2013

Commodity, catalyst or change-agent? Data-driven transformations in research, education, business & society.

From the post:

24 – 27 February 2014
Omni San Francisco Hotel, San Francisco



The 9th International Digital Curation Conference (IDCC) will be held from Monday 24 February to Thursday 27 February 2014 at the Omni San Francisco Hotel (at Montgomery).

The Omni hotel is in the heart of downtown San Francisco. It is located right on the cable car line and is only a short walk to Union Square, the San Francisco neighborhood that has become a mecca for high-end shopping and art galleries.

This year the IDCC will focus on how data-driven developments are changing the world around us, recognising that the growing volume and complexity of data provides institutions, researchers, businesses and communities with a range of exciting opportunities and challenges. The Conference will explore the expanding portfolio of tools and data services, as well as the diverse skills that are essential to explore, manage, use and benefit from valuable data assets. The programme will reflect cultural, technical and economic perspectives and will illustrate the progress made in this arena in recent months

There will be a programme of workshops on Monday 24 and Thursday 27 February. The main conference programme will run from Tuesday 25 – Wednesday 26 February.

Registration will open in October (but it doesn’t say when in October).

While you are waiting:

Our last IDCC took place in Amsterdam, 14-17 January 2013. If you were not able to attend you can now access all the presentations, videos and photos online, and much more!


Computational Folkloristics

Friday, January 18th, 2013

JAF Special Issue 2014 : Computational Folkloristics – Special Issue of the Journal of American Folklore

I wasn’t able to confirm this call at the Journal of American Folklore, but wanted to pass it along anyway.

There are few areas with the potential for semantic mappings as rich as folklore. A natural for topic maps.

From the call I cite above:

Submission Deadline Jun 15, 2013
Notification Due Aug 1, 2013
Final Version Due Oct 1, 2013

Over the course of the past decade, a revolution has occurred in the materials available for the study of folklore. The scope of digital archives of traditional expressive forms has exploded, and the magnitude of machine-readable materials available for consideration has increased by many orders of magnitude. Many national archives have made significant efforts to make their archival resources machine-readable, while other smaller initiatives have focused on the digitization of archival resources related to smaller regions, a single collector, or a single genre. Simultaneously, the explosive growth in social media, web logs (blogs), and other Internet resources have made previously hard to access forms of traditional expressive culture accessible at a scale so large that it is hard to fathom. These developments, coupled to the development of algorithmic approaches to the analysis of large, unstructured data and new methods for the visualization of the relationships discovered by these algorithmic approaches – from mapping to 3-D embedding, from time-lines to navigable visualizations – offer folklorists new opportunities for the analysis of traditional expressive forms. We label approaches to the study of folklore that leverage the power of these algorithmic approaches “Computational Folkloristics” (Abello, Broadwell, Tangherlini 2012).

The Journal of American Folklore invites papers for consideration for inclusion in a special issue of the journal edited by Timothy Tangherlini that focuses on “Computational Folkloristics.” The goal of the special issue is to reveal how computational methods can augment the study of folklore, and propose methods that can extend the traditional reach of the discipline. To avoid confusion, we term those approaches “computational” that make use of algorithmic methods to assist in the interpretation of relationships or structures in the underlying data. Consequently, “Computational Folkloristics” is distinct from Digital Folklore in the application of computation to a digital representation of a corpus.

We are particularly interested in papers that focus on: the automatic discovery of narrative structure; challenges in Natural Language Processing (NLP) related to unlabeled, multilingual data including named entity detection and resolution; topic modeling and other methods that explore latent semantic aspects of a folklore corpus; the alignment of folklore data with external historical datasets such as census records; GIS applications and methods; network analysis methods for the study of, among other things, propagation, community detection and influence; rapid classification of unlabeled folklore data; search and discovery on and across folklore corpora; modeling of folklore processes; automatic labeling of performance phenomena in visual data; automatic classification of audio performances. Other novel approaches to the study of folklore that make use of algorithmic approaches will also be considered.

A significant challenge of this special issue is to address these issues in a manner that is directly relevant to the community of folklorists (as opposed to computer scientists). Articles should be written in such a way that the argument and methods are accessible and understandable for an audience expert in folklore but not expert in computer science or applied mathematics. To that end, we encourage team submissions that bridge the gap between these disciplines. If you are in doubt about whether your approach or your target domain is appropriate for consideration in this special issue, please email the issue editor, Timothy Tangherlini at, using the subject line “Computational Folkloristics query”. Deadline for all queries is April 1, 2013.

Timothy Tangherlini homepage.

Something to look forward to!

Where to start with text mining

Wednesday, August 15th, 2012

Where to start with text mining by Ted Underwood.

From the post:

This post is less a coherent argument than an outline of discussion topics I’m proposing for a workshop at NASSR2012 (a conference of Romanticists). But I’m putting this on the blog since some of the links might be useful for a broader audience. Also, we won’t really cover all this material, so the blog post may give workshop participants a chance to explore things I only gestured at in person.

In the morning I’ll give a few examples of concrete literary results produced by text mining. I’ll start the afternoon workshop by opening two questions for discussion: first, what are the obstacles confronting a literary scholar who might want to experiment with quantitative methods? Second, how do those methods actually work, and what are their limits?

I’ll also invite participants to play around with a collection of 818 works between 1780 and 1859, using an R program I’ve provided for the occasion. Links for these materials are at the end of this post.

Something to pass along to any humanities scholars you know who aren’t already into text mining.

I first saw this at: primer for digital humanities.

A Workflow for Digital Research Using Off-the-Shelf Tools

Monday, August 15th, 2011

A Workflow for Digital Research Using Off-the-Shelf Tools by William J. Turkel.

An excellent overview of useful tools for digital research.

One or more of these will be useful in authoring your next topic map.