Archive for the ‘Humanities’ Category

Academic Torrents Update

Friday, November 3rd, 2017

When I last mentioned Academic Torrents, in early 2014, it had 1.67TB of research data.

I dropped by Academic Torrents this week to find it now has 25.53TB of research data!

Some arbitrary highlights:

Richard Feynman’s Lectures on Physics (The Messenger Lectures)

A collection of sport activity datasets for data analysis and data mining 2017a

[Coursera] Machine Learning (Stanford University) (ml)

UC Berkeley Computer Science Courses (Full Collection)

[Coursera] Mining Massive Datasets (Stanford University) (mmds)

Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia (Original Dataset)

Your arbitrary highlights are probably different from mine, so visit Academic Torrents to see what data catches your eye.

Enjoy!

It’s more than just overlap: Text As Graph

Wednesday, August 2nd, 2017

It’s more than just overlap: Text As Graph – Refining our notion of what text really is—this time for sure! by Ronald Haentjens Dekker and David J. Birnbaum.

Abstract:

The XML tree paradigm has several well-known limitations for document modeling and processing. Some of these have received a lot of attention (especially overlap), and some have received less (e.g., discontinuity, simultaneity, transposition, white space as crypto-overlap). Many of these have work-arounds, also well known, but—as is implicit in the term “work-around”—these work-arounds have disadvantages. Because they get the job done, however, and because XML has a large user community with diverse levels of technological expertise, it is difficult to overcome inertia and move to a technology that might offer a more comprehensive fit with the full range of document structures with which researchers need to interact both intellectually and programmatically. A high-level analysis of why XML has the limitations it has can enable us to explore how an alternative model of Text as Graph (TAG) might address these types of structures and tasks in a more natural and idiomatic way than is available within an XML paradigm.

Hyperedges, texts and XML, what more could you need? 😉

This paper merits a deep read and testing by everyone interested in serious text modeling.
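As a starting point for that testing, here is a minimal Python sketch of why overlap is awkward in a tree but unremarkable in a graph. It is not the TAG model from the paper. It only assumes that tokens are nodes and that each markup range is a hyperedge over token indices. The class name `TextGraph`, its methods, and the labels are all hypothetical; the tokens come from the opening of "The Hunting of the Snark."

```python
from dataclasses import dataclass, field


@dataclass
class TextGraph:
    """Tokens are nodes; each markup range is a hyperedge (a set of token indices)."""
    tokens: list[str]
    hyperedges: dict[str, set[int]] = field(default_factory=dict)

    def mark(self, label: str, start: int, end: int) -> None:
        """Attach `label` to tokens start..end-1 (half-open range)."""
        self.hyperedges[label] = set(range(start, end))

    def overlaps(self, a: str, b: str) -> bool:
        """True when two ranges share tokens but neither contains the other."""
        x, y = self.hyperedges[a], self.hyperedges[b]
        return bool(x & y) and not (x <= y or y <= x)

    def text(self, label: str) -> str:
        """Return the tokens covered by a hyperedge, in document order."""
        return " ".join(self.tokens[i] for i in sorted(self.hyperedges[label]))


words = ("Just the place for a Snark the Bellman cried "
         "As he landed his crew with care").split()
g = TextGraph(words)
g.mark("line1", 0, 9)        # first verse line
g.mark("line2", 9, 16)       # second verse line
g.mark("speech", 0, 6)       # the Bellman's words, nested inside line1
g.mark("narration", 6, 16)   # narrator's clause, crossing the line break

print(g.overlaps("line1", "narration"))  # shares tokens, neither contains the other
print(g.overlaps("line1", "speech"))     # properly nested, so a tree could hold it
```

The narration range crosses the verse-line boundary, which a single XML hierarchy cannot express without a work-around. Here it is just another set of token indices.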

You can’t read the text but here is a hypergraph visualization of an excerpt from Lewis Carroll’s “The hunting of the Snark:”

The New Testament, the Hebrew Bible, to say nothing of the Rabbinic commentaries on the Hebrew Bible and centuries of commentary on other texts could profit from this approach.

Put your text to the test and share how to advance this technique!

Tired of Chasing Ephemera? Open Greek and Latin Design Sprint (bids in August, 2017)

Thursday, July 27th, 2017

Tired of reading/chasing the ephemera explosion in American politics?

I’ve got an opportunity for you to contribute to a project with texts preserved by hand for thousands of years!

Design Sprint for Perseus 5.0/Open Greek and Latin

From the webpage:

We announced in June that Center for Hellenic Studies had signed a contract with Intrepid.io to conduct a design sprint that would support Perseus 5.0 and the Open Greek and Latin collection that it will include. Our goal was to provide a sample model for a new interface that would support searching and reading of Greek, Latin, and other historical languages. The report from that sprint was handed over to CHS yesterday and we, in turn, have made these materials available, including both the summary presentation and associated materials. The goal is to solicit comment and to provide potential applicants to the planned RFP with access to this work as soon as possible.

The sprint took just over two weeks and was an intensive effort. An evolving Google Doc with commentary on the Intrepid Wrap-up slides for the Center for Hellenic studies should now be visible. Readers of the report will see that questions remain to be answered. How will we represent Perseus, Open Greek and Latin, Open Philology, and other efforts? One thing that we have added and that will not change will be the name of the system that this planned implementation phase will begin: whether it is Perseus, Open Philology or some other name, it will be powered by the Scaife Digital Library Viewer, a name that commemorates Ross Scaife, pioneer of Digital Classics and a friend whom many of us will always miss.

The Intrepid report also includes elements that we will wish to develop further — students of Greco-Roman culture may not find “relevance” a helpful way to sort search reports. The Intrepid Sprint greatly advanced our own thinking and provided us with a new starting point. Anyone may build upon the work presented here — but they can also suggest alternate approaches.

The core deliverables form an impressive list:

At the moment we would summarize core deliverables as:

  1. A new reading environment that captures the basic functionality of the Perseus 4.0 reading environment but that is more customizable and that can be localized efficiently into multiple modern languages, with Arabic, Persian, German and English as the initial target languages. The overall Open Greek and Latin team is, of course, responsible for providing the non-English content. The Scaife DL Viewer should make it possible for us to localize into multiple languages as efficiently as possible.
  2. The reading environment should be designed to support any CTS-compliant collection and should be easily configured with a look and feel for different collections.
  3. The reading environment should contain a lightweight treebank viewer — we don’t need to support editing of treebanks in the reading environment. The functionality that the Alpheios Project provided for the first book of the Odyssey would be more than adequate. Treebanks are available under the label “diagram” when you double-click on a Greek word.
  4. The reading environment should support dynamic word/phrase level alignments between source text and translation(s). Here again, the functionality that the Alpheios Project provided for the first book of the Odyssey would be adequate. More recent work implementing this functionality is visible at Tariq Yousef’s work at http://divan-hafez.com/ and http://ugarit.ialigner.com/.
  5. The system must be able to search for both specific inflected forms and for all forms of a particular word (as in Perseus 4.0) in CTS-compliant epiDoc TEI XML. The search will build upon the linguistically analyzed texts available in https://github.com/gcelano/CTSAncientGreekXML. This will enable searching by dictionary entry, by part of speech, and by inflected form. For Greek, the base collection is visible at the First Thousand Years of Greek website (which now has begun to accumulate a substantial amount of later Greek). CTS-compliant epiDoc Latin texts can be found at https://github.com/OpenGreekAndLatin/csel-dev/tree/master/data and https://github.com/PerseusDL/canonical-latinLit/tree/master/data.
  6. The system should ideally be able to search Greek and Latin that is available only as uncorrected OCR-generated text in hOCR format. Here the results may follow the image-front strategy familiar to academics from sources such as Jstor. If it is not feasible to integrate this search within the three months of core work, then we need a plan for subsequent integration that Leipzig and OGL members can implement later.
  7. The new system must be scalable and updating from Lucene to Elasticsearch is desirable. While these collections may not be large by modern standards, they are substantial. Open Greek and Latin currently has c. 67 million words of Greek and Latin at various stages of post-processing and c. 90 million words of addition translations from Greek and Latin into English,French, German and Italian, while the Lace Greek OCR Project has OCR-generated text for 1100 volumes.
  8. The system integrate translations and translation alignments into the searching system, so that users can search either in the original or in modern language translations where we provide this data. This goes back to work by David Bamman in the NEH-funded Dynamic Lexicon Project (when he was a researcher at Perseus at Tufts). For more recent examples of this, see http://divan-hafez.com/ and Ugarit. Note that one reason to adopt CTS URNs is to simplify the task of display translations of source texts — the system is only responsible for displaying translations insofar as they are available via the CTS API.
  9. The system must provide initial support for a user profile. One benefit of the profile is that users will be able to define their own reading lists — and the Scaife DL Viewer will then be able to provide personalized reading support, e.g., word X already showed up in your reading at places A, B, and C, while word Y, which is new to you, will appear 12 times in the rest of your planned readings (i.e., you should think about learning that word). By adopting the CTS data model, we can make very precise reading lists, defining precise selections from particular editions of particular works. We also want to be able to support an initial set of user contributions that are (1) easy to implement technically and (2) easy for users to understand and perform. Thus we would support fixing residual data entry errors, creating alignments between source texts and translations, improving automated part of speech tagging and lemmatization but users would go to external resources to perform more complex tasks such as syntactic markup (treebanking).
  10. We would welcome bids that bring to bear expertise in the EPUB format and that could help develop a model for representing CTS-compliant Greek and Latin sources in EPUB as a mechanism to make these materials available on smartphones. We can already convert our TEI XML into EPUB. The goal here is to exploit the easiest ways to optimize the experience. We can, for example, convert one or more of our Greek and Latin lexica into the EPUB Dictionary format and use our morphological analyses to generate links from particular forms in a text to the right dictionary entry or entries. Can we represent syntactically analyzed sentences with SVG? Can we include dynamic translation alignments?
  11. Bids should consider including a design component. We were very pleased with the Design Sprint that took place in July 2017 and would like to include a follow-up Design Sprint in early 2018 that will consider (1) next steps for Greek and Latin and (2) generalizing our work to other historical languages. This Design Sprint might well go to a separate contractor (thus providing us also with a separate point of view on the work done so far).
  12. Work must build upon the Canonical Text Services Protocol. Bids should be prepared to build upon https://github.com/Capitains, but should also be able to build upon other CTS servers (e.g., https://github.com/ThomasK81/LightWeightCTSServer and cts.informatik.uni-leipzig.de).
  13. All source code must be available on Github under an appropriate open license so that third parties can freely reuse and build upon it.
  14. Source code must be designed and documented to facilitate actual (not just legally possible) reuse.
  15. The contractor will have the flexibility to get the job done but will be expected to work as closely as possible with, and to draw wherever possible upon the on-going work done by, the collaborators who are contributing to Open Greek and Latin. The contractor must have the right to decide how much collaboration makes sense.
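Several of the deliverables above lean on CTS URNs to name "precise selections from particular editions of particular works." For readers who have not met them, the following is a minimal Python sketch of how such a URN breaks into parts. It assumes the common five-field layout `urn:cts:namespace:textgroup.work.version:passage` and ignores the optional exemplar level. The function `parse_cts_urn`, the `CtsUrn` type, and the example URN are illustrative, not taken from the RFP or from any CTS library.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its components (simplified; no exemplar, no subreferences)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    pieces = work_component.split(".")
    # Pad so textgroup, work, and version can always be unpacked.
    textgroup, work, version = (pieces + [None, None])[:3]
    return CtsUrn(namespace, textgroup, work, version, passage)


# A reading list is then just a list of URNs (illustrative example).
reading_list = ["urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:1.1-1.10"]
for entry in reading_list:
    print(parse_cts_urn(entry))
```

Because edition and passage are both part of the identifier, two readers citing the same URN are pointing at the same lines of the same edition, which is what makes the personalized reading lists in item 9 feasible.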

You can use your data science skills to sell soap, cars, ED treatments, or even apocalyptically narcissistic politicians, or, you can advance Perseus 5.0.

Your call.

Locate Your Representative/Senator In Hell

Thursday, July 13th, 2017

Mapping Dante’s Inferno, One Circle of Hell at a Time by Anika Burgess.

From the post:

“I found myself, in truth, on the brink of the valley of the sad abyss that gathers the thunder of an infinite howling. It was so dark, and deep, and clouded, that I could see nothing by staring into its depths.”

This is the vision that greets the author and narrator upon entry the first circle of Hell—Limbo, home to honorable pagans—in Dante Alighieri’s Inferno, the first part of his 14th-century epic poem, Divine Comedy. Before Dante and his guide, the classical poet Virgil, encounter Purgatorio and Paradiso, they must first journey through a multilayered hellscape of sinners—from the lustful and gluttonous of the early circles to the heretics and traitors that dwell below. This first leg of their journey culminates, at Earth’s very core, with Satan, encased in ice up to his waist, eternally gnawing on Judas, Brutus, and Cassius (traitors to God) in his three mouths. In addition to being among the greatest Italian literary works, Divine Comedy also heralded a craze for “infernal cartography,” or mapping the Hell that Dante had created.
… (emphasis in original)

Burgess has collected seven (7) traditional maps of the Inferno. I take them to be early essays in the art of visualization. They are by no means, individually or collectively, the definitive visualizations of the Inferno.

The chief deficit of all seven, to me, is the narrowness of the circles/ledges. As I read the Inferno, Dante and Virgil are not pressed for space. Expanding and populating the circles more realistically is one starting point.

The Inferno has no shortage of characters in each circle. Dante even predicts the fate of Pope Boniface VIII, placing him in the eighth circle of Hell (the simoniacs, a subclass of fraud). (Use the online Britannica with caution. Its entry for Boniface VIII doesn’t even mention the Inferno, as of July 13, 2017.)

I would like to think being condemned to Hell by no less than Dante would rate at least a mention in my biography!

Sadly, Dante is no longer around to add to the populace of the Inferno but new visualizations could take the opportunity to update the resident list for Hell!

It’s an exercise in visualization, mapping, 14th-century literature, and an excuse to learn the names of your representative and senators.

Enjoy!

The Classical Language Toolkit

Tuesday, July 11th, 2017

The Classical Language Toolkit

From the webpage:

The Classical Language Toolkit (CLTK) offers natural language processing (NLP) support for the languages of Ancient, Classical, and Medieval Eurasia. Greek and Latin functionality are currently most complete.

Goals

  • compile analysis-friendly corpora;
  • collect and generate linguistic data;
  • act as a free and open platform for generating scientific research.

You are sure to find one or more languages of interest:

Collecting, analyzing and mapping Tweets can be profitable and entertaining, but tomorrow or perhaps by next week, almost no one will read them again.

The texts in this project survived by hand preservation for thousands of years. People are still reading them.

How about you?

Roman Roads (Drawn Like The London Subway)

Thursday, June 8th, 2017

Roman Roads by Sasha Trubetskoy.

See Trubetskoy’s website for a much better rendering of this map of Roman roads, drawn in subway-style.

From the post:

It’s finally done. A subway-style diagram of the major Roman roads, based on the Empire of ca. 125 AD.

Creating this required far more research than I had expected—there is not a single consistent source that was particularly good for this. Huge shoutout to: Stanford’s ORBIS model, The Pelagios Project, and the Antonine Itinerary (found a full PDF online but lost the url).

The lines are a combination of actual, named roads (like the Via Appia or Via Militaris) as well as roads that do not have a known historic name (in which case I creatively invented some names). Skip to the “Creative liberties taken” section for specifics.

How long would it actually take to travel this network? That depends a lot on what method of transport you are using, which depends on how much money you have. Another big factor is the season – each time of year poses its own challenges. In the summer, it would take you about two months to walk on foot from Rome to Byzantium. If you had a horse, it would only take you a month.

However, no sane Roman would use only roads where sea travel is available. Sailing was much cheaper and faster – a combination of horse and sailboat would get you from Rome to Byzantium in about 25 days, Rome to Carthage in 4-5 days. Check out ORBIS if you want to play around with a “Google Maps” for Ancient Rome. I decided not to include maritime routes on the map for simplicity’s sake.

Subway-style drawings lose detail but make relationships between routes clearer. Or at least that is one of the arguments in their favor.

Thoughts on a subway-style drawing that captures the development of the Roman road system? To illustrate how that corresponds in broad strokes to the expansion of Rome?

Be sure to visit Trubetskoy’s homepage. Lots of interesting maps and projects.

Medieval illuminated manuscripts

Thursday, June 8th, 2017

Medieval illuminated manuscripts by Robert Miller (reference and instruction librarian at the University of Maryland University College)

From the post:

With their rich representation of medieval life and thought, illuminated manuscripts serve as primary sources for scholars in any number of fields: history, literature, art history, women’s studies, religious studies, philosophy, the history of science, and more.

But you needn’t be conducting research to immerse yourself in the world of medieval manuscripts. The beauty, pathos, and earthy humor of illuminated manuscripts make them a delight for all. Thanks to digitization efforts by libraries and museums worldwide, the colorful creations of the medieval imagination—dreadful demons, armies of Amazons, gardens, gems, bugs, birds, celestial vistas, and simple scenes of everyday life—are easily accessible online.

I count:

  • 10 twitter accounts to follow/search
  • 11 sites with manuscript collections
  • 15 blogs and other manuscript sites

A great resource for students of all ages who are preparing research papers!

Enjoy and pass this one along!

Were the Greeks and Romans White Supremacists?

Wednesday, June 7th, 2017

Why We Need to Start Seeing the Classical World in Color by Sarah E. Bond.

From the post:

Modern technology has revealed an irrefutable, if unpopular, truth: many of the statues, reliefs, and sarcophagi created in the ancient Western world were in fact painted. Marble was a precious material for Greco-Roman artisans, but it was considered a canvas, not the finished product for sculpture. It was carefully selected and then often painted in gold, red, green, black, white, and brown, among other colors.

A number of fantastic museum shows throughout Europe and the US in recent years have addressed the issue of ancient polychromy. The Gods in Color exhibit travelled the world between 2003–15, after its initial display at the Glyptothek in Munich. (Many of the photos in this essay come from that exhibit, including the famed Caligula bust and the Alexander Sarcophagus.) Digital humanists and archaeologists have played a large part in making those shows possible. In particular, the archaeologist Vinzenz Brinkmann, whose research informed Gods in Color, has done important work, applying various technologies and ultraviolet light to antique statues in order to analyze the minute vestiges of paint on them and then recreate polychrome versions.

Acceptance of polychromy by the public is another matter. A friend peering up at early-20th-century polychrome terra cottas of mythological figures at the Philadelphia Museum of Art once remarked to me: “There is no way the Greeks were that gauche.” How did color become gauche? Where does this aesthetic disgust come from? To many, the pristine whiteness of marble statues is the expectation and thus the classical ideal. But the equation of white marble with beauty is not an inherent truth of the universe. Where this standard came from and how it continues to influence white supremacist ideas today are often ignored.

Most museums and art history textbooks contain a predominantly neon white display of skin tone when it comes to classical statues and sarcophagi. This has an impact on the way we view the antique world. The assemblage of neon whiteness serves to create a false idea of homogeneity — everyone was very white! — across the Mediterranean region. The Romans, in fact, did not define people as “white”; where, then, did this notion of race come from?

A great post, and a reminder that history (or current events) seen through a particular lens isn’t the only view of history (or current events).

I originally wrote “an accurate view of history….” but that’s not true. At best we have one or more views and when called upon to act, make decisions upon those views. “Accuracy” is something that lies beyond our human grasp.

The reminder I would add to this post is that recognizing a lens, in this case the absence of color in our learning of history, doesn’t overcome it. Naming the lens and nodding in agreement that, yes, it was a shortfall in our learning, is not enough.

“Knowing” about the coloration of familiar art work doesn’t erase centuries of considering it without color. No amount of pretending will make it otherwise.

Humanists should learn about and promote the use of colorization so the youth of today learn different traditions than the ones we learned.

Digital Humanities / Studies: U.Pitt.Greenberg

Wednesday, February 1st, 2017

Digital Humanities / Studies: U.Pitt.Greenberg maintained by Elisa E. Beshero-Bondar.

I discovered this syllabus and course materials by accident when one of its modules on XQuery turned up in a search. Backing out of that module I discovered this gem of a digital humanities course.

The course description:

Our course in “digital humanities” and “digital studies” is designed to be interdisciplinary and practical, with an emphasis on learning through “hands-on” experience. It is a computer course, but not a course in which you learn programming for the sake of learning a programming language. It’s a course that will involve programming, and working with coding languages, and “putting things online,” but it’s not a course designed to make you, in fifteen weeks, a professional website designer. Instead, this is a course in which we prioritize what we can investigate in the Humanities and related Social Sciences fields about cultural, historical, and literary research questions through applications in computer coding and programming, which you will be learning and applying as you go in order to make new discoveries and transform cultural objects—what we call “texts” in their complex and multiple dimensions. We think of “texts” as the transmittable, sharable forms of human creativity (mainly through language), and we interface with a particular text in multiple ways through print and electronic “documents.” When we refer to a “document,” we mean a specific instance of a text, and much of our work will be in experimenting with the structures of texts in digital document formats, accessing them through scripts we write in computer code—scripts that in themselves are a kind of text, readable both by humans and machines.

Your professors are scholars and teachers of humanities, not computer programmers by trade, and we teach this course from our backgrounds (in literature and anthropology, respectively). We teach this course to share coding methods that are highly useful to us in our fields, with an emphasis on working with texts as artifacts of human culture shaped primarily with words and letters—the forms of “written” language transferable to many media (including image and sound) that we can study with computer modelling tools that we design for ourselves based on the questions we ask. We work with computers in this course as precision instruments that help us to read and process great quantities of information, and that lead us to make significant connections, ask new kinds of questions, and build models and interfaces to change our reading and thinking experience as people curious about human history, culture, and creativity.

Our focus in this course is primarily analytical: to apply computer technologies to represent and investigate cultural materials. As we design projects together, you will gain practical experience in editing and you will certainly fine-tune your precision in writing and thinking. We will be working primarily with eXtensible Markup Language (XML) because it is a powerful tool for modelling texts that we can adapt creatively to our interests and questions. XML represents a standard in adaptability and human-readability in digital code, and it works together with related technologies with which you will gain working experience: You’ll learn how to write XPath expressions: a formal language for searching and extracting information from XML code which serves as the basis for transforming XML into many publishable forms, using XSLT and XQuery. You’ll learn to write XSLT: a programming “stylesheet” transforming language designed to convert XML to publishable formats, as well as XQuery, a query (or search) language for extracting information from XML files bundled collectively. You will learn how to design your own systematic coding methods to work on projects, and how to write your own rules in schema languages (like Schematron and Relax-NG) to keep your projects organized and prevent errors. You’ll gain experience with an international XML language called TEI (after the Text Encoding Initiative) which serves as the international standard for coding digital archives of cultural materials. Since one of the best and most widely accessible ways to publish XML is on the worldwide web, you’ll gain working experience with HTML code (a markup language that is a kind of XML) and styling HTML with Cascading Stylesheets (CSS). 
We will do all of this with an eye to your understanding how coding works—and no longer relying without question on expensive commercial software as the “only” available solution, because such software is usually not designed with our research questions in mind.

We think you’ll gain enough experience at least to become a little dangerous, and at the very least more independent as investigators and makers who wield computers as fit instruments for your own tasks. Your success will require patience, dedication, and regular communication and interaction with us, working through assignments on a daily basis. Your success will NOT require perfection, but rather your regular efforts throughout the course, your documenting of problems when your coding doesn’t yield the results you want. Homework exercises are a back-and-forth, intensive dialogue between you and your instructors, and we plan to spend a great deal of time with you individually over these as we work together. Our guiding principle in developing assignments and working with you is that the best way for you to learn and succeed is through regular practice as you hone your skills. Our goal is not to make you expert programmers (as we are far from that ourselves)! Rather, we want you to learn how to manipulate coding technologies for your own purposes, how to track down answers to questions, how to think your way algorithmically through problems and find good solutions.

Skimming the syllabus rekindles an awareness of the distinction between the “hard” sciences and the “difficult” ones.
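The course description leans heavily on XPath, so here is a small taste for anyone who has not used it. This is a hedged Python sketch, not material from the course: it uses the limited XPath subset built into the standard library's `xml.etree.ElementTree`, and a TEI-like fragment (`lg` for a line group, `l` for a verse line) with the TEI namespace left out for brevity. The sample text is the first stanza of "The Hunting of the Snark."

```python
import xml.etree.ElementTree as ET

# TEI-like verse markup; the real TEI namespace is omitted to keep the paths short.
SAMPLE = """
<text>
  <lg type="stanza" n="1">
    <l n="1">Just the place for a Snark! the Bellman cried,</l>
    <l n="2">As he landed his crew with care;</l>
    <l n="3">Supporting each man on the top of the tide</l>
    <l n="4">By a finger entwined in his hair.</l>
  </lg>
</text>
"""

root = ET.fromstring(SAMPLE)

# Every verse line, anywhere in the document.
all_lines = [line.text for line in root.findall(".//l")]

# One line, selected by an attribute predicate.
second = root.find(".//l[@n='2']").text

# Lines that are children of a stanza-type line group.
stanza_lines = root.findall("./lg[@type='stanza']/l")

print(len(all_lines), "lines; line 2 reads:", second)
```

The full XPath taught alongside XSLT and XQuery goes well beyond this subset (functions, axes, sequences), so treat the sketch as a first look rather than a substitute for the course modules.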

Enjoy!

Update:

After yesterday’s post, Elisa Beshero-Bondar tweeted that this one course is now two:

At a new homepage: newtFire {dh|ds}!

Enjoy!

Humanities Digital Library [A Ray of Hope]

Friday, January 13th, 2017

Humanities Digital Library (Launch Event)

From the webpage:

Date
17 Jan 2017, 18:00 to 17 Jan 2017, 19:00

Venue

IHR Wolfson Conference Suite, NB01/NB02, Basement, IHR, Senate House, Malet Street, London WC1E 7HU

Description

6-7pm, Tuesday 17 January 2017

Wolfson Conference Suite, Institute of Historical Research

Senate House, Malet Street, London, WC1E 7HU

www.humanities-digital-library.org

About the Humanities Digital Library

The Humanities Digital Library is a new Open Access platform for peer reviewed scholarly books in the humanities.

The Library is a joint initiative of the School of Advanced Study, University of London, and two of the School’s institutes—the Institute of Historical Research and the Institute of Advanced Legal Studies.

From launch, the Humanities Digital Library offers scholarly titles in history, law and classics. Over time, the Library will grow to include books from other humanities disciplines studied and researched at the School of Advanced Study. Partner organisations include the Royal Historical Society whose ‘New Historical Perspectives’ series will appear in the Library, published by the Institute of Historical Research.

Each title is published as an open access PDF, with copies also available to purchase in print and EPUB formats. Scholarly titles come in several formats—including monographs, edited collections and longer and shorter form works.
(emphasis in the original)

Timely evidence that not everyone in the UK is barking mad! “Barking mad” being the only explanation I can offer for the Investigatory Powers Bill.

I won’t be attending but if you can, do and support the Humanities Digital Library after it opens.

War and Peace & R

Friday, December 2nd, 2016

No, not a post about R versus Python but about R and Tolstoy’s War and Peace.

Using R to Gain Insights into the Emotional Journeys in War and Peace by Wee Hyong Tok.

From the post:

How do you read a novel in record time, and gain insights into the emotional journey of main characters, as they go through various trials and tribulations, as an exciting story unfolds from chapter to chapter?

I remembered my experiences when I start reading a novel, and I get intrigued by the story, and simply cannot wait to get to the last chapter. I also recall many conversations with friends on some of the interesting novels that I have read awhile back, and somehow have only vague recollection of what happened in a specific chapter. In this post, I’ll work through how we can use R to analyze the English translation of War and Peace.

War and Peace is a novel by Leo Tolstoy, and captures the salient points about Russian history from the period 1805 to 1812. The novel consists of the stories of five families, and captures the trials and tribulations of various characters (e.g. Natasha and Andre). The novel consists of about 1400 pages, and is one of the longest novels that have been written.

We hypothesize that if we can build a dashboard (shown below), this will allow us to gain insights into the emotional journey undertaken by the characters in War and Peace.

Impressive work, even though I would not use it as a short-cut to “read a novel in record time.”

Rather I take this as an alternative way of reading War and Peace, one that can capture insights a casual reader may miss.

Moreover, the techniques demonstrated here could be used with other works of literature, or even non-fictional works.

Imagine conducting this analysis over the full CIA Torture Report, reportedly more than 7,000 pages, for example.

A heatmap does not connect any dots, but points a user towards places where interesting dots may be found.
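To make the heatmap idea concrete, here is a minimal sketch of the underlying matrix: one row per chapter, one column per emotion. The original post works in R; this is Python and is not the author's method. The three-emotion lexicon and the two "chapters" are invented placeholders, and a real analysis would need a published emotion lexicon and the actual text.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real analysis would use a published emotion lexicon.
LEXICON = {
    "joy": {"happy", "laughed", "delight"},
    "fear": {"afraid", "terror", "trembled"},
    "sadness": {"wept", "grief", "sorrow"},
}


def emotion_counts(text: str) -> Counter:
    """Count lexicon hits per emotion in one chunk of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for emotion, vocab in LEXICON.items():
        counts[emotion] = sum(1 for w in words if w in vocab)
    return counts


def heatmap_rows(chapters: list[str]) -> list[list[int]]:
    """One row per chapter, one column per emotion (alphabetical order)."""
    emotions = sorted(LEXICON)
    return [[emotion_counts(ch)[e] for e in emotions] for ch in chapters]


# Placeholder text standing in for chapters; not quotations from the novel.
chapters = [
    "She laughed with delight and was happy.",
    "He trembled, afraid; then wept in grief.",
]
matrix = heatmap_rows(chapters)

for number, row in enumerate(matrix, start=1):
    print(f"chapter {number}:", dict(zip(sorted(LEXICON), row)))
```

Any plotting library can color the resulting matrix. The rows with unusually high counts are the places worth reading closely.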

Certainly a tool for exploring large releases/leaks of text data.

Enjoy!

PS: Any suggestions for large, tiresome, obscure-on-purpose government reports to practice this method on?

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

Tuesday, November 22nd, 2016

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

From the post:

Dr Johanna Green is a lecturer in Book History and Digital Humanities at the University of Glasgow. Her PhD (English Language, University of Glasgow 2012) focused on a palaeographical study of the textual division and subordination of the Exeter Book manuscript. Here, she tells us about the first of two sessions she led for the Society of Northumbrian Scribes, a group of calligraphers based in North East England, bringing palaeographic research and modern-day calligraphy together for the public.
(emphasis in original)

Not phrased in subject identity language, but concerns familiar to the topic map community are not far away:


My own research centres on the scribal hand of the manuscript, specifically the ways in which the poems are divided and subdivided from one another and the decorative designs used for these litterae notabiliores throughout. For much of my research, I have spent considerable time (perhaps more than I am willing to admit) wondering where one ought to draw the line with palaeography. When do the details become so tiny to no longer be of any significance? When are they just important enough to mean something significant for our understanding of how the manuscript was created and arranged? How far am I willing to argue that these tiny features have significant impact? Is, for example, this littera notabilior Đ on f. 115v (Judgement Day I, left) different enough in a significant way to this H on f.97v, (The Partridge, bottom right), and in turn are both of these litterae notabiliores performing a different function than the H on f.98r (Soul and Body II, far right)?[5]
(emphasis in original, footnote omitted)

When Dr. Green says:

…When do the details become so tiny to no longer be of any significance?…

I would say: When do the subjects (details) become so tiny we want to pass over them in silence? That is, they could be, but are not, represented in a topic map.

Green ends her speculation, to a degree, by enlisting scribes to re-create the manuscript of interest under her observation.

I’ll leave her conclusions for her post but consider a secondary finding:


The experience also made me realise something else: I had learned much by watching them write and talking to them during the process, but I had also learned much by trying to produce the hand myself. Rather than return to Glasgow and teach my undergraduates the finer details of the script purely through verbal or written description, perhaps providing space for my students to engage in the materials of manuscript production, to try out copying a script/exemplar for themselves would help increase their understanding of the process of writing and, in turn, deepen their knowledge of the constituent parts of a letter and their significance in palaeographic endeavour. This last is something I plan to include in future palaeography teaching.

Dr. Green’s concern over palaeographic detail illustrates two important points about topic maps:

  1. Potential subjects for a topic map are always unbounded.
  2. Different people “see” different subjects.

Which also accounts for my yawn when Microsoft drops the Microsoft Concept Graph of more than 5.4 million concepts.

…[M]ore than 5.4 million concepts[?]

Hell, Copleston’s A History of Philosophy easily has more concepts.

But the Microsoft Concept Graph is more useful than a topic map of Copleston in your daily, shallow, social sea.

What subjects do you see and how would capturing them and their identities make a difference in your life (professional or otherwise)?

S20-211a Hebrew Bible Technology Buffet – November 20, 2016 (save that date!)

Tuesday, October 18th, 2016

S20-211a Hebrew Bible Technology Buffet

From the webpage:

On Sunday, November 20th 2016, from 1:00 PM to 3:30 PM, GERT will host a session with the theme “Hebrew Bible Technology Buffet” at the SBL Annual Meeting in room 305 of the Convention Center. Barry Bandstra of Hope College will preside.

The session has four presentations:

Presentations will be followed by a discussion session.

You will need to register for the Annual Meeting to attend the session.

Assuming they are checking “badges” to make sure attendees have registered. Registration is very important to those who “foster” biblical scholarship by comping travel and rooms for their close friends.

PS: The website reports non-member registration is $490.00. I would like to think that is a misprint but I suspect it’s not.

That’s one way to isolate yourself from an interested public. By way of contrast, snail-mail Biblical Greek courses in the 1890s had tens of thousands of subscribers. When academics complain of being marginalized, use this as an example of self-marginalization.

DATNAV: …Navigate and Integrate Digital Data in Human Rights Research [Ethics]

Wednesday, August 24th, 2016

DATNAV: New Guide to Navigate and Integrate Digital Data in Human Rights Research by Zara Rahman.

From the introduction in the Guide:

From online videos of rights violations, to satellite images of environmental degradation, to eyewitness accounts disseminated on social media, we have access to more relevant data today than ever before. When used responsibly, this data can help human rights professionals in the courtroom, when working with governments and journalists, and in documenting historical record.

Acquiring, disseminating and storing digital data is also becoming increasingly affordable. As costs continue to decrease and new platforms are developed, opportunities for harnessing these data sources for human rights work increase.

But integrating data collection and management into the day to day work of human rights research and documentation can be challenging, even overwhelming, for individuals and organisations. This guide is designed to help you navigate and integrate new data forms into your human rights work.

It is the result of a collaboration between Amnesty International, Benetech, and The Engine Room that began in late 2015. We conducted a series of interviews, community consultations, and surveys to understand whether digital data was being integrated into human rights work. In the vast majority of cases, we found that it wasn’t. Why?

Mainly, human rights researchers appeared to be overwhelmed by the possibilities. In the face of limited resources, not knowing how to get started or whether it would be worthwhile, most people we spoke to refrained from even attempting to strengthen their work with digital data.

To support everyone in the human rights field in navigating this complex environment, we convened a group of 16 researchers and technical experts in a castle outside Berlin, Germany in May 2016 to draft this guide over four days of intense reflection and writing.

There are additional reading resources at: https://engn.it/datnav.

The issue of ethics comes up quickly in human rights research and here the authors write:

Seven things to consider before using digital data for human rights

  1. Would digital data genuinely help answer your research questions? What are the pros and cons of the particular source or medium? What might you learn from past uses of similar technology?
  2. What sources are likely to be collecting or capturing the kinds of information you need? What is the context in which it is being produced and used? Will the people or organisations on which your work is focused be receptive to these types of data?
  3. How easily will new forms of data integrate into your existing workflow? Do you realistically have the time and money to collect, store, analyze and especially to verify this data? Can anyone on your team comfortably support the technology?
  4. Who owns or controls the data you will be using? Companies, government, or adversaries? How difficult is it to get? Is it a fair or legal collection method? What is the internal stance on this? Do you have true informed consent from individuals?
  5. How will digital divides and differences in local access to online platforms, computers or phones, affect representation of different populations? Would conclusions based on the data reinforce inequalities, stereotypes or blind spots?
  6. Are organisational protocols for confidentiality and security in digital communication and data handling sufficiently robust to deal with risks to you, your partners and sources? Are security tools and processes updated frequently enough?
  7. Do you have safeguards in place to prevent and deal with any secondary trauma from viewing digital content that you or your partners may experience at personal and organisational levels?

(Page 15)

Before I reveal my #0 consideration, consider the following story as setting the background.

At a death penalty seminar (certainly a violation of human rights), a practitioner reported a case where the prosecuting attorney said a particular murder case was a question of “good versus evil.” In the course of preparing for that case, it was discovered that while teaching a course for paralegals, the prosecuting attorney had a sexual affair with one of his students. Affidavits were obtained, etc., and a motion was filed in the pending criminal case entitled: Motion To Define Good and Evil.

There was a mix of opinions on whether blind-siding the prosecuting attorney with his personal failings, with the fallout for his family, was a legitimate approach.

My question was: Did they consider asking the prosecuting attorney to take the death penalty off the table, in exchange for not filing the Motion To Define Good and Evil? A question of effective use of the information and not about the legitimacy of using it.

For human rights violations, my #0 Question would be:

0. Can the information be used to stop and/or redress human rights violations without harming known human rights victims?

The other seven questions, like “…all deliberate speed…,” are a game played by non-victims.

Digital Humanities In the Library

Sunday, July 31st, 2016

Digital Humanities In the Library / Of the Library: A dh+lib Special Issue

A special issue of dh + lib introduced by Sarah Potvin, Thomas Padilla and Caitlin Christian-Lamb in their essay: Digital Humanities In the Library / Of the Library, saying:

What are the points of contact between digital humanities and libraries? What is at stake, and what issues arise when the two meet? Where are we, and where might we be going? Who are “we”? By posing these questions in the CFP for a new dh+lib special issue, the editors hoped for sharp, provocative meditations on the state of the field. We are proud to present the result, ten wide-ranging contributions by twenty-two authors, collectively titled “Digital Humanities In the Library / Of the Library.”

We make the in/of distinction pointedly. Like the Digital Humanities (DH), definitions of library community are typically prefigured by “inter-” and “multi-” frames, rendered as work and values that are interprofessional, interdisciplinary, and multidisciplinary. Ideally, these characterizations attest to diversified yet unified purpose, predicated on the application of disciplinary expertise and metaknowledge to address questions that resist resolution from a single perspective. Yet we might question how a combinatorial impulse obscures the distinct nature of our contributions and, consequently, our ability to understand and respect individual agency. Working across the similarly encompassing and amorphous contours of the Digital Humanities compels the library community to reckon with its composite nature.

All of the contributions merit your attention but I was especially taken by When Metadata Becomes Outreach: Indexing, Describing, and Encoding For DH by Emma Annette Wilson and Mary Alexander, which has this gem that will resonate with topic map fans:


DH projects require high-quality metadata in order to thrive, and the bigger the project, the more important that metadata becomes to make data discoverable, navigable, and open to computational analysis. The functions of all metadata are to allow our users to identify and discover resources through records acting as surrogates of resources, and to discover similarities, distinctions, and other nuances within single texts or across a corpus. High quality metadata brings standardization to the project by recording elements’ definitions, obligations, repeatability, rules for hierarchical structure, and attributes. Input guidelines and the use of controlled vocabularies bring consistencies that promote findability for researchers and users alike.

Modulo my reservations about the data/metadata distinction depending upon a point of view and all of them being subjects in any event, it’s hard to think of a clearer statement of the value that a topic map could bring to a DH project.

Consistencies can peacefully co-exist with historical or present-day inconsistencies, at least so long as you are using a topic map.

I commend the entire issue to you for reading!

Electronic Literature Organization

Sunday, June 19th, 2016

Electronic Literature Organization

From the “What is E-Lit” page:

Electronic literature, or e-lit, refers to works with important literary aspects that take advantage of the capabilities and contexts provided by the stand-alone or networked computer. Within the broad category of electronic literature are several forms and threads of practice, some of which are:

  • Hypertext fiction and poetry, on and off the Web
  • Kinetic poetry presented in Flash and using other platforms
  • Computer art installations which ask viewers to read them or otherwise have literary aspects
  • Conversational characters, also known as chatterbots
  • Interactive fiction
  • Literary apps
  • Novels that take the form of emails, SMS messages, or blogs
  • Poems and stories that are generated by computers, either interactively or based on parameters given at the beginning
  • Collaborative writing projects that allow readers to contribute to the text of a work
  • Literary performances online that develop new ways of writing

The ELO showcase, created in 2006 and with some entries from 2010, provides a selection outstanding examples of electronic literature, as do the two volumes of our Electronic Literature Collection.

The field of electronic literature is an evolving one. Literature today not only migrates from print to electronic media; increasingly, “born digital” works are created explicitly for the networked computer. The ELO seeks to bring the literary workings of this network and the process-intensive aspects of literature into visibility.

The confrontation with technology at the level of creation is what distinguishes electronic literature from, for example, e-books, digitized versions of print works, and other products of print authors “going digital.”

Electronic literature often intersects with conceptual and sound arts, but reading and writing remain central to the literary arts. These activities, unbound by pages and the printed book, now move freely through galleries, performance spaces, and museums. Electronic literature does not reside in any single medium or institution.

I was looking for a recent presentation by Allison Parrish on bots when I encountered Electronic Literature Organization (ELO).

I was attracted by the bot discussion at a recent conference but as you can see, the range of activities of the ELO is much broader.

Enjoy!

Exploratory Programming for the Arts and Humanities

Wednesday, April 6th, 2016

Exploratory Programming for the Arts and Humanities by Nick Montfort.

From the webpage:

This book introduces programming to readers with a background in the arts and humanities; there are no prerequisites, and no knowledge of computation is assumed. In it, Nick Montfort reveals programming to be not merely a technical exercise within given constraints but a tool for sketching, brainstorming, and inquiring about important topics. He emphasizes programming’s exploratory potential—its facility to create new kinds of artworks and to probe data for new ideas.

The book is designed to be read alongside the computer, allowing readers to program while making their way through the chapters. It offers practical exercises in writing and modifying code, beginning on a small scale and increasing in substance. In some cases, a specification is given for a program, but the core activities are a series of “free projects,” intentionally underspecified exercises that leave room for readers to determine their own direction and write different sorts of programs. Throughout the book, Montfort also considers how computation and programming are culturally situated—how programming relates to the methods and questions of the arts and humanities. The book uses Python and Processing, both of which are free software, as the primary programming languages.

Full Disclosure: I haven’t seen a copy of Exploratory Programming.

I am reluctant to part with $40.00 US for either print or an electronic version where the major heads in the table of contents read as follows:

1 Modifying a Program

2 Calculating

3 Double, Double

4 Programming Fundamentals

5 Standard Starting Points

6 Text I

7 Text II

8 Image I

9 Image II

10 Text III

11 Statistics and Visualization

12 Animation

13 Sound

14 Interaction

15 Onward

The table of contents shows more than one hundred pages out of two hundred and sixty-three are spent on introductory computer programming topics.

Text, which has a healthy section on string operations, merits a mere seventy pages. The other one hundred or so pages are split between visualization, sound, animation, etc.

Compare that table of contents with this one*:

Chapter One – Modular Programming: An Approach

Chapter Two – Data Entry and Text Verification

Chapter Three – Index and Concordance

Chapter Four – Text Criticism

Chapter Five – Improved Searching Techniques

Chapter Six – Morphological Analysis

Which table of contents promises to be more useful for exploration?

Personal computers are vastly more powerful today than when the second table of contents was penned.

Yet, students start off as though they are going to write their own tools from scratch. Unlikely and certainly not the best use of their time.

In-depth coverage of NLTK (the Natural Language Toolkit), applied to historical or contemporary texts, would teach them a useful tool. A tool they could apply to other material.

To cover machine learning, consider Weka. A tool students can learn in class and then apply in new and different situations.

There are tools for image and sound analysis but the important term is tool.

Just as we don’t teach students to make their own paper, we should focus on enabling them to reap the riches that modern software tools offer.

Or to put it another way, let’s stop repeating the past and move forward.

* Oh, the second table of contents? Computer Programs for Literary Analysis, John R. Abercrombie, Philadelphia : Univ. of Pennsylvania Press, ©1984. Yes, 1984.

Laypersons vs. Scientists – “…laypersons may be prone to biases…”

Saturday, March 12th, 2016

The “distinction” between laypersons and scientists is more a world view about some things than “all scientists are rational” or “all laypersons are irrational.” Scientists and laypersons can be just as rational and/or irrational, depending upon the topic at hand.

Having said that, The effects of social identity threat and social identity affirmation on laypersons’ perception of scientists by Peter Nauroth, et al., finds, unsurprisingly, that if a layperson’s social identity is threatened by research, they have a less favorable view of the scientists involved.

Abstract:

Public debates about socio-scientific issues (e.g. climate change or violent video games) are often accompanied by attacks on the reputation of the involved scientists. Drawing on the social identity approach, we report a minimal group experiment investigating the conditions under which scientists are perceived as non-prototypical, non-reputable, and incompetent. Results show that in-group affirming and threatening scientific findings (compared to a control condition) both alter laypersons’ evaluations of the study: in-group affirming findings lead to more positive and in-group threatening findings to more negative evaluations. However, only in-group threatening findings alter laypersons’ perceptions of the scientists who published the study: scientists were perceived as less prototypical, less reputable, and less competent when their research results imply a threat to participants’ social identity compared to a non-threat condition. Our findings add to the literature on science reception research and have implications for understanding the public engagement with science.

Perceived attacks on personal identity have negative consequences for the “reception” of science.

Implications for public engagement with science

Our findings have immediate implications for public engagement with science activities. When laypersons perceive scientists as less competent, less reputable, and not representative of the scientific community and the scientist’s opinion as deviating from the current scientific state-of-the-art, laypersons might be less willing to participate in constructive discussions (Schrodt et al., 2009). Furthermore, our mediation analysis suggests that these negative perceptions deepen the trench between scientists and laypersons concerning the current scientific state-of-the-art. We speculate that these biases might actually even lead to engagement activities to backfire: instead of developing a mutual understanding they might intensify laypersons’ misconceptions about the scientific state-of-the-art. Corroborating this hypothesis, Binder et al. (2011) demonstrated that discussions about controversial science topics may in fact polarize different groups around a priori positions. Additional preliminary support for this hypothesis can also be found in case studies about public engagement activities in controversial socio-scientific issues. Some of these reports (for two examples, see Lezaun and Soneryd, 2007) indicate problems to maintain a productive atmosphere between laypersons and experts in the discussion sessions.

Besides these practical implications, our results also add further evidence to the growing body of literature questioning the validity of the deficit model in science communication according to which people’s attitudes toward science are mainly determined by their knowledge about science (Sturgis and Allum, 2004). We demonstrated that social identity concerns profoundly influence laypersons’ perceptions and evaluations of scientific results regardless of laypersons’ knowledge. However, our results also question whether involving laypersons in policy decision processes based upon scientific evidence is reasonable in all socio-scientific issues. Particularly when the scientific evidence has potential negative consequences for social groups, our research suggests that laypersons may be prone to biases based upon their social affiliations. For example, if regular video game players were involved in decision-making processes concerning potential sales restrictions of violent video games, they would be likely to perceive scientific evidence demonstrating detrimental effects of violent video games as shoddy and the respective researchers as disreputable (Greitemeyer, 2014; Nauroth et al., 2014, 2015).(emphasis added)

The principal failure of this paper is that it does not study how the scientific community itself reacts to research that attacks the personal identity of its members.

I don’t think it is reading too much into the post: Academic, Not Industrial Secrecy, where one group said:

We want restrictions on who could do the analyses.

to say that attacks on personal identity lead to boorish behavior on the part of scientists.

Laypersons and scientists emit a never ending stream of examples of prejudice, favoritism, sycophancy, sloppy reasoning, to say nothing of careless and/or low quality work.

Reception of science among laypersons might improve if the scientific community abandoned its facade of “it’s objective, it’s science.”

That facade was tiresome by WWII and to keep repeating it now is a disservice to the scientific community.

All of our efforts, in any field, are human endeavors and thus subject to the vagaries and uncertainties of human interaction.

Live with it.

For Linguists on Your Holiday List

Saturday, December 12th, 2015

Hey Linguists!—Get Them to Get You a Copy of The Speculative Grammarian Essential Guide to Linguistics.

From the website:

Hey Linguists! Do you know why it is better to give than to receive? Because giving requires a lot more work! You have to know what someone likes, what someone wants, who someone is, to get them a proper, thoughtful gift. That sounds like a lot of work.

No, wait. That’s not right. It’s actually more work to be the recipient—if you are going to do it right. You can’t just trust people to know what you like, what you want, who you are.

You could try to help your loved ones understand a linguist’s needs and wants and desires—but you’d have to give them a mini course on historical, computational, and forensic linguistics first. Instead, you can assure them that SpecGram has the right gift for you—a gift you, their favorite linguist, will treasure for years to come: The Speculative Grammarian Essential Guide to Linguistics.

So drop some subtle or not-so-subtle hints and help your loved ones do the right thing this holiday season: gift you with this hilarious compendium of linguistic sense and nonsense.

If you need to convince your friends and family that they can’t find you a proper gift on their own, send them one of the images below, and try to explain to them why it amuses you. That’ll show ’em! (More will be added through the rest of 2015, just in case your friends and family are a little thick.)

• If guilt is more your style, check out 2013’s Sad Holiday Linguists.

• If semi-positive reinforcement is your thing, check out 2014’s Because You Can’t Do Everything You Want for Your Favorite Linguist.

Disclaimer: I haven’t proofed the diagrams against the sources cited. Rely on them at your own risk. 😉

There are others but the Hey Semioticians! reminded me of John Sowa (sorry John):

semiotics

The greatest mistake across all disciplines is taking ourselves (and our positions) far too seriously.

Enjoy!

On Teaching XQuery to Digital Humanists [Lesson in Immediate ROI]

Wednesday, November 18th, 2015

On Teaching XQuery to Digital Humanists by Clifford B. Anderson.

A paper presented at Balisage 2014 but still a great read for today. In particular where Clifford makes the case for teaching XQuery to humanists:

Making the Case for XQuery

I may as well state upfront that I regard XQuery as a fantastic language for digital humanists. If you are involved in marking up documents in XML, then learning XQuery will pay long-term dividends. I do have arguments for this bit of bravado. My reasons for lifting up XQuery as a programing language of particular interest to digital humanists are essentially three:

  • XQuery is domain-appropriate for digital humanists.

Let’s take each of these points in turn.

First, XQuery fits the domain of digital humanists. Admittedly, I am focusing here on a particular area of the digital humanities, namely the domain of digital text editing and analysis. In that domain, however, XQuery proves a nearly perfect match to the needs of digital humanists.

If you scour the online communities related to digital humanities, you will repeatedly find conversations about which programming languages to learn. Predictably, the advice is all over the map. PHP is easy to learn, readily accessible, and the language of many popular projects in the digital humanities such as Omeka and Neatline. Javascript is another obvious choice given its ubiquity. Others recommend Python or Ruby. At the margins, you’ll find the statistically-inclined recommending R. There are pluses and minuses to learning any of these languages. When you are working with XML, however, they all fall short. Inevitably, working with XML in these languages will require learning how to use packages to read XML and convert it to other formats.

Learning XQuery eliminates any impedance between data and code. There is no need to import any special packages to work with XML. Rather, you can proceed smoothly from teaching XML basics to showing how to navigate XML documents with XPath to querying XML with XQuery. You do not need to jump out of context to teach students about classes, objects, tables, or anything as awful-sounding as “shredding” XML documents or storing them as “blobs.” XQuery makes it possible for students to become productive without having to learn as many computer science or software engineering concepts. A simple four or five line FLWOR expression can easily demonstrate the power of XQuery and provide a basis for students’ tinkering and exploration. (emphasis added)

I commend the rest of the paper to you for reading, but Clifford’s first point nails why humanists and others should learn XQuery.

The part I highlighted above sums it up:

XQuery makes it possible for students to become productive without having to learn as many computer science or software engineering concepts. A simple four or five line FLWOR expression can easily demonstrate the power of XQuery and provide a basis for students’ tinkering and exploration. (emphasis added)

Whether you are a student, scholar or even a type-A business type, what do you want?

To get sh*t done!

A few of us like tinkering with edge cases, proofs, theorems and automata, but having the needed output on time or sooner, really makes the day for most folks.

A small number of XQuery expressions will open up XML encoded data for your custom exploration. You can experience an immediate ROI from the time you spend learning XQuery. Which will prompt you to learn more XQuery.
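For contrast, here is a rough sketch of the same kind of small query written in Python. The sample document, its `<poem>` and `<l>` element names, and the function name are invented for illustration. Note the import on the first line: that is the "learn a package first" overhead the quoted passage describes. A roughly comparable FLWOR expression appears in the comment, offered as an illustration rather than as anything from Clifford's paper.

```python
import xml.etree.ElementTree as ET

# Invented sample; the <poem>/<l> names are assumptions, loosely modeled
# on TEI-style line markup.
SAMPLE = """<poem>
  <l n="1">Come live with me and be my love</l>
  <l n="2">And we will all the pleasures prove</l>
  <l n="3">If these delights thy mind may move</l>
  <l n="4">Then live with me and be my love</l>
</poem>"""

# A roughly comparable FLWOR expression (illustrative only):
#   for $line in //l
#   where contains($line, "love")
#   order by string($line)
#   return string($line)


def lines_containing(xml_text, word):
    """Return the text of every <l> element containing `word`, sorted."""
    root = ET.fromstring(xml_text)
    hits = [l.text for l in root.iter("l") if l.text and word in l.text]
    return sorted(hits)


print(lines_containing(SAMPLE, "love"))
```

Neither version is long, but the Python one asks a beginner to meet modules, parsing, and list comprehensions before getting an answer.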

Think of learning XQuery as a step towards user independence. Independence from the choices made by unseen and unknown programmers.

Are you ready to take that step?

You do not want to be an edge case [The True Skynet: Your Homogenized Future]

Friday, November 13th, 2015

You do not want to be an edge case.

John D. Cook writes:

Hilary Mason made an important observation on Twitter a few days ago:

You do not want to be an edge case in this future we are building.

Systems run by algorithms can be more efficient on average, but make life harder on the edge cases, people who are exceptions to the system developers’ expectations.

Algorithms, whether encoded in software or in rigid bureaucratic processes, can unwittingly discriminate against minorities. The problem isn’t recognized minorities, such as racial minorities or the disabled, but unrecognized minorities, people who were overlooked.

For example, two twins were recently prevented from getting their drivers licenses because DMV software couldn’t tell their photos apart. Surely the people who wrote the software harbored no malice toward twins. They just didn’t anticipate that two drivers licence applicants could have indistinguishable photos.

I imagine most people reading this have had difficulty with software (or bureaucratic procedures) that didn’t anticipate something about them; everyone is an edge case in some context. Maybe you don’t have a middle name, but a form insists you cannot leave the middle name field blank. Maybe there are more letters in your name or more children in your family than a programmer anticipated. Maybe you choose not to use some technology that “everybody” uses. Maybe you happen to have a social security number that hashes to a value that causes a program to crash.

When software routinely fails, there obviously has to have a human override. But as software improves for most people, there’s less apparent need to make provision for the exceptional cases. So things could get harder for edge cases as they get better for more people.

Recent advances in machine learning have led reputable thinkers (Stephen Hawking for example) to envision a future where an artificial intelligence will arise to dispense with humanity.

If you think you have heard that theme before, you have, most recently as Skynet, an entirely fictional creation in the Terminator science fiction series.

Such alarmist claims make good press, but given that no one knows how the human brain works, much less how intelligence arises, the risk is less than that of a rogue black hole or a gamma-ray burst. I don’t lose sleep over either one of those, do you?

The greater “Skynet” threat to people and their cultures is the enforced homogenization of language and culture.

John mentions lacking a middle name but consider the complexities of Japanese names. Due to the creeping infection of Western culture and computer-based standardization, many Japanese list their names in Western order, given name, family name, instead of the Japanese order of family name, given name.
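A minimal Python sketch of the alternative: a name record that treats a missing middle name as valid and stores family and given names separately, so display order is a per-person choice rather than a system default. The class and field names (`PersonName`, `family_first`) are hypothetical, and the sample names are only illustrations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonName:
    """Hypothetical name record; the field names are illustrative only."""
    family: str
    given: str
    middle: Optional[str] = None   # absent is valid, not an error
    family_first: bool = False     # the person's own preferred order

    def display(self) -> str:
        # Skip missing parts instead of demanding a placeholder value.
        personal = " ".join(p for p in (self.given, self.middle) if p)
        if self.family_first:
            return f"{self.family} {personal}"
        return f"{personal} {self.family}"


print(PersonName(family="Kurosawa", given="Akira", family_first=True).display())
print(PersonName(family="Kennedy", given="John", middle="Fitzgerald").display())
```

Nothing here is specific to Japanese names. The point is only that the order and the optional parts live in the data, not in an assumption baked into the form.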

Even languages can start the slide to being “edge cases,” as you will see from the erosion of Hangul (Korean alphabet) from public signs in Seoul.

Computers could be preserving languages and cultural traditions; they have the capacity and infinite patience.

But they are not being used for that purpose.

Cellphones, for example, are linking humanity into a seething mass of impoverished social interaction. Impoverished social interaction that is creating more homogenized languages, not preserving diverse ones.

Not only should you be an edge case but you should push back against the homogenizing impact of computers. The diversity we lose could well be your own.

Programming for Humanists at TAMU [and Business Types]

Monday, August 17th, 2015

Programming for Humanists at TAMU

From the webpage:

[What is DH?] Digital Humanities studies the intersection and mutual influence of humanities ideas and digital methods, with the goal of understanding how the use of digital technologies and approaches alters the practice and theory of humanities scholarship. In this sense it is concerned with studying the emergence of scholarly disciplines and communicative practices at a time when those are in flux, under the influence of rapid technological, institutional and cultural change. As a way of identifying digital interests and efforts within traditional humanities fields, the term “digital humanities” also identifies, in a general way, any kind of critical engagement with digital tools and methods in a humanities context. This includes the creation of digital editions and digital text or image collections, and the creation and use of digital tools for the investigation and analysis of humanities research materials. – Julia Flanders, Northeastern University (http://goo.gl/BJeXk2)

 

Programming4Humanists is a two-semester course sequence designed to introduce participants to methodologies, coding, and programming languages associated with the Digital Humanities. We focus on creation, editing, and searchability of digital archives, but also introduce students to data mining and statistical analysis. Our forte at Texas A&M University (TAMU) is Optical Character Recognition of early modern texts, a skill we learned in completing the Early Modern OCR Project, or eMOP. Another strength that the Initiative for Digital Humanities, Media, and Culture (http://idhmc.tamu.edu) at TAMU brings to this set of courses is the Texas A&M University Press book series called “Programming for Humanists.” We use draft and final versions of these books, as well as many additional resources available on companion web pages, for participants in the workshop. The books in this series are of course upon publication available to anyone, along with the companion sites, whether the person has participated in the workshop or not. However, joining the Programming4Humanists course enables participants to communicate with the authors of these books for the sake of asking questions and indeed, through their questioning, helping us to improve the books and web materials. Our goal is to help people learn Digital Humanities methods and techniques.

Participants

Those who should attend include faculty, staff, librarians, undergraduates, and graduate students, interested in making archival and cultural materials available to a wide audience while encoding them digitally according to best practices, standards that will allow them to submit their digital editions for peer review by organizations such as the MLA Committee for Scholarly Edition and NINES / 18thConnect. Librarians will be especially interested in learning our OCR procedures as a means for digitizing large archives. Additionally, scholars, students, and librarians will receive an introduction to text mining and XQuery, the latter used for analyzing semantically rich data sets. This course gives a good overview of what textual and archival scholars are accomplishing in the field of Digital Humanities, even though the course is primarily concerned with teaching skills to participants. TAMU graduate and undergraduate students may take this course for 2 credit hours, see Schedule of Classes for Fall 2015: LBAR 489 or 689 Digital Scholarship and Publication.

Prerequisites

No prior knowledge is required but some familiarity with TEI/XML, HTML, and CSS will be helpful (See previous Programming 4 Humanists course syllabus). Certificate registrants will receive certificates confirming that they have a working knowledge of Drupal, XSLT, XQuery, and iPython Notebooks. Registration for those getting a certificate includes continued access to all class videos during the course period and an oXygen license. Non-certificate registrants will have access to the class videos for one week.

Everything that Julia says is true and this course will be very valuable for traditional humanists.

It will also be useful for business types who aren’t quants or CS majors/minors. The same “friendly” learning curve is suitable to both audiences.

You won’t be a “power user” at the end of this course but you will sense when CS folks are blowing smoke. It happens.

New Testament Virtual Manuscript Room

Tuesday, June 2nd, 2015

New Testament Virtual Manuscript Room

From the webpage:

This site is devoted to the study of Greek New Testament manuscripts. The New Testament Virtual Manuscript Room is a place where scholars can come to find the most exhaustive list of New Testament manuscript resources, can contribute to marking attributes about these manuscripts, and can find state of the art tools for researching this rich dataset.

While our tools are reasonably functional for anonymous users, they provide additional features and save options once a user has created an account and is logged in on the site. For example, registered users can save transcribed pages to their personal account and create personalized annotations to images.

A close friend has been working on this project for several years. Quite remarkable although I would prefer it to feature Hebrew (and older) texts. 😉

Spatial Humanities Workshop

Tuesday, June 2nd, 2015

Spatial Humanities Workshop by Lincoln Mullen.

From the webpage:

Scholars in the humanities have long paid attention to maps and space, but in recent years new technologies have created a resurgence of interest in the spatial humanities. This workshop will introduce participants to the following subjects:

  • how mapping and spatial analysis are being used in humanities disciplines
  • how to find, create, and manipulate spatial data
  • how to create historical layers on interactive maps
  • how to create data-driven maps
  • how to tell stories and craft arguments with maps
  • how to create deep maps of places
  • how to create web maps in a programming language
  • how to use a variety of mapping tools
  • how to create lightweight and semester-long mapping assignments

The seminar will emphasize the hands-on learning of these skills. Each day we will pay special attention to preparing lesson plans for teaching the spatial humanities to students. The aim is to prepare scholars to be able to teach the spatial humanities in their courses and to be able to use maps and spatial analysis in their own research.

Ahem, the one thing Lincoln forgets to mention is that he is a major player in spatial humanities. His homepage is an amazing place.

The seminar materials don’t disappoint. It would be better to be at the workshop but in lieu of attending, working through these materials will leave you well grounded in spatial humanities.

Civil War Navies Bookworm

Tuesday, May 19th, 2015

Civil War Navies Bookworm by Abby Mullen.

From the post:

If you read my last post, you know that this semester I engaged in building a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections, the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. Using the Bookworm allows us to look at the words in these documents sequentially by date instead of having to go back and forth between different volumes to get a sense of what was going on in the whole navy at any given time.

Before you ask:

The earlier post: Text Analysis on the Documents of the Barbary Wars

More details on Bookworm.

As with all ngram viewers, exercise caution in assuming a text string has uniform semantics across historical, ethnic, or cultural fault lines.
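To make that caution concrete, here is a toy Python sketch of the kind of counting an ngram viewer performs. It is not Bookworm’s code or API. The function name `counts_by_year` and the four sample sentences are invented for illustration and are not drawn from either collection.

```python
from collections import Counter, defaultdict

# Invented sample sentences, not taken from the Official Records.
documents = [
    (1804, "the corsair took a prize off tripoli"),
    (1805, "the prize was sold at auction"),
    (1862, "the ram struck the frigate at dawn"),
    (1863, "a ram was lost from the supply of livestock"),
]


def counts_by_year(docs, term):
    """Count exact matches of term among whitespace tokens, grouped by year."""
    totals = defaultdict(Counter)
    for year, text in docs:
        totals[year].update(text.lower().split())
    # Counter returns 0 for a term that never appears in a given year.
    return {year: counter[term] for year, counter in sorted(totals.items())}


print(counts_by_year(documents, "ram"))
```

The counts for “ram” in 1862 and 1863 look like one continuous trend, yet one sentence is about a warship and the other about an animal. The string matches; the meaning does not.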

Digital Approaches to Hebrew Manuscripts

Friday, May 8th, 2015

Digital Approaches to Hebrew Manuscripts

Monday 18th – Tuesday 19th of May 2015

From the webpage:

We are delighted to announce the programme for On the Same Page: Digital Approaches to Hebrew Manuscripts at King’s College London. This two-day conference will explore the potential for the computer-assisted study of Hebrew manuscripts; discuss the intersection of Jewish Studies and Digital Humanities; and share methodologies. Amongst the topics covered will be Hebrew palaeography and codicology, the encoding and transcription of Hebrew texts, the practical and theoretical consequences of the use of digital surrogates and the visualisation of manuscript evidence and data. For the full programme and our Call for Posters, please see below.

Organised by the Departments of Digital Humanities and Theology & Religious Studies (Jewish Studies)
Co-sponsor: Centre for Late Antique & Medieval Studies (CLAMS), King’s College London

I saw this at the blog for DigiPal: Digital Resource and Database of Palaeography, Manuscript Studies and Diplomatic. Confession: I have never understood how the English derive acronyms, and this one confounds me as much as it does you. 😉

Be sure to look around at the DigiPal site. There are numerous manuscript images, annotation techniques, and other resources for those who foster scholarship by contributing to it.

Web Gallery of Art

Thursday, April 9th, 2015

Web Gallery of Art

From the homepage:

The Web Gallery of Art is a virtual museum and searchable database of European fine arts from 11th to 19th centuries. It was started in 1996 as a topical site of the Renaissance art, originated in the Italian city-states of the 14th century and spread to other countries in the 15th and 16th centuries. Intending to present Renaissance art as comprehensively as possible, the scope of the collection was later extended to show its Medieval roots as well as its evolution to Baroque and Rococo via Mannerism. Encouraged by the feedback from the visitors, recently 19th-century art was also included. However, we do not intend to present 20th-century and contemporary art.

The collection has some of the characteristics of a virtual museum. The experience of the visitors is enhanced by guided tours helping to understand the artistic and historical relationship between different works and artists, by period music of choice in the background and a free postcard service. At the same time the collection serves the visitors’ need for a site where various information on art, artists and history can be found together with corresponding pictorial illustrations. Although not a conventional one, the collection is a searchable database supplemented by a glossary containing articles on art terms, relevant historical events, personages, cities, museums and churches.

The Web Gallery of Art is intended to be a free resource of art history primarily for students and teachers. It is a private initiative not related to any museums or art institutions, and not supported financially by any state or corporate sponsors. However, we do our utmost, using authentic literature and advice from professionals, to ensure the quality and authenticity of the content.

We are convinced that such a collection of digital reproductions, containing a balanced mixture of interlinked visual and textual information, can serve multiple purposes. On one hand it can simply be a source of artistic enjoyment; a convenient alternative to visiting a distant museum, or an incentive to do just that. On the other hand, it can serve as a tool for public education both in schools and at home.

The Gallery doesn’t own the works in question and so resolves the copyright issue thus:

The Web Gallery of Art is copyrighted as a database. Images and documents downloaded from this database can only be used for educational and personal purposes. Distribution of the images in any form is prohibited without the authorization of their legal owner.

The Gallery suggests contacting the Scala Group (or Art Resource, Scala’s U.S. representative) if you need rights beyond educational and personal purposes.

To see how images are presented, view 10 random images from the database. (Warning: The 10 random images link will work only once. If you try it again, the images display briefly and then an invalid CGI environment message pops up. I suspect that clearing the browser cache would make it work a second time.)

BTW, you can listen to classical music in the background while you browse/search. That is a very nice touch.

The site offers other features and options so take time to explore.

Having seen some of Michelangelo’s works in person, I can attest no computer screen can duplicate that experience. However, if given the choice between viewing a pale imitation on a computer screen and not seeing his work at all, the computer version is a no-brainer.

BHO – British History Online

Thursday, February 12th, 2015

BHO – British History Online

The “news” from 8 December 2014 (that I missed) reports:

British History Online (BHO) is pleased to launch version 5.0 of its website. Work on the website redevelopment began in January 2014 and involved a total rebuild of the BHO database and a complete redesign of the site. We hope our readers will find the new site easier to use than ever before. New features include:

  • A new search interface that allows you to narrow your search results by place, period, source type or subject.
  • A new catalogue interface that allows you to see our entire catalogue at a glance, or to browse by place, period, source type or subject.
  • Three subject guides on local history, parliamentary history and urban history. We are hoping to add more subject guides throughout the year. If you would like to contribute, contact us.
  • Guidelines on using BHO, which include searching and browsing help, copyright and citation information, and a list of external resources that we hope will be useful to readers.
  • A new about page that includes information about our team, past and present, as well as a history of where we have come from and where we want to go next.
  • A new subscription interface (at last!) which includes three new levels of subscription in addition to the usual premium content subscription: gold subscription, which includes access to page scans and five- and ten-year long-term BHO subscriptions.
  • Increased functionality to the maps interface, which are now fully zoomable and can even go full screen. We have also replaced the old map scans with high-quality versions.
  • We also updated the site with a fresh, new look! We aimed for easy-to-read text, clear navigation, clean design and bright new images.

Version 5.0 has been a labour of love for the entire BHO team, but we have to give special thanks to Martin Steer, our tireless website manager who rebuilt the site from the ground up.

For over a decade, you have turned to BHO for reliable and accessible sources for the history of Britain and Ireland. We started off with 29 publications in 2003 and here is where we are now:

  • 1.2 million page views per month
  • 365,000 sessions per month
  • 1,241 publications
  • 108,227 text files
  • 184,355 images
  • 10,380 maps

We are very grateful to our users who make this kind of development possible. Your support allows BHO to always be growing and improving. 2014 has been a busy year for BHO and 2015 promises to be just as busy. Version 5.0 was a complete rebuild of BHO. We stripped the site down and began rebuilding from scratch. The goal of the new site is to make it as easy as possible for you to find materials relevant to your research. The new site was designed to be able to grow and expand easily, while always preserving the most important features of BHO. Read about our plans for 2015 and beyond.

We’d love to hear your feedback on our new site! If you want to stay up-to-date on what we are doing at BHO, follow us on Twitter.

Subscriptions are required for approximately 20% of the content, which enables the BHO to offer the other 80% for free.

A resource such as the BHO is a joyful reminder that not all projects sanctioned by government and its co-conspirators are venal and ill-intended.

For example, can you imagine a secondary school research paper on the Great Fire of 1666 that includes observations based on Leake’s Survey of the City After the Great Fire of 1666 Engraved By W. Hollar, 1667? With additional references from BHO materials?

I would have struck a Faustian bargain in high school had such materials been available!

That is just one treasure among many.

Teachers of English, history, humanities, etc., take note!

I first saw this in a tweet by Institute of Historical Research, U. of London.

Comparative Oriental Manuscript Studies: An Introduction

Sunday, January 25th, 2015

Comparative Oriental Manuscript Studies: An Introduction edited by: Alessandro Bausi (General editor), et al.

The “homepage” of this work enables you to download the entire volume or individual chapters, depending upon your interests. It provides a lengthy introduction to codicology, palaeography, textual criticism and text editing, and, of special interest to library students, cataloguing as well as conservation and preservation.

Alessandro Bausi writes in the preface:

Thinking more broadly, our project was also a serious attempt to defend and preserve the COMSt-related fields within the academic world. We know that disciplines and fields are often determined and justified by the mere existence of an easily accessible handbook or, in the better cases, sets of handbooks, textbooks, series and journals. The lack of comprehensive introductory works which are reliable, up-to-date, of broad interest and accessible to a wide audience and might be used in teaching, has a direct impact on the survival of the ‘small subjects’ most of the COMSt-related disciplines pertain to. The decision to make the COMSt handbook freely accessible online and printable on demand in a paper version at an affordable price was strategic in this respect, and not just meant to meet the prescriptions of the European Science Foundation. We deliberately declined to produce an extremely expensive work that might be bought only by a few libraries and research institutions; on the other hand, a plain electronic edition only to be accessed and downloaded as a PDF file was not regarded as a desirable solution either. Dealing with two millennia of manuscripts and codices, we did not want to dismiss the possibility of circulating a real book in our turn.

It remains, hopefully, only to say,

Lector intende: laetaberis

John Svarlien says: A rough translation is: “Reader, pay attention. You will be happy you did.”

We are all people of books. It isn’t possible to separate present-day culture, or what came before it, from books. Even people who shun reading books are shaped by forces that can be traced back to books.

But books did not suddenly appear as mass-printed paperbacks in airport lobbies and checkout lines in grocery stores. Books have a long history before printing, reaching back to the earliest formation of codices.

This work is an introduction to the fascinating world of studying manuscripts and codices prior to the invention of printing. When nearly every copy of a work is different from every other copy, you can imagine the debates over which copy is the “best” copy.

Imagine some versions of “Gone with the Wind” ending with:

  • Frankly, my dear, I don’t give a damn. (traditional)
  • Ashley and I don’t give a damn. (variant)
  • Cheat Ashley out of his business I suppose. (variant)
  • (Lacks a last line due to mss. damage.) (variant)

The “text” of yesteryear lacked the uniform sameness of the printed “text” of today.

When you think about your “favorite” verse in the Bible, it is likely a “majority” reading but hardly the only one.

With the advent of the printing press, it became possible to produce texts uniformly and in mass quantities.

With the advent of electronic texts, we are moving back towards non-uniform texts, whether through editing or digital corruption.

Will we see the birth of digital codicology and its allied fields for digital texts?
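A first, very small step toward collating digital “witnesses” might look like the Python sketch below. It assumes each copy is available as plain text. The sigla A, B and C and the altered copy C are invented for illustration. It uses only the standard library: `hashlib` to detect that two copies differ at all, and `difflib` to show where.

```python
import difflib
import hashlib

# Hypothetical witnesses: copies of one line as they might circulate.
witnesses = {
    "A": "Frankly, my dear, I don't give a damn.",
    "B": "Frankly, my dear, I don't give a damn.",
    "C": "Frankly my dear, I dont give a damn.",
}


def digest(text: str) -> str:
    """SHA-256 of the UTF-8 bytes; any change to the text changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


base = witnesses["A"]
for siglum, text in witnesses.items():
    if digest(text) == digest(base):
        print(siglum, "agrees with A")
    else:
        # Keep only the words ndiff marks as unique to one side ("- " or "+ ").
        changed = [d for d in difflib.ndiff(base.split(), text.split()) if d[0] in "+-"]
        print(siglum, "differs:", changed)
```

A digest only says that two copies differ, not which one is “best.” That judgment remains the editor’s, as it was for manuscripts.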

PS: Please forward the notice of this book to your local librarian.

I first saw this in a tweet by Kirk Lowery.

DiRT Digital Research Tools

Friday, January 23rd, 2015

DiRT Digital Research Tools

From the post:

The DiRT Directory is a registry of digital research tools for scholarly use. DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software.

Interesting concept but the annotations are too brief to convey much information. Not to mention that within a category, say Conduct linguistic research or Transcribe handwritten or spoken texts, the entries have no apparent order, or should I say they are not arranged in alphabetical order by name. There may be some other order that is escaping me.

Some entries appear in the wrong categories, such as Xalan being found under Transcribe handwritten or spoken texts:

Xalan
Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0.

Not what I think of when I think about transcribing handwritten or spoken texts. You?

I didn’t see a process for submitting corrections/comments on resources. I will check and post on this again. It could be a useful tool.

I first saw this in a tweet by Christophe Lalanne.