Archive for the ‘Humanities’ Category

Digital Humanities / Studies: U.Pitt.Greenberg

Wednesday, February 1st, 2017

Digital Humanities / Studies: U.Pitt.Greenberg maintained by Elisa E. Beshero-Bondar.

I discovered this syllabus and its course materials by accident when one of its modules on XQuery turned up in a search. Backing out of that module, I found this gem of a digital humanities course.

The course description:

Our course in “digital humanities” and “digital studies” is designed to be interdisciplinary and practical, with an emphasis on learning through “hands-on” experience. It is a computer course, but not a course in which you learn programming for the sake of learning a programming language. It’s a course that will involve programming, and working with coding languages, and “putting things online,” but it’s not a course designed to make you, in fifteen weeks, a professional website designer. Instead, this is a course in which we prioritize what we can investigate in the Humanities and related Social Sciences fields about cultural, historical, and literary research questions through applications in computer coding and programming, which you will be learning and applying as you go in order to make new discoveries and transform cultural objects—what we call “texts” in their complex and multiple dimensions. We think of “texts” as the transmittable, sharable forms of human creativity (mainly through language), and we interface with a particular text in multiple ways through print and electronic “documents.” When we refer to a “document,” we mean a specific instance of a text, and much of our work will be in experimenting with the structures of texts in digital document formats, accessing them through scripts we write in computer code—scripts that in themselves are a kind of text, readable both by humans and machines.

Your professors are scholars and teachers of humanities, not computer programmers by trade, and we teach this course from our backgrounds (in literature and anthropology, respectively). We teach this course to share coding methods that are highly useful to us in our fields, with an emphasis on working with texts as artifacts of human culture shaped primarily with words and letters—the forms of “written” language transferable to many media (including image and sound) that we can study with computer modelling tools that we design for ourselves based on the questions we ask. We work with computers in this course as precision instruments that help us to read and process great quantities of information, and that lead us to make significant connections, ask new kinds of questions, and build models and interfaces to change our reading and thinking experience as people curious about human history, culture, and creativity.

Our focus in this course is primarily analytical: to apply computer technologies to represent and investigate cultural materials. As we design projects together, you will gain practical experience in editing and you will certainly fine-tune your precision in writing and thinking. We will be working primarily with eXtensible Markup Language (XML) because it is a powerful tool for modelling texts that we can adapt creatively to our interests and questions. XML represents a standard in adaptability and human-readability in digital code, and it works together with related technologies with which you will gain working experience: You’ll learn how to write XPath expressions: a formal language for searching and extracting information from XML code which serves as the basis for transforming XML into many publishable forms, using XSLT and XQuery. You’ll learn to write XSLT: a programming “stylesheet” transforming language designed to convert XML to publishable formats, as well as XQuery, a query (or search) language for extracting information from XML files bundled collectively. You will learn how to design your own systematic coding methods to work on projects, and how to write your own rules in schema languages (like Schematron and Relax-NG) to keep your projects organized and prevent errors. You’ll gain experience with an international XML language called TEI (after the Text Encoding Initiative) which serves as the international standard for coding digital archives of cultural materials. Since one of the best and most widely accessible ways to publish XML is on the worldwide web, you’ll gain working experience with HTML code (a markup language that is a kind of XML) and styling HTML with Cascading Stylesheets (CSS). We will do all of this with an eye to your understanding how coding works—and no longer relying without question on expensive commercial software as the “only” available solution, because such software is usually not designed with our research questions in mind.

We think you’ll gain enough experience at least to become a little dangerous, and at the very least more independent as investigators and makers who wield computers as fit instruments for your own tasks. Your success will require patience, dedication, and regular communication and interaction with us, working through assignments on a daily basis. Your success will NOT require perfection, but rather your regular efforts throughout the course, your documenting of problems when your coding doesn’t yield the results you want. Homework exercises are a back-and-forth, intensive dialogue between you and your instructors, and we plan to spend a great deal of time with you individually over these as we work together. Our guiding principle in developing assignments and working with you is that the best way for you to learn and succeed is through regular practice as you hone your skills. Our goal is not to make you expert programmers (as we are far from that ourselves)! Rather, we want you to learn how to manipulate coding technologies for your own purposes, how to track down answers to questions, how to think your way algorithmically through problems and find good solutions.
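
The course description mentions XPath expressions for searching and extracting information from XML. To give a flavor of what that looks like, here is a minimal sketch of my own (not from the course materials, which work in XQuery and XSLT rather than Python): two XPath queries run over a tiny TEI-flavored document with Python and the lxml library.

```python
# A sketch, not from the course: querying a tiny TEI-flavored document
# with XPath, using Python and the lxml library.
from lxml import etree

tei = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="poem">
        <head>Ozymandias</head>
        <l n="1">I met a traveller from an antique land</l>
        <l n="2">Who said: Two vast and trunkless legs of stone</l>
      </div>
    </body>
  </text>
</TEI>"""

root = etree.fromstring(tei)
ns = {"tei": "http://www.tei-c.org/ns/1.0"}

# XPath: every verse line (<l>) anywhere in the document.
for line in root.xpath("//tei:l", namespaces=ns):
    print(line.get("n"), line.text)

# XPath: the heading of each poem division.
print(root.xpath("//tei:div[@type='poem']/tei:head/text()", namespaces=ns))
```

Roughly the same path expressions, with the namespace declared appropriately, are what you would embed in an XSLT stylesheet or an XQuery FLWOR expression.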

Skimming the syllabus rekindles an awareness of the distinction between the “hard” sciences and the “difficult” ones.

Enjoy!

Update:

After yesterday’s post, Elisa Beshero-Bondar tweeted that this one course is now two:

At a new homepage: newtFire {dh|ds}!

Enjoy!

Humanities Digital Library [A Ray of Hope]

Friday, January 13th, 2017

Humanities Digital Library (Launch Event)

From the webpage:

Date
17 Jan 2017, 18:00 to 17 Jan 2017, 19:00

Venue

IHR Wolfson Conference Suite, NB01/NB02, Basement, IHR, Senate House, Malet Street, London WC1E 7HU

Description

6-7pm, Tuesday 17 January 2017

Wolfson Conference Suite, Institute of Historical Research

Senate House, Malet Street, London, WC1E 7HU

www.humanities-digital-library.org

About the Humanities Digital Library

The Humanities Digital Library is a new Open Access platform for peer reviewed scholarly books in the humanities.

The Library is a joint initiative of the School of Advanced Study, University of London, and two of the School’s institutes—the Institute of Historical Research and the Institute of Advanced Legal Studies.

From launch, the Humanities Digital Library offers scholarly titles in history, law and classics. Over time, the Library will grow to include books from other humanities disciplines studied and researched at the School of Advanced Study. Partner organisations include the Royal Historical Society whose ‘New Historical Perspectives’ series will appear in the Library, published by the Institute of Historical Research.

Each title is published as an open access PDF, with copies also available to purchase in print and EPUB formats. Scholarly titles come in several formats—including monographs, edited collections and longer and shorter form works.
(emphasis in the original)

Timely evidence that not everyone in the UK is barking mad! “Barking mad” being the only explanation I can offer for the Investigatory Powers Bill.

I won’t be attending, but if you can, do attend, and support the Humanities Digital Library after it opens.

War and Peace & R

Friday, December 2nd, 2016

No, not a post about R versus Python but about R and Tolstoy’s War and Peace.

Using R to Gain Insights into the Emotional Journeys in War and Peace by Wee Hyong Tok.

From the post:

How do you read a novel in record time, and gain insights into the emotional journey of main characters, as they go through various trials and tribulations, as an exciting story unfolds from chapter to chapter?

I remembered my experiences when I start reading a novel, and I get intrigued by the story, and simply cannot wait to get to the last chapter. I also recall many conversations with friends on some of the interesting novels that I have read awhile back, and somehow have only vague recollection of what happened in a specific chapter. In this post, I’ll work through how we can use R to analyze the English translation of War and Peace.

War and Peace is a novel by Leo Tolstoy, and captures the salient points about Russian history from the period 1805 to 1812. The novel consists of the stories of five families, and captures the trials and tribulations of various characters (e.g. Natasha and Andre). The novel consists of about 1400 pages, and is one of the longest novels that have been written.

We hypothesize that if we can build a dashboard (shown below), this will allow us to gain insights into the emotional journey undertaken by the characters in War and Peace.

Impressive work, even though I would not use it as a short-cut to “read a novel in record time.”

Rather, I take this as an alternative way of reading War and Peace, one that can capture insights a casual reader may miss.

Moreover, the techniques demonstrated here could be used with other works of literature, or even non-fictional works.
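
As a rough sketch of what that reuse might look like (in Python rather than the post’s R, and not Wee Hyong Tok’s code), per-chapter sentiment for any plain-text novel can be scored with NLTK’s VADER analyzer; the file name and chapter convention below are assumptions:

```python
# Rough sketch of chapter-level sentiment scoring; not the original R code.
# Assumes "war_and_peace.txt" is a local plain-text copy with chapters
# marked by lines beginning with "CHAPTER".
import re
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

with open("war_and_peace.txt", encoding="utf-8") as f:
    text = f.read()

# Split on chapter headings and drop any front matter before the first one.
chapters = re.split(r"^CHAPTER\b.*$", text, flags=re.MULTILINE)[1:]

# Compound score per chapter: -1 (most negative) to +1 (most positive).
scores = [analyzer.polarity_scores(chapter)["compound"] for chapter in chapters]

for number, score in enumerate(scores, start=1):
    print(f"Chapter {number:3d}: {score:+.3f}")
```

The per-chapter scores can then feed a heatmap or line chart, which is essentially what the post’s dashboard does with far more polish.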

Imagine conducting this analysis over the full CIA Torture Report, reportedly more than 7,000 pages, for example.

A heatmap does not connect any dots, but points a user towards places where interesting dots may be found.

Certainly a tool for exploring large releases/leaks of text data.

Enjoy!

PS: Any suggestions for large, tiresome, obscure-on-purpose government reports to practice this method on?

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

Tuesday, November 22nd, 2016

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

From the post:

Dr Johanna Green is a lecturer in Book History and Digital Humanities at the University of Glasgow. Her PhD (English Language, University of Glasgow 2012) focused on a palaeographical study of the textual division and subordination of the Exeter Book manuscript. Here, she tells us about the first of two sessions she led for the Society of Northumbrian Scribes, a group of calligraphers based in North East England, bringing palaeographic research and modern-day calligraphy together for the public.
(emphasis in original)

Not phrased in subject identity language, but concerns familiar to the topic map community are not far away:


My own research centres on the scribal hand of the manuscript, specifically the ways in which the poems are divided and subdivided from one another and the decorative designs used for these litterae notabiliores throughout. For much of my research, I have spent considerable time (perhaps more than I am willing to admit) wondering where one ought to draw the line with palaeography. When do the details become so tiny to no longer be of any significance? When are they just important enough to mean something significant for our understanding of how the manuscript was created and arranged? How far am I willing to argue that these tiny features have significant impact? Is, for example, this littera notabilior Đ on f. 115v (Judgement Day I, left) different enough in a significant way to this H on f.97v, (The Partridge, bottom right), and in turn are both of these litterae notabiliores performing a different function than the H on f.98r (Soul and Body II, far right)?[5]
(emphasis in original, footnote omitted)

When Dr. Green says:

…When do the details become so tiny to no longer be of any significance?…

I would say: When do the subjects (details) become so tiny we want to pass over them in silence? That is, they could be, but are not, represented in a topic map.

Green ends her speculation, to a degree, by enlisting scribes to re-create the manuscript of interest under her observation.

I’ll leave her conclusions for her post but consider a secondary finding:


The experience also made me realise something else: I had learned much by watching them write and talking to them during the process, but I had also learned much by trying to produce the hand myself. Rather than return to Glasgow and teach my undergraduates the finer details of the script purely through verbal or written description, perhaps providing space for my students to engage in the materials of manuscript production, to try out copying a script/exemplar for themselves would help increase their understanding of the process of writing and, in turn, deepen their knowledge of the constituent parts of a letter and their significance in palaeographic endeavour. This last is something I plan to include in future palaeography teaching.

Dr. Green’s concern over palaeographic detail illustrates two important points about topic maps:

  1. Potential subjects for a topic map are always unbounded.
  2. Different people “see” different subjects.

Which also account for my yawn when Microsoft drops the Microsoft Concept Graph of more than 5.4 million concepts.

…[M]ore than 5.4 million concepts[?]

Hell, Copleston’s History of Western Philosophy easily has more concepts.

But the Microsoft Concept Graph is more useful than a topic map of Copleston in your daily, shallow, social sea.

What subjects do you see and how would capturing them and their identities make a difference in your life (professional or otherwise)?

S20-211a Hebrew Bible Technology Buffet – November 20, 2016 (save that date!)

Tuesday, October 18th, 2016

S20-211a Hebrew Bible Technology Buffet

From the webpage:

On Sunday, November 20th 2016, from 1:00 PM to 3:30 PM, GERT will host a session with the theme “Hebrew Bible Technology Buffet” at the SBL Annual Meeting in room 305 of the Convention Center. Barry Bandstra of Hope College will preside.

The session has four presentations:

Presentations will be followed by a discussion session.

You will need to register for the Annual Meeting to attend the session.

Assuming they are checking “badges” to make sure attendees have registered. Registration is very important to those who “foster” biblical scholarship by comping travel and rooms for their close friends.

PS: The website reports non-member registration is $490.00. I would like to think that is a misprint but I suspect it’s not.

That’s one way to isolate yourself from an interested public. By way of contrast, snail-mail Biblical Greek courses in the 1890s had tens of thousands of subscribers. When academics complain of being marginalized, use this as an example of self-marginalization.

DATNAV: …Navigate and Integrate Digital Data in Human Rights Research [Ethics]

Wednesday, August 24th, 2016

DATNAV: New Guide to Navigate and Integrate Digital Data in Human Rights Research by Zara Rahman.

From the introduction in the Guide:

From online videos of rights violations, to satellite images of environmental degradation, to eyewitness accounts disseminated on social media, we have access to more relevant data today than ever before. When used responsibly, this data can help human rights professionals in the courtroom, when working with governments and journalists, and in documenting historical record.

Acquiring, disseminating and storing digital data is also becoming increasingly affordable. As costs continue to decrease and new platforms are developed, opportunities for harnessing these data sources for human rights work increase.

But integrating data collection and management into the day to day work of human rights research and documentation can be challenging, even overwhelming, for individuals and organisations. This guide is designed to help you navigate and integrate new data forms into your human rights work.

It is the result of a collaboration between Amnesty International, Benetech, and The Engine Room that began in late 2015. We conducted a series of interviews, community consultations, and surveys to understand whether digital data was being integrated into human rights work. In the vast majority of cases, we found that it wasn’t. Why?

Mainly, human rights researchers appeared to be overwhelmed by the possibilities. In the face of limited resources, not knowing how to get started or whether it would be worthwhile, most people we spoke to refrained from even attempting to strengthen their work with digital data.

To support everyone in the human rights field in navigating this complex environment, we convened a group of 16 researchers and technical experts in a castle outside Berlin, Germany in May 2016 to draft this guide over four days of intense reflection and writing.

There are additional reading resources at: https://engn.it/datnav.

The issue of ethics comes up quickly in human rights research and here the authors write:

Seven things to consider before using digital data for human rights

  1. Would digital data genuinely help answer your research questions? What are the pros and cons of the particular source or medium? What might you learn from past uses of similar technology?
  2. What sources are likely to be collecting or capturing the kinds of information you need? What is the context in which it is being produced and used? Will the people or organisations on which your work is focused be receptive to these types of data?
  3. How easily will new forms of data integrate into your existing workflow? Do you realistically have the time and money to collect, store, analyze and especially to verify this data? Can anyone on your team comfortably support the technology?
  4. Who owns or controls the data you will be using? Companies, government, or adversaries? How difficult is it to get? Is it a fair or legal collection method? What is the internal stance on this? Do you have true informed consent from individuals?
  5. How will digital divides and differences in local access to online platforms, computers or phones, affect representation of different populations? Would conclusions based on the data reinforce inequalities, stereotypes or blind spots?
  6. Are organisational protocols for confidentiality and security in digital communication and data handling sufficiently robust to deal with risks to you, your partners and sources? Are security tools and processes updated frequently enough?
  7. Do you have safeguards in place to prevent and deal with any secondary trauma from viewing digital content that you or your partners may experience at personal and organisational levels?

(Page 15)

Before I reveal my #0 consideration, consider the following story as setting the background.

At a death penalty seminar (the death penalty certainly being a violation of human rights), a practitioner reported a case where the prosecuting attorney said a particular murder case was a question of “good versus evil.” In the course of preparing for that case, it was discovered that while teaching a course for paralegals, the prosecuting attorney had a sexual affair with one of his students. Affidavits were obtained, etc., and a motion was filed in the pending criminal case entitled: Motion To Define Good and Evil.

There was a mix of opinions on whether blind-siding the prosecuting attorney with his personal failings, with the attendant fallout for his family, was a legitimate approach.

My question was: Did they consider asking the prosecuting attorney to take the death penalty off the table, in exchange for not filing the Motion To Define Good and Evil? That is a question about effective use of the information, not about the legitimacy of using it.

For human rights violations, my #0 Question would be:

0. Can the information be used to stop and/or redress human rights violations without harming known human rights victims?

The other seven questions, like “…all deliberate speed…,” are a game played by non-victims.

Digital Humanities In the Library

Sunday, July 31st, 2016

Digital Humanities In the Library / Of the Library: A dh+lib Special Issue

A special issue of dh + lib introduced by Sarah Potvin, Thomas Padilla and Caitlin Christian-Lamb in their essay: Digital Humanities In the Library / Of the Library, saying:

What are the points of contact between digital humanities and libraries? What is at stake, and what issues arise when the two meet? Where are we, and where might we be going? Who are “we”? By posing these questions in the CFP for a new dh+lib special issue, the editors hoped for sharp, provocative meditations on the state of the field. We are proud to present the result, ten wide-ranging contributions by twenty-two authors, collectively titled “Digital Humanities In the Library / Of the Library.”

We make the in/of distinction pointedly. Like the Digital Humanities (DH), definitions of library community are typically prefigured by “inter-” and “multi-” frames, rendered as work and values that are interprofessional, interdisciplinary, and multidisciplinary. Ideally, these characterizations attest to diversified yet unified purpose, predicated on the application of disciplinary expertise and metaknowledge to address questions that resist resolution from a single perspective. Yet we might question how a combinatorial impulse obscures the distinct nature of our contributions and, consequently, our ability to understand and respect individual agency. Working across the similarly encompassing and amorphous contours of the Digital Humanities compels the library community to reckon with its composite nature.

All of the contributions merit your attention, but I was especially taken by When Metadata Becomes Outreach: Indexing, Describing, and Encoding For DH by Emma Annette Wilson and Mary Alexander, which has this gem that will resonate with topic map fans:


DH projects require high-quality metadata in order to thrive, and the bigger the project, the more important that metadata becomes to make data discoverable, navigable, and open to computational analysis. The functions of all metadata are to allow our users to identify and discover resources through records acting as surrogates of resources, and to discover similarities, distinctions, and other nuances within single texts or across a corpus. High quality metadata brings standardization to the project by recording elements’ definitions, obligations, repeatability, rules for hierarchical structure, and attributes. Input guidelines and the use of controlled vocabularies bring consistencies that promote findability for researchers and users alike.

Modulo my reservations about the data/metadata distinction depending upon a point of view, and all of them being subjects in any event, it’s hard to think of a clearer statement of the value that a topic map could bring to a DH project.

Consistencies can peacefully co-exist with historical or present-day inconsistencies, at least so long as you are using a topic map.
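
As a toy illustration of that co-existence (my own sketch, with invented names and values, not anything from the dh+lib issue), variant metadata values can be merged on a shared subject for retrieval while every variant stays on the record:

```python
# Toy sketch: variant metadata values merged on one subject while the
# variants themselves are preserved. All names and values are invented.
from collections import defaultdict

# Catalogue records as they arrive, spellings and all.
records = [
    {"id": "ms-01", "author": "W. Shakespere"},
    {"id": "ms-02", "author": "William Shakespeare"},
    {"id": "ms-03", "author": "Shakspeare, Wm."},
]

# A small controlled vocabulary: variant form -> canonical subject identifier.
authority = {
    "W. Shakespere": "shakespeare-william",
    "William Shakespeare": "shakespeare-william",
    "Shakspeare, Wm.": "shakespeare-william",
}

subjects = defaultdict(lambda: {"records": [], "variants": set()})
for record in records:
    subject = authority.get(record["author"], record["author"])
    subjects[subject]["records"].append(record["id"])
    subjects[subject]["variants"].add(record["author"])

for subject, info in subjects.items():
    print(subject, sorted(info["records"]), "seen as:", sorted(info["variants"]))
```

A topic map does this with explicit subject identities and merging rules rather than an ad hoc dictionary, but the payoff is the same: consistent findability without erasing the historical record.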

I commend the entire issue to you for reading!

Electronic Literature Organization

Sunday, June 19th, 2016

Electronic Literature Organization

From the “What is E-Lit” page:

Electronic literature, or e-lit, refers to works with important literary aspects that take advantage of the capabilities and contexts provided by the stand-alone or networked computer. Within the broad category of electronic literature are several forms and threads of practice, some of which are:

  • Hypertext fiction and poetry, on and off the Web
  • Kinetic poetry presented in Flash and using other platforms
  • Computer art installations which ask viewers to read them or otherwise have literary aspects
  • Conversational characters, also known as chatterbots
  • Interactive fiction
  • Literary apps
  • Novels that take the form of emails, SMS messages, or blogs
  • Poems and stories that are generated by computers, either interactively or based on parameters given at the beginning
  • Collaborative writing projects that allow readers to contribute to the text of a work
  • Literary performances online that develop new ways of writing

The ELO showcase, created in 2006 and with some entries from 2010, provides a selection of outstanding examples of electronic literature, as do the two volumes of our Electronic Literature Collection.

The field of electronic literature is an evolving one. Literature today not only migrates from print to electronic media; increasingly, “born digital” works are created explicitly for the networked computer. The ELO seeks to bring the literary workings of this network and the process-intensive aspects of literature into visibility.

The confrontation with technology at the level of creation is what distinguishes electronic literature from, for example, e-books, digitized versions of print works, and other products of print authors “going digital.”

Electronic literature often intersects with conceptual and sound arts, but reading and writing remain central to the literary arts. These activities, unbound by pages and the printed book, now move freely through galleries, performance spaces, and museums. Electronic literature does not reside in any single medium or institution.

I was looking for a recent presentation by Allison Parrish on bots when I encountered the Electronic Literature Organization (ELO).

I was attracted by the bot discussion at a recent conference, but as you can see, the range of activities of the ELO is much broader.

Enjoy!

Exploratory Programming for the Arts and Humanities

Wednesday, April 6th, 2016

Exploratory Programming for the Arts and Humanities by Nick Montfort.

From the webpage:

This book introduces programming to readers with a background in the arts and humanities; there are no prerequisites, and no knowledge of computation is assumed. In it, Nick Montfort reveals programming to be not merely a technical exercise within given constraints but a tool for sketching, brainstorming, and inquiring about important topics. He emphasizes programming’s exploratory potential—its facility to create new kinds of artworks and to probe data for new ideas.

The book is designed to be read alongside the computer, allowing readers to program while making their way through the chapters. It offers practical exercises in writing and modifying code, beginning on a small scale and increasing in substance. In some cases, a specification is given for a program, but the core activities are a series of “free projects,” intentionally underspecified exercises that leave room for readers to determine their own direction and write different sorts of programs. Throughout the book, Montfort also considers how computation and programming are culturally situated—how programming relates to the methods and questions of the arts and humanities. The book uses Python and Processing, both of which are free software, as the primary programming languages.

Full Disclosure: I haven’t seen a copy of Exploratory Programming.

I am reluctant to part with $40.00 US for either print or an electronic version where the major heads in the table of contents read as follows:

1 Modifying a Program

2 Calculating

3 Double, Double

4 Programming Fundamentals

5 Standard Starting Points

6 Text I

7 Text II

8 Image I

9 Image II

10 Text III

11 Statistics and Visualization

12 Animation

13 Sound

14 Interaction

15 Onward

The table of contents shows more than one hundred pages out of two hundred and sixty-three are spent on introductory computer programming topics.

Text, which has a healthy section on string operations, merits a mere seventy pages. The other one hundred pages are split between visualization, sound, animation, etc.

Compare that table of contents with this one*:

Chapter One – Modular Programming: An Approach

Chapter Two – Data Entry and Text Verification

Chapter Three – Index and Concordance

Chapter Four – Text Criticism

Chapter Five – Improved Searching Techniques

Chapter Six – Morphological Analysis

Which table of contents promises to be more useful for exploration?

Personal computers are vastly more powerful today than when the second table of contents was penned.

Yet, students start off as though they are going to write their own tools from scratch. Unlikely and certainly not the best use of their time.

In-depth coverage of NLTK (the Natural Language Toolkit), applied to historical or contemporary texts, would teach them a useful tool. A tool they could apply to other material.
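
For instance, a concordance, a staple of the 1984 table of contents, is only a few lines of NLTK away. A minimal sketch of my own, not drawn from either book:

```python
# Minimal sketch: an index-and-concordance exercise with NLTK over a
# public-domain text that ships with the NLTK data package.
import nltk
from nltk.text import Text
from nltk.corpus import gutenberg

nltk.download("gutenberg", quiet=True)

words = gutenberg.words("austen-emma.txt")  # Jane Austen's Emma
emma = Text(words)

# Keyword-in-context display: five occurrences of "surprise".
emma.concordance("surprise", width=79, lines=5)

# A crude frequency index of the most common longer words.
freq = nltk.FreqDist(w.lower() for w in words if w.isalpha() and len(w) > 5)
print(freq.most_common(10))
```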

To cover machine learning, consider Weka. A tool students can learn in class and then apply in new and different situations.

There are tools for image and sound analysis but the important term is tool.

Just as we don’t teach students to make their own paper, we should focus on enabling them to reap the riches that modern software tools offer.

Or to put it another way, let’s stop repeating the past and move forward.

* Oh, the second table of contents? Computer Programs for Literary Analysis, John R. Abercrombie, Philadelphia: University of Pennsylvania Press, ©1984. Yes, 1984.

Laypersons vs. Scientists – “…laypersons may be prone to biases…”

Saturday, March 12th, 2016

The “distinction” between laypersons and scientists is more a difference in world view about some things than a matter of “all scientists are rational” or “all laypersons are irrational.” Scientists and laypersons can be just as rational and/or irrational, depending upon the topic at hand.

Having said that, The effects of social identity threat and social identity affirmation on laypersons’ perception of scientists by Peter Nauroth, et al., finds, unsurprisingly, that if a layperson’s social identity is threatened by research, they have a less favorable view of the scientists involved.

Abstract:

Public debates about socio-scientific issues (e.g. climate change or violent video games) are often accompanied by attacks on the reputation of the involved scientists. Drawing on the social identity approach, we report a minimal group experiment investigating the conditions under which scientists are perceived as non-prototypical, non-reputable, and incompetent. Results show that in-group affirming and threatening scientific findings (compared to a control condition) both alter laypersons’ evaluations of the study: in-group affirming findings lead to more positive and in-group threatening findings to more negative evaluations. However, only in-group threatening findings alter laypersons’ perceptions of the scientists who published the study: scientists were perceived as less prototypical, less reputable, and less competent when their research results imply a threat to participants’ social identity compared to a non-threat condition. Our findings add to the literature on science reception research and have implications for understanding the public engagement with science.

Perceived attacks on personal identity have negative consequences for the “reception” of science.

Implications for public engagement with science

Our findings have immediate implications for public engagement with science activities. When laypersons perceive scientists as less competent, less reputable, and not representative of the scientific community and the scientist’s opinion as deviating from the current scientific state-of-the-art, laypersons might be less willing to participate in constructive discussions (Schrodt et al., 2009). Furthermore, our mediation analysis suggests that these negative perceptions deepen the trench between scientists and laypersons concerning the current scientific state-of-the-art. We speculate that these biases might actually even lead to engagement activities to backfire: instead of developing a mutual understanding they might intensify laypersons’ misconceptions about the scientific state-of-the-art. Corroborating this hypothesis, Binder et al. (2011) demonstrated that discussions about controversial science topics may in fact polarize different groups around a priori positions. Additional preliminary support for this hypothesis can also be found in case studies about public engagement activities in controversial socio-scientific issues. Some of these reports (for two examples, see Lezaun and Soneryd, 2007) indicate problems to maintain a productive atmosphere between laypersons and experts in the discussion sessions.

Besides these practical implications, our results also add further evidence to the growing body of literature questioning the validity of the deficit model in science communication according to which people’s attitudes toward science are mainly determined by their knowledge about science (Sturgis and Allum, 2004). We demonstrated that social identity concerns profoundly influence laypersons’ perceptions and evaluations of scientific results regardless of laypersons’ knowledge. However, our results also question whether involving laypersons in policy decision processes based upon scientific evidence is reasonable in all socio-scientific issues. Particularly when the scientific evidence has potential negative consequences for social groups, our research suggests that laypersons may be prone to biases based upon their social affiliations. For example, if regular video game players were involved in decision-making processes concerning potential sales restrictions of violent video games, they would be likely to perceive scientific evidence demonstrating detrimental effects of violent video games as shoddy and the respective researchers as disreputable (Greitemeyer, 2014; Nauroth et al., 2014, 2015).(emphasis added)

The principal failure of this paper is that it does not study the scientific community and its own reaction, within science, to research that attacks the personal identity of its participants.

I don’t think it is reading too much into the post: Academic, Not Industrial Secrecy, where one group said:

We want restrictions on who could do the analyses.

to say that attacks on personal identity lead to boorish behavior on the part of scientists.

Laypersons and scientists emit a never-ending stream of examples of prejudice, favoritism, sycophancy, and sloppy reasoning, to say nothing of careless and/or low-quality work.

Reception of science among laypersons might improve if the scientific community abandoned its facade of “it’s objective, it’s science.”

That facade was tiresome by WWII and to keep repeating it now is a disservice to the scientific community.

All of our efforts, in any field, are human endeavors and thus subject to the vagaries and uncertainties of human interaction.

Live with it.

For Linguists on Your Holiday List

Saturday, December 12th, 2015

Hey Linguists!—Get Them to Get You a Copy of The Speculative Grammarian Essential Guide to Linguistics.

From the website:

Hey Linguists! Do you know why it is better to give than to receive? Because giving requires a lot more work! You have to know what someone likes, what someone wants, who someone is, to get them a proper, thoughtful gift. That sounds like a lot of work.

No, wait. That’s not right. It’s actually more work to be the recipient—if you are going to do it right. You can’t just trust people to know what you like, what you want, who you are.

You could try to help your loved ones understand a linguist’s needs and wants and desires—but you’d have to give them a mini course on historical, computational, and forensic linguistics first. Instead, you can assure them that SpecGram has the right gift for you—a gift you, their favorite linguist, will treasure for years to come: The Speculative Grammarian Essential Guide to Linguistics.

So drop some subtle or not-so-subtle hints and help your loved ones do the right thing this holiday season: gift you with this hilarious compendium of linguistic sense and nonsense.

If you need to convince your friends and family that they can’t find you a proper gift on their own, send them one of the images below, and try to explain to them why it amuses you. That’ll show ’em! (More will be added through the rest of 2015, just in case your friends and family are a little thick.)

• If guilt is more your style, check out 2013’s Sad Holiday Linguists.

• If semi-positive reinforcement is your thing, check out 2014’s Because You Can’t Do Everything You Want for Your Favorite Linguist.

Disclaimer: I haven’t proofed the diagrams against the sources cited. Rely on them at your own risk. 😉

There are others, but the Hey Semioticians! entry reminded me of John Sowa (sorry John):

semiotics

The greatest mistake across all disciplines is taking ourselves (and our positions) far too seriously.

Enjoy!

On Teaching XQuery to Digital Humanists [Lesson in Immediate ROI]

Wednesday, November 18th, 2015

On Teaching XQuery to Digital Humanists by Clifford B. Anderson.

A paper presented at Balisage 2014 but still a great read today. In particular, note where Clifford makes the case for teaching XQuery to humanists:

Making the Case for XQuery

I may as well state upfront that I regard XQuery as a fantastic language for digital humanists. If you are involved in marking up documents in XML, then learning XQuery will pay long-term dividends. I do have arguments for this bit of bravado. My reasons for lifting up XQuery as a programing language of particular interest to digital humanists are essentially three:

  • XQuery is domain-appropriate for digital humanists.

Let’s take each of these points in turn.

First, XQuery fits the domain of digital humanists. Admittedly, I am focusing here on a particular area of the digital humanities, namely the domain of digital text editing and analysis. In that domain, however, XQuery proves a nearly perfect match to the needs of digital humanists.

If you scour the online communities related to digital humanities, you will repeatedly find conversations about which programming languages to learn. Predictably, the advice is all over the map. PHP is easy to learn, readily accessible, and the language of many popular projects in the digital humanities such as Omeka and Neatline. Javascript is another obvious choice given its ubiquity. Others recommend Python or Ruby. At the margins, you’ll find the statistically-inclined recommending R. There are pluses and minuses to learning any of these languages. When you are working with XML, however, they all fall short. Inevitably, working with XML in these languages will require learning how to use packages to read XML and convert it to other formats.

Learning XQuery eliminates any impedance between data and code. There is no need to import any special packages to work with XML. Rather, you can proceed smoothly from teaching XML basics to showing how to navigate XML documents with XPath to querying XML with XQuery. You do not need to jump out of context to teach students about classes, objects, tables, or anything as awful-sounding as “shredding” XML documents or storing them as “blobs.” XQuery makes it possible for students to become productive without having to learn as many computer science or software engineering concepts. A simple four or five line FLWOR expression can easily demonstrate the power of XQuery and provide a basis for students’ tinkering and exploration. (emphasis added)

I commend the rest of the paper to you for reading, but Clifford’s first point nails why humanists and others should learn XQuery.

The part I highlighted above sums it up:

XQuery makes it possible for students to become productive without having to learn as many computer science or software engineering concepts. A simple four or five line FLWOR expression can easily demonstrate the power of XQuery and provide a basis for students’ tinkering and exploration. (emphasis added)

Whether you are a student, scholar or even a type-A business type, what do you want?

To get sh*t done!

A few of us like tinkering with edge cases, proofs, theorems and automata, but having the needed output on time or sooner really makes the day for most folks.

A handful of XQuery expressions will open up XML-encoded data for your custom exploration. You can experience an immediate ROI from the time you spend learning XQuery. Which will prompt you to learn more XQuery.

Think of learning XQuery as a step towards user independence. Independence from the choices made by unseen and unknown programmers.

Are you ready to take that step?

You do not want to be an edge case [The True Skynet: Your Homogenized Future]

Friday, November 13th, 2015

You do not want to be an edge case.

John D. Cook writes:

Hilary Mason made an important observation on Twitter a few days ago:

You do not want to be an edge case in this future we are building.

Systems run by algorithms can be more efficient on average, but make life harder on the edge cases, people who are exceptions to the system developers’ expectations.

Algorithms, whether encoded in software or in rigid bureaucratic processes, can unwittingly discriminate against minorities. The problem isn’t recognized minorities, such as racial minorities or the disabled, but unrecognized minorities, people who were overlooked.

For example, two twins were recently prevented from getting their drivers licenses because DMV software couldn’t tell their photos apart. Surely the people who wrote the software harbored no malice toward twins. They just didn’t anticipate that two drivers licence applicants could have indistinguishable photos.

I imagine most people reading this have had difficulty with software (or bureaucratic procedures) that didn’t anticipate something about them; everyone is an edge case in some context. Maybe you don’t have a middle name, but a form insists you cannot leave the middle name field blank. Maybe there are more letters in your name or more children in your family than a programmer anticipated. Maybe you choose not to use some technology that “everybody” uses. Maybe you happen to have a social security number that hashes to a value that causes a program to crash.

When software routinely fails, there obviously has to be a human override. But as software improves for most people, there’s less apparent need to make provision for the exceptional cases. So things could get harder for edge cases as they get better for more people.

Recent advances in machine learning have led reputable thinkers (Stephen Hawking, for example) to envision a future where an artificial intelligence will arise to dispense with humanity.

If you think you have heard that theme before, you have, most recently as Skynet, an entirely fictional creation in the Terminator science fiction series.

No one knows how the human brain works, much less how intelligence arises. Such alarmist claims make good press, but the risk is less than that of a rogue black hole or a gamma-ray burst. I don’t lose sleep over either one of those, do you?

The greater “Skynet” threat to people and their cultures is the enforced homogenization of language and culture.

John mentions lacking a middle name but consider the complexities of Japanese names. Due to the creeping infection of Western culture and computer-based standardization, many Japanese list their names in Western order, given name, family name, instead of the Japanese order of family name, given name.

Even languages can start the slide to being “edge cases,” as you will see from the erosion of Hangul (Korean alphabet) from public signs in Seoul.

Computers could be preserving languages and cultural traditions; they have the capacity and infinite patience.

But they are not being used for that purpose.

Cellphones, for example, are linking humanity into a seething mass of impoverished social interaction. Impoverished social interaction that is creating more homogenized languages, not preserving diverse ones.

Not only should you be an edge case but you should push back against the homogenizing impact of computers. The diversity we lose could well be your own.

Programming for Humanists at TAMU [and Business Types]

Monday, August 17th, 2015

Programming for Humanists at TAMU

From the webpage:

[What is DH?] Digital Humanities studies the intersection and mutual influence of humanities ideas and digital methods, with the goal of understanding how the use of digital technologies and approaches alters the practice and theory of humanities scholarship. In this sense it is concerned with studying the emergence of scholarly disciplines and communicative practices at a time when those are in flux, under the influence of rapid technological, institutional and cultural change. As a way of identifying digital interests and efforts within traditional humanities fields, the term “digital humanities” also identifies, in a general way, any kind of critical engagement with digital tools and methods in a humanities context. This includes the creation of digital editions and digital text or image collections, and the creation and use of digital tools for the investigation and analysis of humanities research materials. – Julia Flanders, Northeastern University (http://goo.gl/BJeXk2)

 

Programming4Humanists is a two-semester course sequence designed to introduce participants to methodologies, coding, and programming languages associated with the Digital Humanities. We focus on creation, editing, and searchability of digital archives, but also introduce students to data mining and statistical analysis. Our forte at Texas A&M University (TAMU) is Optical Character Recognition of early modern texts, a skill we learned in completing the Early Modern OCR Project, or eMOP. Another strength that the Initiative for Digital Humanities, Media, and Culture (http://idhmc.tamu.edu) at TAMU brings to this set of courses is the Texas A&M University Press book series called “Programming for Humanists.” We use draft and final versions of these books, as well as many additional resources available on companion web pages, for participants in the workshop. The books in this series are of course upon publication available to anyone, along with the companion sites, whether the person has participated in the workshop or not. However, joining the Programming4Humanists course enables participants to communicate with the authors of these books for the sake of asking questions and indeed, through their questioning, helping us to improve the books and web materials. Our goal is to help people learn Digital Humanities methods and techniques.

Participants

Those who should attend include faculty, staff, librarians, undergraduates, and graduate students, interested in making archival and cultural materials available to a wide audience while encoding them digitally according to best practices, standards that will allow them to submit their digital editions for peer review by organizations such as the MLA Committee for Scholarly Edition and NINES / 18thConnect. Librarians will be especially interested in learning our OCR procedures as a means for digitizing large archives. Additionally, scholars, students, and librarians will receive an introduction to text mining and XQuery, the latter used for analyzing semantically rich data sets. This course gives a good overview of what textual and archival scholars are accomplishing in the field of Digital Humanities, even though the course is primarily concerned with teaching skills to participants. TAMU graduate and undergraduate students may take this course for 2 credit hours, see Schedule of Classes for Fall 2015: LBAR 489 or 689 Digital Scholarship and Publication.

Prerequisites

No prior knowledge is required but some familiarity with TEI/XML, HTML, and CSS will be helpful (See previous Programming 4 Humanists course syllabus). Certificate registrants will receive certificates confirming that they have a working knowledge of Drupal, XSLT, XQuery, and iPython Notebooks. Registration for those getting a certificate includes continued access to all class videos during the course period and an oXygen license. Non-certificate registrants will have access to the class videos for one week.

Everything that Julia says is true and this course will be very valuable for traditional humanists.

It will also be useful for business types who aren’t quants or CS majors/minors. The same “friendly” learning curve is suitable to both audiences.

You won’t be a “power user” at the end of this course but you will sense when CS folks are blowing smoke. It happens.

New Testament Virtual Manuscript Room

Tuesday, June 2nd, 2015

New Testament Virtual Manuscript Room

From the webpage:

This site is devoted to the study of Greek New Testament manuscripts. The New Testament Virtual Manuscript Room is a place where scholars can come to find the most exhaustive list of New Testament manuscript resources, can contribute to marking attributes about these manuscripts, and can find state of the art tools for researching this rich dataset.

While our tools are reasonably functional for anonymous users, they provide additional features and save options once a user has created an account and is logged in on the site. For example, registered users can save transcribed pages to their personal account and create personalized annotations to images.

A close friend has been working on this project for several years. Quite remarkable, although I would prefer it to feature Hebrew (and older) texts. 😉

Spatial Humanities Workshop

Tuesday, June 2nd, 2015

Spatial Humanities Workshop by Lincoln Mullen.

From the webpage:

Scholars in the humanities have long paid attention to maps and space, but in recent years new technologies have created a resurgence of interest in the spatial humanities. This workshop will introduce participants to the following subjects:

  • how mapping and spatial analysis are being used in humanities disciplines
  • how to find, create, and manipulate spatial data
  • how to create historical layers on interactive maps
  • how to create data-driven maps
  • how to tell stories and craft arguments with maps
  • how to create deep maps of places
  • how to create web maps in a programming language
  • how to use a variety of mapping tools
  • how to create lightweight and semester-long mapping assignments

The seminar will emphasize the hands-on learning of these skills. Each day we will pay special attention to preparing lesson plans for teaching the spatial humanities to students. The aim is to prepare scholars to be able to teach the spatial humanities in their courses and to be able to use maps and spatial analysis in their own research.

Ahem, the one thing Lincoln forgets to mention is that he is a major player in spatial humanities. His homepage is an amazing place.

The seminar materials don’t disappoint. It would be better to be at the workshop but in lieu of attending, working through these materials will leave you well grounded in spatial humanities.
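
If you want a taste of the “web maps in a programming language” item before working through the materials, here is a minimal sketch in Python using the folium library (my tool choice, not necessarily the workshop’s):

```python
# Minimal sketch of a web map in Python; the folium library is my choice,
# not necessarily the workshop's. Coordinates are approximate.
import folium

sites = [
    ("Senate House, London", 51.5212, -0.1286),
    ("Monticello, Virginia", 38.0086, -78.4532),
]

# Start the view over the Atlantic so both markers are in frame.
m = folium.Map(location=[45.0, -40.0], zoom_start=3, tiles="OpenStreetMap")

for name, lat, lon in sites:
    folium.Marker(location=[lat, lon], popup=name).add_to(m)

m.save("historical_sites.html")  # open the file in a browser to pan and zoom
```

Swap in a file of georeferenced historical places and you have the beginnings of the “data-driven maps” item as well.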

Civil War Navies Bookworm

Tuesday, May 19th, 2015

Civil War Navies Bookworm by Abby Mullen.

From the post:

If you read my last post, you know that this semester I engaged in building a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections, the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. Using the Bookworm allows us to look at the words in these documents sequentially by date instead of having to go back and forth between different volumes to get a sense of what was going on in the whole navy at any given time.

Before you ask:

The earlier post: Text Analysis on the Documents of the Barbary Wars

More details on Bookworm.
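
Bookworm itself is a full platform, but the underlying idea, term frequencies tracked against document dates, is simple enough to rough out over any collection of dated plain-text files. A toy sketch in Python (the file layout and naming are invented):

```python
# Toy sketch of the Bookworm idea: relative frequency of one term by year
# across dated plain-text documents. The file layout and names are invented.
import glob
import re
from collections import Counter, defaultdict

term = "blockade"
counts = defaultdict(Counter)  # year -> Counter with term hits and total words

# Assume files named like "navy_records/1861_report_03.txt".
for path in glob.glob("navy_records/*.txt"):
    match = re.search(r"(\d{4})", path)  # year assumed to be in the file name
    if not match:
        continue
    year = match.group(1)
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())
    counts[year]["total"] += len(words)
    counts[year][term] += words.count(term)

for year in sorted(counts):
    total, hits = counts[year]["total"], counts[year][term]
    rate = hits / total if total else 0.0
    print(f"{year}: {hits:4d} hits in {total:7d} words ({rate:.5f} per word)")
```

Bookworm adds the interface, metadata facets, and scale; the caution below about shifting word meanings applies to this toy version just as much.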

As with all ngram viewers, exercise caution in assuming a text string has uniform semantics across historical, ethnic, or cultural fault lines.

Digital Approaches to Hebrew Manuscripts

Friday, May 8th, 2015

Digital Approaches to Hebrew Manuscripts

Monday 18th – Tuesday 19th of May 2015

From the webpage:

We are delighted to announce the programme for On the Same Page: Digital Approaches to Hebrew Manuscripts at King’s College London. This two-day conference will explore the potential for the computer-assisted study of Hebrew manuscripts; discuss the intersection of Jewish Studies and Digital Humanities; and share methodologies. Amongst the topics covered will be Hebrew palaeography and codicology, the encoding and transcription of Hebrew texts, the practical and theoretical consequences of the use of digital surrogates and the visualisation of manuscript evidence and data. For the full programme and our Call for Posters, please see below.

Organised by the Departments of Digital Humanities and Theology & Religious Studies (Jewish Studies)
Co-sponsor: Centre for Late Antique & Medieval Studies (CLAMS), King’s College London

I saw this at the blog for DigiPal: Digital Resource and Database of Palaeography, Manuscript Studies and Diplomatic. Confession: I have never understood how the English derive acronyms and this one confounds me as much as it does you. 😉

Be sure to look around at the DigiPal site. There are numerous manuscript images, annotation techniques, and other resources for those who foster scholarship by contributing to it.

Web Gallery of Art

Thursday, April 9th, 2015

Web Gallery of Art

From the homepage:

The Web Gallery of Art is a virtual museum and searchable database of European fine arts from 11th to 19th centuries. It was started in 1996 as a topical site of the Renaissance art, originated in the Italian city-states of the 14th century and spread to other countries in the 15th and 16th centuries. Intending to present Renaissance art as comprehensively as possible, the scope of the collection was later extended to show its Medieval roots as well as its evolution to Baroque and Rococo via Mannerism. Encouraged by the feedback from the visitors, recently 19th-century art was also included. However, we do not intend to present 20th-century and contemporary art.

The collection has some of the characteristics of a virtual museum. The experience of the visitors is enhanced by guided tours helping to understand the artistic and historical relationship between different works and artists, by period music of choice in the background and a free postcard service. At the same time the collection serves the visitors’ need for a site where various information on art, artists and history can be found together with corresponding pictorial illustrations. Although not a conventional one, the collection is a searchable database supplemented by a glossary containing articles on art terms, relevant historical events, personages, cities, museums and churches.

The Web Gallery of Art is intended to be a free resource of art history primarily for students and teachers. It is a private initiative not related to any museums or art institutions, and not supported financially by any state or corporate sponsors. However, we do our utmost, using authentic literature and advice from professionals, to ensure the quality and authenticity of the content.

We are convinced that such a collection of digital reproductions, containing a balanced mixture of interlinked visual and textual information, can serve multiple purposes. On one hand it can simply be a source of artistic enjoyment; a convenient alternative to visiting a distant museum, or an incentive to do just that. On the other hand, it can serve as a tool for public education both in schools and at home.

The Gallery doesn’t own the works in question and so resolves the copyright issue thus:

The Web Gallery of Art is copyrighted as a database. Images and documents downloaded from this database can only be used for educational and personal purposes. Distribution of the images in any form is prohibited without the authorization of their legal owner.

The Gallery suggests contacting the Scala Group (or Art Resource, Scala’s U.S. representative) if you need rights beyond educational and personal purposes.

To see how images are presented, view 10 random images from the database. (Warning: The 10 random images link will work only once. If you try it again, images briefly display and then an invalid CGI environment message pops up. I suspect that if you clear the browser cache it will work a second time.)

BTW, you can listen to classical music in the background while you browse/search. That is a very nice touch.

The site offers other features and options so take time to explore.

Having seen some of Michelangelo’s works in person, I can attest that no computer screen can duplicate that experience. However, given the choice between viewing a pale imitation on a computer screen and not seeing his work at all, the computer version is a no-brainer.

BHO – British History Online

Thursday, February 12th, 2015

BHO – British History Online

The “news” from 8 December 2014 (that I missed) reports:

British History Online (BHO) is pleased to launch version 5.0 of its website. Work on the website redevelopment began in January 2014 and involved a total rebuild of the BHO database and a complete redesign of the site. We hope our readers will find the new site easier to use than ever before. New features include:

  • A new search interface that allows you to narrow your search results by place, period, source type or subject.
  • A new catalogue interface that allows you to see our entire catalogue at a glance, or to browse by place, period, source type or subject.
  • Three subject guides on local history, parliamentary history and urban history. We are hoping to add more subject guides throughout the year. If you would like to contribute, contact us.
  • Guidelines on using BHO, which include searching and browsing help, copyright and citation information, and a list of external resources that we hope will be useful to readers.
  • A new about page that includes information about our team, past and present, as well as a history of where we have come from and where we want to go next.
  • A new subscription interface (at last!), which includes three new levels of subscription in addition to the usual premium content subscription: a gold subscription, which includes access to page scans, plus five- and ten-year long-term BHO subscriptions.
  • Increased functionality in the maps interface: maps are now fully zoomable and can even go full screen. We have also replaced the old map scans with high-quality versions.
  • We also updated the site with a fresh, new look! We aimed for easy-to-read text, clear navigation, clean design and bright new images.

​Version 5.0 has been a labour of love for the entire BHO team, but we have to give special thanks to Martin Steer, our tireless website manager who rebuilt the site from the ground up.

For over a decade, you have turned to BHO for reliable and accessible sources for the history of Britain and Ireland. We started off with 29 publications in 2003 and here is where we are now:

  • 1.2 million page views per month
  • 365,000 sessions per month
  • 1,241 publications
  • 108,227 text files
  • 184,355 images
  • 10,380 maps

​We are very grateful to our users who make this kind of development possible. Your support allows BHO to always be growing and improving. 2014 has been a busy year for BHO and 2015 promises to be just as busy. Version 5.0 was a complete rebuild of BHO. We stripped the site down and began rebuilding from scratch. The goal of the new site is to make it as easy as possible for you to find materials relevant to your research. The new site was designed to be able to grow and expand easily, while always preserving the most important features of BHO. Read about our plans for 2015 and beyond.

We’d love to hear your feedback on our new site! If you want to stay up-to-date on what we are doing at BHO, follow us on Twitter.

Subscriptions are required for approximately 20% of the content, which enables the BHO to offer the other 80% for free.

A resource such as the BHO is a joyful reminder that not all projects sanctioned by government and its co-conspirators are venal and ill-intended.

For example, can you imagine a secondary school research paper on the Great Fire of 1666 that includes observations based on Leake’s Survey of the City After the Great Fire of 1666 Engraved By W. Hollar, 1667? With additional references from BHO materials?

I would have struck a Faustian bargain in high school had such materials been available!

That is just one treasure among many.

Teachers of English, history, humanities, etc., take note!

I first saw this in a tweet by Institute of Historical Research, U. of London.

Comparative Oriental Manuscript Studies: An Introduction

Sunday, January 25th, 2015

Comparative Oriental Manuscript Studies: An Introduction edited by: Alessandro Bausi (General editor), et al.

The “homepage” of this work enables you to download the entire volume or individual chapters, depending upon your interests. It provides a lengthy introduction to codicology, palaeography, textual criticism and text editing, and, of special interest to library students, cataloguing as well as conservation and preservation.

Alessandro Bausi writes in the preface:

Thinking more broadly, our project was also a serious attempt to defend and preserve the COMSt-related fields within the academic world. We know that disciplines and fields are often determined and justified by the mere existence of an easily accessible handbook or, in the better cases, sets of handbooks, textbooks, series and journals. The lack of comprehensive introductory works which are reliable, up-to-date, of broad interest and accessible to a wide audience and might be used in teaching, has a direct impact on the survival of the ‘small subjects’ most of the COMSt-related disciplines pertain to. The decision to make the COMSt handbook freely accessible online and printable on demand in a paper version at an affordable price was strategic in this respect, and not just meant to meet the prescriptions of the European Science Foundation. We deliberately declined to produce an extremely expensive work that might be bought only by a few libraries and research institutions; on the other hand, a plain electronic edition only to be accessed and downloaded as a PDF file was not regarded as a desirable solution either. Dealing with two millennia of manuscripts and codices, we did not want to dismiss the possibility of circulating a real book in our turn.

It remains, hopefully, only to say,

Lector intende: laetaberis

John Svarlien says: A rough translation is: “Reader, pay attention. You will be happy you did.”

We are all people of books. It isn’t possible to separate present-day culture, and what came before it, from books. Even people who shun reading books are shaped by forces that can be traced back to books.

But books did not suddenly appear as mass-printed paperbacks in airport lobbies and grocery store checkout lines. There is a long history of books prior to printing, stretching back to the formation of the earliest codices.

This work is an introduction to the fascinating world of studying manuscripts and codices prior to the invention of printing. When nearly every copy of a work is different from every other copy, you can imagine the debates over which copy is the “best” copy.

Imagine some versions of “Gone with the Wind” ending with:

  • Frankly, my dear, I don’t give a damn. (traditional)
  • Ashley and I don’t give a damn. (variant)
  • Cheat Ashley out of his business I suppose. (variant)
  • (Lacks a last line due to mss. damage.) (variant)

The “text” of yesteryear lacked the uniform sameness of the printed “text” of today.

When you think about your “favorite” verse in the Bible, it is likely a “majority” reading but hardly the only one.

With the advent of the printing press, texts could be produced uniformly and in mass quantities.

With the advent of electronic texts, whether through editing or digital corruption, we are moving back towards non-uniform texts.

Will we see the birth of digital codicology and its allied fields for digital texts?

PS: Please forward the notice of this book to your local librarian.

I first saw this in a tweet by Kirk Lowery.

DiRT Digital Research Tools

Friday, January 23rd, 2015

DiRT Digital Research Tools

From the post:

The DiRT Directory is a registry of digital research tools for scholarly use. DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software.

Interesting concept, but the annotations are too brief to convey much information. Not to mention that within a category, say Conduct linguistic research or Transcribe handwritten or spoken texts, the entries have no apparent order; at least they are not arranged alphabetically by name. There may be some other order that is escaping me.

Some entries appear in the wrong categories, such as Xalan being found under Transcribe handwritten or spoken texts:

Xalan
Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0.

Not what I think of when I think about transcribing handwritten or spoken texts. You?
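For anyone who has not met it, Xalan is an XSLT engine, and a minimal sketch of the kind of job it does looks like the following. The source element names (document, title, para) are invented for the example; any XSLT 1.0 processor can run it, and with Xalan-Java the command line is along the lines of java org.apache.xalan.xslt.Process -IN doc.xml -XSL sketch.xsl -OUT doc.html.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal XSLT 1.0 sketch: turn an XML document into HTML.
     The source element names (document, title, para) are invented. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/document">
    <html>
      <body>
        <!-- The document title becomes an HTML heading. -->
        <h1><xsl:value-of select="title"/></h1>
        <!-- Each para element in the source becomes an HTML paragraph. -->
        <xsl:for-each select="para">
          <p><xsl:apply-templates/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Useful, but it has nothing to do with reading handwriting or audio.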

I didn’t see a process for submitting corrections/comments on resources. I will check and post on this again. It could be a useful tool.

I first saw this in a tweet by Christophe Lalanne.

Humanities Open Book: Unlocking Great Books

Friday, January 16th, 2015

Humanities Open Book: Unlocking Great Books

Deadline: June 10, 2015

A new joint grant program by the National Endowment for the Humanities (NEH) and the Andrew W. Mellon Foundation seeks to give a second life to outstanding out-of-print books in the humanities by turning them into freely accessible e-books.

Over the past 100 years, tens of thousands of academic books have been published in the humanities, including many remarkable works on history, literature, philosophy, art, music, law, and the history and philosophy of science. But the majority of these books are currently out of print and largely out of reach for teachers, students, and the public. The Humanities Open Book pilot grant program aims to “unlock” these books by republishing them as high-quality electronic books that anyone in the world can download and read on computers, tablets, or mobile phones at no charge.

The National Endowment for the Humanities (NEH) and the Andrew W. Mellon Foundation are the two largest funders of humanities research in the United States. Working together, NEH and Mellon will give grants to publishers to identify great humanities books, secure all appropriate rights, and make them available for free, forever, under a Creative Commons license.

The new Humanities Open Book grant program is part of the National Endowment for the Humanities’ agency-wide initiative The Common Good: The Humanities in the Public Square, which seeks to demonstrate and enhance the role and significance of the humanities and humanities scholarship in public life.

“The large number of valuable scholarly books in the humanities that have fallen out of print in recent decades represents a huge untapped resource,” said NEH Chairman William Adams. “By placing these works into the hands of the public we hope that the Humanities Open Book program will widen access to the important ideas and information they contain and inspire readers, teachers and students to use these books in exciting new ways.”

“Scholars in the humanities are making increasing use of digital media to access evidence, produce new scholarship, and reach audiences that increasingly rely on such media for information to understand and interpret the world in which they live,” said Earl Lewis, President of the Andrew W. Mellon Foundation. “The Andrew W. Mellon Foundation is delighted to join NEH in helping university presses give new digital life to enduring works of scholarship that are presently unavailable to new generations of students, scholars, and general readers.”

The National Endowment for the Humanities and the Andrew W. Mellon Foundation will jointly provide $1 million to convert out-of-print books into EPUB e-books with a Creative Commons (CC) license, ensuring that the books are freely downloadable with searchable texts and in formats that are compatible with any e-reading device. Books proposed under the Humanities Open Book program must be of demonstrable intellectual significance and broad interest to current readers.

Application guidelines and a list of F.A.Q’s for the Humanities Open Book program are available online at www.NEH.gov. The application deadline for the first cycle of Humanities Open Book grants is June 10, 2015.

What great news to start a weekend!

If you decide to apply, remember that topic maps can support indexes for a single book, across books, or across books together with other material. You could make a classic work in the humanities into a portal that opens onto work prior to its publication, at the time of its publication, or since. That is something to set you apart from simply making the text available.

The Past, Present and Future of Scholarly Publishing

Saturday, January 3rd, 2015

The Past, Present and Future of Scholarly Publishing By Michael Eisen.

Michael made this presentation to the Commonwealth Club of California on March 12, 2013. This post is the written text of the presentation, and you can catch the audio here.

Michael does a great job tracing the history of academic publishing, the rise of open access and what is holding us back from a more productive publishing environment for everyone.

I disagree with his assessment of classification:

And as for classification, does anyone really think that assigning every paper to one of 10,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is really the best way to help people browse the literature? This is a pure relic of a bygone era – an artifact of the historical accident that Gutenberg invented the printing press before Al Gore invented the Internet.

but will pass over that to address the more serious issue of open access publishing in the humanities.

Michael notes:

But the battle is by no means won. Open access collectively represents only around 10% of biomedical publishing, has less penetration in other sciences, and is almost non-existent in the humanities. And most scientists still send their best papers to “high impact” subscription-based journals.

There are open access journals in the humanities, but it is fair to say they are few and far between. If prestige is one of the drivers in scientific publishing, where large grant programs abound for some types of research, prestige is about the only driver in humanities publishing.

There are grant programs for the humanities but nothing on the scale of funding in the sciences. Salaries in the humanities are for the most part nothing to write home about. Humanities publishing really comes down to prestige.

Prestige from publication may be a dry, hard bone but it is the only bone that most humanities scholars will ever have. Try to take that away and you are likely to get bitten.

For instance, have you ever wondered about the proliferation of new translations of the Bible? Have we discovered new texts? New discoveries about biblical languages? Discovery of major mistakes in a prior edition? What if I said none of the above? To what, then, would you attribute the publication of new translations of the Bible?

If you compare the various translations, you will find different “editors,” unless you are looking at a common source for Bibles. Some sources do that as well, creating different “versions” for different target audiences.

With the exception of new versions like the New Revised Standard Version, which was undertaken to account for new information from the Dead Sea Scrolls, new editions of the Bible are primarily scholarly churn.

The humanities aren’t going to move any closer to open access publishing until their employers (universities) and funders insist on open access publishing as a condition for tenure and funding.

I will address Michael’s mis-impressions about the value of classification another time. 😉

The Machines in the Valley Digital History Project

Friday, January 2nd, 2015

The Machines in the Valley Digital History Project by Jason Heppler.

From the post:

I am excited to finally release the digital component of my dissertation, Machines in the Valley.

My dissertation, Machines in the Valley, examines the environmental, economic, and cultural conflicts over suburbanization and industrialization in California’s Santa Clara Valley–today known as Silicon Valley–between 1945 and 1990. The high technology sector emerged as a key component of economic and urban development in the postwar era, particularly in western states seeking to diversify their economic activities. Industrialization produced thousands of new jobs, but development proved problematic when faced with competing views about land use. The natural allure that accompanied the thousands coming West gave rise to a modern environmental movement calling for strict limitations on urban growth, the preservation of open spaces, and the reduction of pollution. Silicon Valley stood at the center of these conflicts as residents and activists criticized the environmental impact of suburbs and industry in the valley. Debates over the Santa Clara Valley’s landscape tells the story not only of Silicon Valley’s development, but Americans’ changing understanding of nature and the environmental costs of urban and industrial development.

A great example of a digital project in the humanities!

How does Jason’s dissertation differ from a collection of resources on the same topic?

A collection of resources requires each of us to duplicate Jason’s work to extract the same information. Jason has curated the data; that is, he has separated the useful from the not-so-useful, eliminated duplicate sources that don’t contribute to the story, and provided his own analysis as a value-add to the existing data he has organized. That means we don’t have to duplicate Jason’s work, for which we are all thankful.

How does Jason’s dissertation differ from a topic map on the same topic?

Take one of the coming soon topics for comparison:

“The Stanford Land Machine has Gone Berserk!” Stanford University and the Stanford Industrial Park (Coming Soon)

Stanford University is the largest landholder on the San Francisco Peninsula, controlling nearly 9,000 acres. In the 1950s, Stanford started acting as a real estate developer, first with the establishment of the Stanford Industrial Park in 1953 and later through several additional land development programs. These programs, however, ran into conflict with surrounding neighborhoods whose ideas for the land did not include industrialization.

Universities are never short of staff and alumni whom they would prefer were staff and/or alumni of some other university. Jason will be writing about one or more such individuals under this topic. In the process of curation, he will select the known details about such individuals that are appropriate for his discussion. It isn’t possible to include every known detail about any person, location, event, artifact, etc.; if it were, no one would have time to read the argument being made in the dissertation.

In addition to the curation/editing process, there will be facts that Jason doesn’t uncover and/or that are unknown to anyone at present. If the governor of California can conceal an illegitimate child for ten years, it won’t be surprising if new details emerge about the people Jason discusses in his dissertation.

When such new information comes out, how do we put that together with the information already collected in Jason’s dissertation?

Unless you are expecting a second edition of Jason’s dissertation, the quick answer is we’re not. Not today, not tomorrow, not ever.

The current publishing paradigm is designed for republication, not for incremental updating of publications. If new facts do appear, and more likely once enough time has passed that Jason’s dissertation is no longer “new,” some new PhD candidate will add new data, dig out the same data as Jason, and fashion a new dissertation.

If, instead of imprisoning his data in prose, Jason had his prose presentation for the dissertation and topics (as in topic maps) for the individuals, deeds, events, etc., then as more information is discovered, it could be fitted into his existing topic map of that data. Unlike the prose, a topic map doesn’t require re-publication in order to add new information.
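To make that concrete, here is a minimal sketch of one such topic in XTM 2.0 syntax. The identifiers, URLs and the occurrence type are invented for illustration; the point is the subject identifier, which lets independently created maps merge on the same subject.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical XTM 2.0 sketch: one topic for the Stanford Industrial Park.
     All identifiers and URLs are invented for this example. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/" version="2.0">
  <topic id="stanford-industrial-park">
    <!-- The subject identifier is what allows later data about the same
         subject to be merged in without touching the prose. -->
    <subjectIdentifier href="http://example.org/subjects/stanford-industrial-park"/>
    <name>
      <value>Stanford Industrial Park</value>
    </name>
    <occurrence>
      <type><topicRef href="#date-established"/></type>
      <resourceData>1953</resourceData>
    </occurrence>
  </topic>
  <topic id="date-established">
    <name><value>date established</value></name>
  </topic>
</topicMap>

When the next fact about the park, or the people around it, surfaces, it becomes another occurrence or association on the existing topic, not a second edition.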

In twenty or thirty years, when Jason is advising some graduate student who wants to extend his dissertation, Jason can give them the topic map with up-to-date data (or ready to be updated), making the next round of scholarship on this issue cumulative rather than episodic.

Announcing Digital Pedagogy in the Humanities: Concepts, Models, and Experiments

Tuesday, December 23rd, 2014

Announcing Digital Pedagogy in the Humanities: Concepts, Models, and Experiments by Rebecca Frost Davis.

From the post:

I’m elated today to announce, along with my fellow editors, Matt Gold, Katherine D. Harris, and Jentery Sayers, and in conjunction with the Modern Language Association, Digital Pedagogy in the Humanities: Concepts, Models, and Experiments, an open-access, curated collection of downloadable, reusable, and remixable pedagogical resources for humanities scholars interested in the intersections of digital technologies with teaching and learning. This is a book in a new form. Taken as a whole, this collection will document the richly-textured culture of teaching and learning that responds to new digital learning environments, research tools, and socio-cultural contexts, ultimately defining the heterogeneous nature of digital pedagogy. You can see the full announcement here: https://github.com/curateteaching/digitalpedagogy/blob/master/announcement.md

Many of you may have heard of this born-digital project under some other names (Digital Pedagogy Keywords) and hashtags (#digipedkit). Since it was born at the MLA convention in 2012 it has been continually evolving. You can trace that evolution, in part, through my earlier presentations: http://rebeccafrostdavis.wordpress.com/tag/curateteaching/

For the future, please follow Digital Pedagogy in the Humanities on Twitter through the hashtag #curateteaching and visit our news page for updates. And if you know of a great pedagogical artifact to share, please help us curate teaching by tweeting it to the hashtag #curateteaching. We’ll be building an archive of those tweets, as well.

After looking at the list of keywords: Draft List of Keywords for Digital Pedagogy in the Humanities: Concepts, Models, and Experiments, I am hopeful those of you with a humanities background can suggest additional terms.

I didn’t see “topic maps” listed. 😉 Maybe that should be under Annotation? In any event, this looks like an exciting project.

Enjoy!

Digital Humanities in the Southeast 2014

Tuesday, December 9th, 2014

Digital Humanities in the Southeast 2014

Big data is challenging because of the three or four V’s, depending on whom you believe. (Originally volume, variety, and velocity; at some later point, veracity was added.) When big data fully realizes the need for semantics, a capital S will have to be added.

If you want to prepare for that eventuality, the humanities have projects where the data sets are small compared to big data but suffer from the big S, as in semantics.

A number of workshop presentations are listed, most with both audio and slides, ranging from Latin and history to war and Eliot.

A great opportunity to see problems that are not difficult in the four V’s sense but are difficult nonetheless.

I first saw this in a tweet by Brian Croxall.

JudaicaLink released

Wednesday, July 30th, 2014

JudaicaLink released

From the post:

Data extractions from two encyclopediae from the domain of Jewish culture and history have been released as Linked Open Data within our JudaicaLink project.

JudaicaLink now provides access to 22,808 concepts in English (~ 10%) and Russian (~ 90%), mostly locations and persons.

See here for further information: http://www.judaicalink.org/blog/kai-eckert/encyclopedia-russian-jewry-released-updates-yivo-encyclopedia

Next steps in this project include “…the creation of links between the two encyclopedias and links to external sources like DBpedia or Geonames.”
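To illustrate what such a link looks like in Linked Open Data terms (illustration only; the judaicalink.org URI below is invented, and the project’s actual vocabulary may differ):

<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustration only: asserting that a concept extracted from one of the
     encyclopediae and a DBpedia resource identify the same subject.
     The judaicalink.org URI is invented for this sketch. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://data.judaicalink.org/concept/vilna">
    <!-- owl:sameAs is the usual glue between Linked Open Data sets. -->
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Vilnius"/>
  </rdf:Description>
</rdf:RDF>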

In case you are interested, the two encyclopedias are:

The YIVO Encyclopedia of Jews in Eastern Europe, courtesy of the YIVO Institute of Jewish Research, NY.

Rujen.ru provides an Internet version of the Encyclopedia of Russian Jewry, which has been published in Moscow since 1994, giving a comprehensive, objective picture of the life and activity of the Jews of Russia, the Soviet Union and the CIS.

For more details: Encyclopediae

If you are looking to contribute content or time to a humanities project, this should be on your short list.

Digital Humanities and Computer Science

Sunday, July 27th, 2014

Chicago Colloquium on Digital Humanities and Computer Science

Deadlines:

1 August 2014, abstracts of ~ 750 words and a minimal bio sent to martinmueller@northwestern.edu.

31 August 2014, Deadline for Early Registration Discount.

19 September 2014, Deadline for group-rate reservations at the Orrington Hotel.

23-24 October, 2014 Colloquium.

From the call for papers:

The ninth annual meeting of the Chicago Colloquium on Digital Humanities and Computer Science (DHCS) will be hosted by Northwestern University on October 23-24, 2014.

The DHCS Colloquium has been a lively regional conference (with non-trivial bi-coastal and overseas sprinkling), rotating since 2006 among the University of Chicago (where it began), DePaul, IIT, Loyola, and Northwestern. At the first Colloquium Greg Crane asked his memorable question “What to do with a million books?” Here are some highlights that I remember across the years:

  • An NLP programmer at Los Alamos talking about the ways security clearances prevented CIA analysts and technical folks from talking to each other.
  • A demonstration that if you replaced all content words in Arabic texts and focused just on stop words you could determine with a high degree of certainty the geographical origin of a given piece of writing.
  • A visualization of phrases like “the king’s daughter” in a sizable corpus, telling you much about who owned what.
  • A social network analysis of Alexander the Great and his entourage.
  • An amazingly successful extraction of verbal parallels from very noisy data.
  • Did you know that Jane Austen was a game theorist before her time and that her characters were either skillful or clueless practitioners of this art?

And so forth. Given my own interests, I tend to remember “Text as Data” stuff, but there was much else about archaeology, art, music, history, and social or political life. You can browse through some of the older programs at http://lucian.uchicago.edu/blogs/dhcs/.

….

One of the weather sites promises that October is between 42 F for the low and 62 F for the high (on average). Sounds like a nice time to visit Northwestern University!

To say nothing of an exciting conference!

I first saw this in a tweet by David Bamman.

SAMUELS [English Historical Semantic Tagger]

Wednesday, July 9th, 2014

SAMUELS (Semantic Annotation and Mark-Up for Enhancing Lexical Searches)

From the webpage:

The SAMUELS project (Semantic Annotation and Mark-Up for Enhancing Lexical Searches) is funded by the Arts and Humanities Research Council in conjunction with the Economic and Social Research Council (grant reference AH/L010062/1) from January 2014 to April 2015. It will deliver a system for automatically annotating words in texts with their precise meanings, disambiguating between possible meanings of the same word, ultimately enabling a step-change in the way we deal with large textual data. It uses the Historical Thesaurus of English as its core dataset, and will provide for each word in a text the Historical Thesaurus reference code for that concept. Textual data tagged in this way can then be accurately searched and precisely investigated, producing results which can be automatically aggregated at a range of levels of precision. The project also draws on a series of research sub-projects which will employ the software thus developed, testing and validating the utility of the SAMUELS tagger as a tool for wide-ranging further research.
….
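As a purely hypothetical sketch (the project’s actual output format may differ), semantically tagged text could look something like this, using Semtag3 codes taken from the results quoted later in this post; the element and attribute names are invented:

<!-- Hypothetical markup; the codes are Semtag3 values from the results
     below, while the element and attribute names are invented. -->
<w semtag="S06k17a" gloss="Diplomacy">congress</w>
<w semtag="F04v04" gloss="Data">data</w>

Once every word carries a concept code like this, searches can run over concepts rather than spellings.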

To really appreciate this project, visit SAMUELS English Semantic Tagger Test Site.

There you can enter up to 2000 English words and select lower/upper year boundaries!

Just picking a text at random, ;-), I chose:

Greenpeace flew its 135-foot-long thermal airship over the Bluffdale, UT, data center early Friday morning, carrying the message: “NSA Illegal Spying Below” along with a link steering people to a new web site, StandAgainstSpying.org, which the three groups launched with the support of a separate, diverse coalition of over 20 grassroots advocacy groups and Internet companies. The site grades members of Congress on what they have done, or often not done, to rein in the NSA.

Some terms and their Semtag3 codes by time period:

1500-1600

  • congress: C09d01 [Sexual intercourse]; E07e16 [Inclination]; E08e12 [Movement towards a thing/person/position]
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];

1700-1800

  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; E07e16 [Inclination];
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];

1800-1900

  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; O07 [Conversation];
  • data: H55a [Attestation, witness, evidence];
  • thermal: A04b02 [Spring]; C09a [Sexual desire]; D03c02 [Heat];
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B06d01 [Deformities of specific parts]; B25d [Tools and implements];

1900-2000

  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; O07 [Conversation];
  • data: F04v04 [Data]; H55a [Attestation, witness, evidence]; W05 [Information];
  • thermal: A04b02 [Spring]; B28b [Types/styles of clothing]; D03c02 [Heat];
  • UT: 04.10[Unrecognised]
  • web: B06d01 [Deformities of specific parts]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];

2000-2014

  • congress: 04.10[Unrecognised]
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: 04.10[Unrecognised]

I am assuming that “04.10[Unrecognised]” for all terms in 2000-2014 means there is no usage data for those terms in that time period.

I have never heard anyone deny that meanings of words change over time and domain.

What remains a mystery is why the value-add of documenting the meanings of words isn’t obvious.

I say “words,” but I should be saying “data.” Remember the loss of the $125 million Mars Climate Orbiter: one system read a value as pounds of force while another read the same data as newtons. In that scenario, ET doesn’t get to call home.
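A hypothetical sketch of the difference explicit semantics makes (the element and attribute names are invented):

<!-- Undocumented: a bare number, read by one team as pound-force seconds
     and by another as newton-seconds. -->
<impulse>2.893</impulse>

<!-- Documented: the unit travels with the data. 1 lbf·s is roughly
     4.448 N·s, so silently mixing the two is a large error. -->
<impulse unit="newton-seconds">2.893</impulse>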

So let’s rephrase the question to: Why isn’t the value-add of documenting the semantics of data obvious?

Suggestions?