Archive for the ‘Library’ Category

New: Library of Congress Demographic Group Terms (LCDGT)

Wednesday, May 13th, 2015

From an email:

As part of its ongoing effort to provide effective access to library materials, the Library of Congress is developing a new vocabulary, entitled Library of Congress Demographic Group Terms (LCDGT). This vocabulary will be used to describe the creators of, and contributors to, resources, and also the intended audience of resources. It will be created and maintained by the Policy and Standards Division, and be distinct from the other vocabularies that are maintained by that division: Library of Congress Subject Headings (LCSH), Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT), and the Library of Congress Medium of Performance Thesaurus for Music (LCMPT).

A general rationale for the development of LCDGT, information about the pilot vocabulary, and a link to the Tentative List of terms in the pilot may be found on LC’s Acquisitions and Bibliographic Access website at

The Policy and Standards Division is accepting comments on the pilot vocabulary and the principles guiding its development through June 5, 2015. Comments may be sent to Janis L. Young at

A follow-up question to this post asked:

Is there a list of the codes used in field 072 in these lists? Some I can figure out, but it would be nice to see a list of the categories you’re using.

The list in question is: DEMOGRAPHIC GROUP TERMS.

To which Adam Schiff replied:

The list of codes is in and online at (although the latter is still lacking a few of the codes found in the former).


Digital Approaches to Hebrew Manuscripts

Friday, May 8th, 2015

Digital Approaches to Hebrew Manuscripts

Monday 18th – Tuesday 19th of May 2015

From the webpage:

We are delighted to announce the programme for On the Same Page: Digital Approaches to Hebrew Manuscripts at King’s College London. This two-day conference will explore the potential for the computer-assisted study of Hebrew manuscripts; discuss the intersection of Jewish Studies and Digital Humanities; and share methodologies. Amongst the topics covered will be Hebrew palaeography and codicology, the encoding and transcription of Hebrew texts, the practical and theoretical consequences of the use of digital surrogates and the visualisation of manuscript evidence and data. For the full programme and our Call for Posters, please see below.

Organised by the Departments of Digital Humanities and Theology & Religious Studies (Jewish Studies)
Co-sponsor: Centre for Late Antique & Medieval Studies (CLAMS), King’s College London

I saw this at the blog for DigiPal: Digital Resource and Database of Palaeography, Manuscript Studies and Diplomatic. Confession: I have never understood how the English derive acronyms, and this one confounds me as much as it may you. 😉

Be sure to look around at the DigiPal site. There are numerous manuscript images, annotation techniques, and other resources for those who foster scholarship by contributing to it.

One Subject, Three Locators

Tuesday, May 5th, 2015

As you may know, the Library of Congress actively maintains its subject headings. Not surprising to anyone other than purveyors of fixed ontologies. New subjects appear, terminology changes, old subjects have new names, etc.

The Subject Authority Cooperative Program (SACO) has a mailing list:

About the SACO Listserv:

The SACO Program welcomes all interested parties to subscribe to the SACO listserv. This listserv was established first and foremost to facilitate communication with SACO contributors throughout the world. The Summaries of the Weekly Subject Editorial Review Meeting are posted to enable SACO contributors to keep abreast of changes and know if proposed headings have been approved or not. The listserv may also be used as a vehicle to foster discussions on the construction, use, and application of subject headings. Questions posted may be answered by any list member and not necessarily by staff in the Cooperative Programs Section (Coop) or PSD. Furthermore, participants are encouraged to provide comments, share examples, experiences, etc.

On the list this week was the question:

Does anyone know how these three sites differ as sources for consulting approved subject lists?

Janis L. Young, Policy and Standards Division, Library of Congress replied:

Just to clarify: all of the links that you and Paul listed take you to the same Approved Lists. We provide multiple access points to the information in order to accommodate users who approach our web site in different ways.

Depending upon your goals, the Approved Lists could be treated as a subject that has three locators.


Friday, May 1st, 2015

OPenn: Primary Digital Resources Available to All through Penn Libraries’ New Online Platform by Jessie Dummer.

From the post:

The Penn Libraries and the Schoenberg Institute for Manuscript Studies are thrilled to announce the launch of OPenn: Primary Resources Available to Everyone, a new website that makes digitized cultural heritage material freely available and accessible to the public. OPenn is a major step in the Libraries’ strategic initiative to embrace open data, with all images and metadata on this site available as free cultural works to be freely studied, applied, copied, or modified by anyone, for any purpose. It is crucial to the mission of SIMS and the Penn Libraries to make these materials of great interest and research value easy to access and reuse. The OPenn team at SIMS has been working towards launching the website for the past year. Director Will Noel’s original idea to make our Medieval and Renaissance manuscripts open to all has grown into a space where the Libraries can collaborate with other institutions who want to open their data to the world.

Images of the manuscripts are currently available on OPenn at full resolution, with derivatives also provided for easy reuse on the web. Downloading, whether several select images or the entire dataset, is easily accomplished by following instructions or recipes posted in the Technical Read Me on OPenn. The website is designed to be machine-readable, but easy for individuals to use, too.

Oh, the manuscripts themselves?

Licensing is a real treat:

All images and their contents from the Lawrence J. Schoenberg Collection are free of known copyright restrictions and in the public domain. See the Creative Commons Public Domain Mark page for more information on terms of use:

Unless otherwise stated, all manuscript descriptions and other cataloging metadata are ©2015 The University of Pennsylvania Libraries. They are licensed for use under a Creative Commons Attribution License version 4.0 (CC-BY-4.0):

For a description of the terms of use, see the Creative Commons Deed:

In substance and licensing, this is quite a departure from academic societies that still consider comping travel and hotel rooms to be “fostering scholarship.” “Ye shall know them by their fruits.” (Matthew 7:16)

Almost a Topic Map? Or Just a Mashup?

Thursday, April 9th, 2015

WikipeDPLA by Eric Phetteplace.

From the webpage:

See relevant results from the Digital Public Library of America on any Wikipedia article. This extension queries the DPLA each time you visit a Wikipedia article, using the article’s title, redirects, and categories to find relevant items. If you click a link at the top of the article, it loads in a series of links to the items. The original code behind WikiDPLA was written at LibHack, a hackathon at the American Library Association’s 2014 Midwinter Meeting in Philadelphia:

Google Chrome App Home Page

GitHub page

Wikipedia:The Wikipedia Library/WikipeDPLA

How you resolve the topic map versus mashup question depends on how much precision you expect from a topic map. While knowing additional places to search is useful, I never have a problem with assembling more materials than can be read in the time allowed. On the other hand, some people may need more prompting than others, so I can’t say that general references are out of bounds.

Assuming you maintain data sets with locally unique identifiers, a modification of this script that queries an index of all local scripts (say, Pig scripts) to discover other scripts using the same data could be quite useful.
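For anyone who wants to experiment with the extension’s core move, the DPLA side is just a keyword search against DPLA’s v2 API. Here is a minimal sketch; the endpoint and parameter names follow DPLA’s public API documentation, and the API key is a placeholder you would request from DPLA:

```python
import json
import urllib.parse
import urllib.request

DPLA_ENDPOINT = "https://api.dp.la/v2/items"  # DPLA's v2 item-search API

def dpla_query_url(title, api_key, page_size=10):
    """Build an item-search URL from, e.g., a Wikipedia article title."""
    params = {"q": title, "page_size": page_size, "api_key": api_key}
    return DPLA_ENDPOINT + "?" + urllib.parse.urlencode(params)

def dpla_items(title, api_key):
    """Fetch matching items and return the parsed JSON 'docs' list."""
    with urllib.request.urlopen(dpla_query_url(title, api_key)) as resp:
        return json.load(resp)["docs"]

# Build (but don't send) a query for a sample article title:
print(dpla_query_url("Topic Maps", api_key="YOUR_KEY"))
```

The same `dpla_query_url` helper could just as easily take identifiers drawn from a local script index, per the Pig-script idea above.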

BTW, you need to have a Wikipedia account and be logged in for the extension to work. Or at least that was my experience.


Galleries, Libraries, Archives, and Museums (GLAM CC Licensing)

Friday, March 6th, 2015

Galleries, Libraries, Archives, and Museums (GLAM CC Licensing)

A very extensive list of galleries, libraries, archives, and museums (GLAM) that are using CC licensing.

A good resource to have at hand if you need to argue for CC licensing with your gallery, library, archive, or museum.

I first saw this in a tweet by Adrianne Russell.

Update: Resource List for March 5 Open Licensing Online Program

Data Visualization as a Communication Tool

Friday, March 6th, 2015

Data Visualization as a Communication Tool by Susan [Gardner] Archambault, Joanne Helouvry, Bonnie Strohl, and Ginger Williams.


This paper provides a framework for thinking about meaningful data visualization in ways that can be applied to routine statistics collected by libraries. An overview of common data display methods is provided, with an emphasis on tables, scatter plots, line charts, bar charts, histograms, pie charts, and infographics. Research on “best practices” in data visualization design is presented as well as a comparison of free online data visualization tools. Different data display methods are best suited for different quantitative relationships. There are rules to follow for optimal data visualization design. Ten free online data visualization tools are recommended by the authors.

Good review of basic visualization techniques with an emphasis on library data. You don’t have to be in Tufte’s league to make effective data visualizations.

D-Lib Magazine January/February 2015

Monday, January 19th, 2015

D-Lib Magazine January/February 2015

From the table of contents (see the original toc for abstracts):


2nd International Workshop on Linking and Contextualizing Publications and Datasets by Laurence Lannom, Corporation for National Research Initiatives

Data as “First-class Citizens” by Łukasz Bolikowski, ICM, University of Warsaw, Poland; Nikos Houssos, National Documentation Centre / National Hellenic Research Foundation, Greece; Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy and Jochen Schirrwagen, Bielefeld University Library, Germany


Semantic Enrichment and Search: A Case Study on Environmental Science Literature by Kalina Bontcheva, University of Sheffield, UK; Johanna Kieniewicz and Stephen Andrews, British Library, UK; Michael Wallis, HR Wallingford, UK

A-posteriori Provenance-enabled Linking of Publications and Datasets via Crowdsourcing by Laura Drăgan, Markus Luczak-Rösch, Elena Simperl, Heather Packer and Luc Moreau, University of Southampton, UK; Bettina Berendt, KU Leuven, Belgium

A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications by Alessia Bardi and Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy

Science 2.0 Repositories: Time for a Change in Scholarly Communication by Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy

Data Citation Practices in the CRAWDAD Wireless Network Data Archive by Tristan Henderson, University of St Andrews, UK and David Kotz, Dartmouth College, USA

A Methodology for Citing Linked Open Data Subsets by Gianmaria Silvello, University of Padua, Italy

Challenges in Matching Dataset Citation Strings to Datasets in Social Science by Brigitte Mathiak and Katarina Boland, GESIS — Leibniz Institute for the Social Sciences, Germany

Enabling Living Systematic Reviews and Clinical Guidelines through Semantic Technologies by Laura Slaughter; The Interventional Centre, Oslo University Hospital (OUS), Norway; Christopher Friis Berntsen and Linn Brandt, Internal Medicine Department, Innlandet Hosptial Trust and MAGICorg, Norway and Chris Mavergames, Informatics and Knowledge Management Department, The Cochrane Collaboration, Germany

Data without Peer: Examples of Data Peer Review in the Earth Sciences by Sarah Callaghan, British Atmospheric Data Centre, UK

The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite by Jan Brase and Irina Sens, German National Library of Science and Technology, Germany and Michael Lautenschlager, German Climate Computing Centre, Germany


N E W S   &   E V E N T S

In Brief: Short Items of Current Awareness

In the News: Recent Press Releases and Announcements

Clips & Pointers: Documents, Deadlines, Calls for Participation

Meetings, Conferences, Workshops: Calendar of Activities Associated with Digital Libraries Research and Technologies

The quality of D-Lib Magazine meets or exceeds the quality claimed by pay-per-view publishers.


Harvard Library adopts LibraryCloud

Wednesday, January 7th, 2015

Harvard Library adopts LibraryCloud by David Weinberger.

From the post:

According to a post by the Harvard Library, LibraryCloud is now officially a part of the Library toolset. It doesn’t even have the word “pilot” next to it. I’m very happy and a little proud about this.

LibraryCloud is two things at once. Internal to Harvard Library, it’s a metadata hub that lets lots of different data inputs be normalized, enriched, and distributed. As those inputs change, you can change LibraryCloud’s workflow process once, and all the apps and services that depend upon those data can continue to work without making any changes. That’s because LibraryCloud makes the data that’s been input available through an API which provides a stable interface to that data. (I am overstating the smoothness here. But that’s the idea.)

To the Harvard community and beyond, LibraryCloud provides open APIs to access tons of metadata gathered by Harvard Library. LibraryCloud already has metadata about 18M items in the Harvard Library collection — one of the great collections — including virtually all the books and other items in the catalog (nearly 13M), a couple of million of images in the VIA collection, and archives at the folder level in Harvard OASIS. New data can be added relatively easily, and because LibraryCloud is workflow based, that data can be updated, normalized and enriched automatically. (Note that we’re talking about metadata here, not the content. That’s a different kettle of copyrighted fish.)

LibraryCloud began as an idea of mine (yes, this is me taking credit for the idea) about 4.5 years ago. With the help of the Harvard Library Innovation Lab, which I co-directed until a few months ago, we invited in local libraries and had a great conversation about what could be done if there were an open API to metadata from multiple libraries. Over time, the Lab built an initial version of LibraryCloud primarily with Harvard data, but with scads of data from non-Harvard sources. (Paul Deschner, take many many bows. Matt Phillips, too.) This version of LibraryCloud — now called lilCloud — is still available and is still awesome.

Very impressive news from Harvard!

Plus, the LibraryCloud is open source!

Documentation. Well, that’s the future home of the documentation. For now, the current documentation is in a Google Doc: LibraryCloud Item API


The LibraryCloud Item API provides access to metadata about items in the Harvard Library collections. For the purposes of this API, an “item” is the metadata describing a catalog record within the Harvard Library.
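As a sketch of what a call against the Item API might look like: the endpoint URL and parameter names below are my assumptions based on the documentation mentioned above, so verify them against the current docs before relying on this.

```python
import urllib.parse

# Assumed LibraryCloud Item API endpoint (check the current documentation).
LC_ENDPOINT = "https://api.lib.harvard.edu/v2/items.json"

def item_search_url(query, limit=5):
    """Build a keyword-search URL for LibraryCloud item metadata."""
    return LC_ENDPOINT + "?" + urllib.parse.urlencode({"q": query, "limit": limit})

# A sample search for item metadata:
print(item_search_url("topic maps"))
```

Because the API returns metadata rather than content, a query like this stays well clear of the “different kettle of copyrighted fish” Weinberger mentions.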


Serendipity in the Stacks:…

Wednesday, January 7th, 2015

Serendipity in the Stacks: Libraries, Information Architecture, and the Problems of Accidental Discovery by Patrick L. Carr.


Serendipity in the library stacks is generally regarded as a positive occurrence. While acknowledging its benefits, this essay draws on research in library science, information systems, and other fields to argue that, in two important respects, this form of discovery can be usefully framed as a problem. To make this argument, the essay examines serendipity both as the outcome of a process situated within the information architecture of the stacks and as a user perception about that outcome.

A deeply dissatisfying essay on serendipity, as evidenced by the author’s conclusion, which reads in part:

While acknowledging the validity of Morville’s points, I nevertheless believe that, along with its positive aspects, serendipity in the stacks can be usefully framed as a problem. From a process-based standpoint, serendipity is problematic because it is an indicator of a potential misalignment between user intention and process outcome. And, from a perception-based standpoint, serendipity is problematic because it can encourage user-constructed meanings for libraries that are rooted in opposition to change rather than in users’ immediate and evolving information needs.

To illustrate the “…potential misalignment between user intention and process outcome,” Carr offers the example of a user who looks for a specific volume by call number but, because the book is absent from its location, discovers an even more useful book nearby. That discovery Carr describes as:

Even if this information were to prove to be more valuable to the user than the information in the book that was sought, the user’s serendipitous discovery nevertheless signifies a misalignment of user intention and process outcome.

Sorry, that went by rather quickly. If the user considers the discovery to be a favorable outcome, why should we take Carr’s word that it “signifies a misalignment of user intention and process outcome?” What other measure for success should an information retrieval system have other than satisfaction of its users? What other measure would be meaningful?

Carr refuses to consider how libraries could maximize what users see as a positive experience because:

By situating the library as a tool that functions to facilitate serendipitous discovery in the stacks, librarians risk also situating the library as a mechanism that functions as a symbolic antithesis to the tools for discovery that are emerging in online environments. In this way, the library could signify a kind of bastion against change. Rather than being cast as a vital tool for meeting discovery needs in emergent online environments, the library could be marginalized in a way that suggests to users that they perceive it as a means of retreat from online environments.

I don’t doubt that the same people who think librarians are superfluous because “everyone can find what they need on the Internet” would be quick to see libraries as “bastion[s] against change,” for any number of reasons. But the opinions of semi-literates should not dictate library policy.

What Carr fails to take into account is that a stacks “environment,” which he concedes does facilitate serendipitous discovery, can be replicated in digital space.

For example, while it is currently a prototype, StackLife at Harvard is an excellent demonstration of a virtual stack environment.


Jonathan Zittrain, Vice-Dean for Library and Information Resources, Harvard Law School; Professor of Law at Harvard Law School and the Harvard Kennedy School of Government; Professor of Computer Science at the Harvard School of Engineering and Applied Sciences; Co-founder of the Berkman Center for Internet & Society, nominated StackLife for the Stanford Prize for Innovation in Research Libraries, saying in part:

  • It always shows a book (or other item) in a context of other books.
  • That context is represented visually as a scrollable stack of items — a shelf rotated so that users can more easily read the information on the spines.
  • The stack integrates holdings from multiple libraries.
  • That stack is sorted by “StackScore,” a measure of how often the library’s community has used a book. At the Harvard Library installation, the computation includes ten year aggregated checkouts weighted by faculty, grad, or undergrad; number of holdings in the 73 campus libraries, times put on reserve, etc.
  • The visualization is simple and clean but also information-rich. (a) The horizontal length of the book reflects the physical book’s height. (b) The vertical height of the book in the stack represents its page count. (c) The depth of the color blue of the spine indicates its StackScore; a deeper blue means that the work is more often used by the community.
  • When clicked, a work displays its Library of Congress Subject Headings (among other metadata). Clicking one of those headings creates a new stack consisting of all the library’s items that share that heading.
  • If there is a Wikipedia page about that work, Stacklife also displays the Wikipedia categories on that page, and lets the user explore by clicking on them.
  • Clicking on a work creates an information box that includes bibliographic information, real-time availability at the various libraries, and, when available: (a) the table of contents; (b) a link to Google Books’ online reader; (c) a link to the Wikipedia page about that book; (d) a link to any National Public Radio audio about the work; (e) a link to the book’s page at Amazon.
  • Every author gets a page that shows all of her works in the library in a virtual stack. The user can click to see any of those works on a shelf with works on the same topic by other authors.
  • Stacklife is scalable, presenting enormous collections of items in a familiar way, and enabling one-click browsing, faceting, and subject-based clustering.

Does StackLife sound like a library “…that [is] rooted in opposition to change rather than in users’ immediate and evolving information needs”?

I can’t speak for you but it doesn’t sound that way to me. It sounds like a library that isn’t imposing its definition of satisfaction upon users (good for Harvard) and that is working to blend the familiar with new to the benefit of its users.

We can only hope that College & Research Libraries will have a response from the StackLife project to Carr’s essay in the same issue.

PS: If you have library friends who don’t read this blog, please forward a link to this post to their attention. I know they are consumed with their current tasks but the StackLife project is one they need to be aware of. Thanks!

I first saw the essay on Facebook in a posting by Simon St.Laurent.

Early English Books Online – Good News and Bad News

Friday, January 2nd, 2015

Early English Books Online

The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

From the webpage:

The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

[Technical note]
Those interested in working on the TEI P5 XML versions of the texts can check them out of GitHub, where each of the texts is in its own repository. There is a CSV file listing all the texts, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories.
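The bulk clone the shell script performs can be sketched in a few lines of Python. Note that the GitHub organization name and the CSV column name below are assumptions for illustration; take the real values from the CSV file and repositories mentioned in the technical note:

```python
import csv
import subprocess

# Assumed GitHub organization hosting the per-text repositories.
GITHUB_ORG = "https://github.com/textcreationpartnership"

def repo_url(idno):
    """Map a text's identifier to its repository URL (assumed layout)."""
    return f"{GITHUB_ORG}/{idno}.git"

def clone_all(csv_path, limit=None):
    """Clone each repository listed in the CSV ('idno' column name assumed)."""
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if limit is not None and i >= limit:
                break
            subprocess.run(["git", "clone", repo_url(row["idno"])], check=True)

# Show the URL pattern for one hypothetical identifier:
print(repo_url("A00001"))
```

With 32,853 repositories in play, it would be prudent to test with something like `clone_all("texts.csv", limit=5)` before attempting the full set.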

Now for the BAD NEWS:

An additional 45,000 books:

Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

Can you guess why the public is barred from what are obviously public domain texts?

Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when your country’s military was last “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a worldwide scale.

It is hard to imagine that lifetime access for everyone on Earth could not be secured for less than $1 trillion. No more special pricing and contracts depending on whether you are in countries A to Zed. Eliminate all that paperwork for publishers; to get access, all you would need is a connection to the Internet. The publishers would have a guaranteed income stream and less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc.

My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?

Ferguson Municipal Public Library

Tuesday, November 25th, 2014

Ferguson Municipal Public Library

Ashley Ford tweeted that donations should be made to the Ferguson Municipal Public Library.

While schools are closed in Ferguson, the library has stayed open and has been a safe refuge.

Support the Ferguson Municipal Public Library as well as your own.

Libraries are where our tragedies, triumphs, and history live on for future generations.

Treasury Island: the film

Tuesday, November 25th, 2014

Treasury Island: the film by Lauren Willmott, Boyce Keay, and Beth Morrison.

From the post:

We are always looking to make the records we hold as accessible as possible, particularly those which you cannot search for by keyword in our catalogue, Discovery. And we are experimenting with new ways to do it.

The Treasury series, T1, is a great example of a series which holds a rich source of information but is complicated to search. T1 covers a wealth of subjects (from epidemics to horses) but people may overlook it as most of it is only described in Discovery as a range of numbers, meaning it can be difficult to search if you don’t know how to look. There are different processes for different periods dating back to 1557 so we chose to focus on records after 1852. Accessing these records requires various finding aids and multiple stages to access the papers. It’s a tricky process to explain in words so we thought we’d try demonstrating it.

We wanted to show people how to access these hidden treasures, by providing a visual aid that would work in conjunction with our written research guide. Armed with a tablet and a script, we got to work creating a video.

Our remit was:

  • to produce a video guide no more than four minutes long
  • to improve accessibility to these records through a simple, step-by–step process
  • to highlight what the finding aids and documents actually look like

These records can be useful to a whole range of researchers, from local historians to military historians to social historians, given that virtually every area of government action involved the Treasury at some stage. We hope this new video, which we intend to be watched in conjunction with the written research guide, will also be of use to any researchers who are new to the Treasury records.

Adding video guides to our written research guides are a new venture for us and so we are very keen to hear your feedback. Did you find it useful? Do you like the film format? Do you have any suggestions or improvements? Let us know by leaving a comment below!

This is a great illustration that data management isn’t something new. The Treasury Board has kept records since 1557 and has accumulated a rather extensive set of materials.

The written research guide looks interesting but since I am very unlikely to ever research Treasury Board records, I am unlikely to need it.

However, the authors have anticipated that someone might be interested in process of record keeping itself and so provided this additional reference:

Thomas L Heath, The Treasury (The Whitehall Series, 1927, GP Putnam’s Sons Ltd, London and New York)

That would be an interesting find!

I first saw this in a tweet by Andrew Janes.

On Excess: Susan Sontag’s Born-Digital Archive

Tuesday, October 28th, 2014

On Excess: Susan Sontag’s Born-Digital Archive by Jeremy Schmidt & Jacquelyn Ardam.

From the post:

In the case of the Sontag materials, the end result of Deep Freeze and a series of other processing procedures is a single IBM laptop, which researchers can request at the Special Collections desk at UCLA’s Research Library. That laptop has some funky features. You can’t read its content from home, even with a VPN, because the files aren’t online. You can’t live-Tweet your research progress from the laptop — or access the internet at all — because the machine’s connectivity features have been disabled. You can’t copy Annie Leibovitz’s first-ever email — “Mat and I just wanted to let you know we really are working at this. See you at dinner. xxxxxannie” (subject line: “My first Email”) — onto your thumb drive because the USB port is locked. And, clearly, you can’t save a new document, even if your desire to type yourself into recent intellectual history is formidable. Every time it logs out or reboots, the laptop goes back to ground zero. The folders you’ve opened slam shut. The files you’ve explored don’t change their “Last Accessed” dates. The notes you’ve typed disappear. It’s like you were never there.

Despite these measures, real limitations to our ability to harness digital archives remain. The born-digital portion of the Sontag collection was donated as a pair of external hard drives, and that portion is composed of documents that began their lives electronically and in most cases exist only in digital form. While preparing those digital files for use, UCLA archivists accidentally allowed certain dates to refresh while the materials were in “thaw” mode; the metadata then had to be painstakingly un-revised. More problematically, a significant number of files open as unreadable strings of symbols because the software with which they were created is long out of date. Even the fully accessible materials, meanwhile, exist in so many versions that the hapless researcher not trained in computer forensics is quickly overwhelmed.

No one would dispute the need for an authoritative copy of Sontag's archive, or at least as close to authoritative as is humanly possible. The heavily protected laptop makes sense to me, assuming the archive considers it the authoritative copy.

What has me puzzled, particularly since the archive contains binary formats it cannot read, is why a non-authoritative copy of the archive isn't online. Any number of people may still possess the software necessary to read the files and/or be able to decode the file formats. If recovery could be practiced on a non-authoritative copy, that would be a net gain for the archive, which may well encounter such files again in the future.

After searching the Online Archive of California, I did encounter Finding Aid for the Susan Sontag papers, ca. 1939-2004 which reports:

Restrictions Property rights to the physical object belong to the UCLA Library, Department of Special Collections. Literary rights, including copyright, are retained by the creators and their heirs. It is the responsibility of the researcher to determine who holds the copyright and pursue the copyright owner or his or her heir for permission to publish where The UC Regents do not hold the copyright.

Availability Open for research, with following exceptions: Boxes 136 and 137 of journals are restricted until 25 years after Susan Sontag’s death (December 28, 2029), though the journals may become available once they are published.

Unfortunately, this finding aid does not mention Sontag’s computer or the transfer of the files to a laptop. A search of Melvyl (library catalog) finds only one archival collection and that is the one mentioned above.

I have written to the special collections library for clarification and will update this post when an answer arrives.

I mention this collection because of Sontag's importance for a generation and because digital archives will soon be the majority of cases. One hopes the standard practice will be to donate all rights to an archival repository to ensure its availability to future generations of scholars.

analyze the public libraries survey (pls) with r

Friday, October 24th, 2014

analyze the public libraries survey (pls) with r by Anthony Damico.

From the post:

each and every year, the institute of museum and library services coaxes librarians around the country to put down their handheld “shhhh…” sign and fill out a detailed online questionnaire about their central library, branch, even bookmobile. the public libraries survey (pls) is actually a census: nearly every public library in the nation responds annually. that microdata is waiting for you to check it out, no membership required. the american library association estimates well over one hundred thousand libraries in the country, but less than twenty thousand outlets are within the sample universe of this survey since most libraries in the nation are enveloped by some sort of school system. a census of only the libraries that are open to the general public, the pls typically hits response rates of 98% from the 50 states and dc. check that out.

A great way to practice your R skills!

Not to mention generating analysis to support your local library.
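The post works the PLS microdata in R, but the same kind of tabulation is easy to sketch in Python. The column names below (STABR for state, VISITS for annual visits) match the PLS layout as I recall it, but treat them as assumptions and check the survey documentation; the rows are invented sample data, not real survey responses.

```python
# Tabulate a PLS-style extract: total library visits by state.
# Sample rows are illustrative only; real data comes from the IMLS files.
import csv
import io
from collections import defaultdict

sample = """STABR,LIBNAME,VISITS
GA,Main Street Public Library,120000
GA,Riverside Branch,45000
OH,Lakeview Public Library,98000
"""

visits_by_state = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample)):
    visits_by_state[row["STABR"]] += int(row["VISITS"])

print(visits_by_state["GA"])  # 165000
```

Swap the inline string for an open file handle on the real PLS CSV and the loop is unchanged.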

Medical Heritage Library (MHL)

Sunday, September 21st, 2014

Medical Heritage Library (MHL)

From the post:

The Medical Heritage Library (MHL) and DPLA are pleased to announce that MHL content can now be discovered through DPLA.

The MHL, a specialized research collection stored in the Internet Archive, currently includes nearly 60,000 digital rare books, serials, audio and video recordings, and ephemera in the history of medicine, public health, biomedical sciences, and popular medicine from the medical special collections of 22 academic, special, and public libraries. MHL materials have been selected through a rigorous process of curation by subject specialist librarians and archivists and through consultation with an advisory committee of scholars in the history of medicine, public health, gender studies, digital humanities, and related fields. Items, selected for their educational and research value, extend from 1235 (Liber Aristotil[is] de nat[u]r[a] a[nima]li[u]m ag[res]tium [et] marino[rum]), to 2014 (The Grog Issue 40 2014) with the bulk of the materials dating from the 19th century.

“The rich history of medicine content curated by the MHL is available for the first time alongside collections like those from the Biodiversity Heritage Library and the Smithsonian, and offers users a single access point to hundreds of thousands of scientific and history of science resources,” said DPLA Assistant Director for Content Amy Rudersdorf.

The collection is particularly deep in American and Western European medical publications in English, although more than a dozen languages are represented. Subjects include anatomy, dental medicine, surgery, public health, infectious diseases, forensics and legal medicine, gynecology, psychology, anatomy, therapeutics, obstetrics, neuroscience, alternative medicine, spirituality and demonology, diet and dress reform, tobacco, and homeopathy. The breadth of the collection is illustrated by these popular items: the United States Naval Bureau of Medical History’s audio oral history with Doctor Walter Burwell (1994) who served in the Pacific theatre during World War II and witnessed the first Japanese kamikaze attacks; History and medical description of the two-headed girl : sold by her agents for her special benefit, at 25 cents (1869), the first edition of Gray’s Anatomy (1858) (the single most-downloaded MHL text at more than 2,000 downloads annually), and a video collection of Hanna – Barbera Production Flintstones (1960) commercials for Winston cigarettes.

“As is clear from today’s headlines, science, health, and medicine have an impact on the daily lives of Americans,” said Scott H. Podolsky, chair of the MHL’s Scholarly Advisory Committee. “Vaccination, epidemics, antibiotics, and access to health care are only a few of the ongoing issues the history of which are well documented in the MHL. Partnering with the DPLA offers us unparalleled opportunities to reach new and underserved audiences, including scholars and students who don’t have access to special collections in their home institutions and the broader interested public.“

Quick links:

Digital Public Library of America

Internet Archive

Medical Heritage Library website

I remember the Flintstone commercials for Winston cigarettes. Not that effective a campaign: I smoked Marlboros (reds in a box) for almost forty-five (45) years. 😉

As old vices die out, new ones, like texting and driving, take their place. On behalf of current and former smokers, I am confident that smoking was not a factor in 1,600,000 accidents per year and 11 teen deaths every day.

Libraries may digitize books without permission, EU top court rules [Nation-wide Site Licenses?]

Friday, September 19th, 2014

Libraries may digitize books without permission, EU top court rules by Loek Essers.

From the post:

European libraries may digitize books and make them available at electronic reading points without first gaining consent of the copyright holder, the highest European Union court ruled Thursday.

The Court of Justice of the European Union (CJEU) ruled in a case in which the Technical University of Darmstadt digitized a book published by German publishing house Eugen Ulmer in order to make it available at its electronic reading posts, but refused to license the publisher’s electronic textbooks.

A spot of good news to remember on the next 9/11 anniversary: "A Member State may authorise libraries to digitise, without the consent of the rightholders, books they hold in their collection so as to make them available at electronic reading points."

Users can't make copies onto a USB stick, but under the contemporary fictions about property rights represented in copyright statutes, that isn't surprising.

What is surprising is that nations have not yet stumbled upon the idea of nation-wide site licenses for digital materials.

A nation acquiring a site license to the ACM Digital Library, IEEE, Springer and a dozen or so other resources/collections would have these positive impacts:

  1. Access to core computer science publications for everyone located in that nation
  2. A single payor for publishers, who could reduce or eliminate the staff that manage digital access subscriptions
  3. No subscriptions for universities and colleges, nor the staff to manage them (integration of those materials into collections would remain a library task)
  4. Simplified access software based on geographic IP location (fewer user/password issues)
  5. Funds now dedicated to subscriptions freed for universities and colleges to spend on other materials
  6. Encouragement of digitization of both periodical and monograph literature
  7. Avoidance of tiresome and not-likely-to-succeed arguments about balancing the public interest in IP rights discussions

For me, #7 is the most important advantage of nation-wide licensing of digital materials. As you can tell by my reference to “contemporary fictions about property rights” I fall quite firmly on a particular side of the digital rights debate. However, I am more interested in gaining access to published materials for everyone than trying to convince others of the correctness of my position. Therefore, let’s adopt a new strategy: “Pay the man.”

As I outline above, there are obvious financial advantages to publishers from nation-wide site licenses, in the form of reduced internal costs, reduced infrastructure costs and a greater certainty in cash flow. There are advantages for the public as well as universities and colleges, so I would call that a win-win solution.

The Developing World Initiatives by Taylor & Francis is described as:

Taylor & Francis Group is committed to the widest distribution of its journals to non-profit institutions in developing countries. Through agreements with worldwide organisations, academics and researchers in more than 110 countries can access vital scholarly material, at greatly reduced or no cost.

Why limit access to "non-profit institutions in developing countries"? Granted, the site-license fees for the United States would be higher than for Liberia, but the underlying principle is the same. The less you regulate access, the simpler the delivery model and the higher the profit to the publisher. What publisher would object to that?

There are armies of clerks currently invested in the maintenance of one-off subscription models but the greater public interest in access to materials consistent with publisher IP rights should carry the day.

If Tim O’Reilly and friends are serious about changing access models to information, let’s use nation-wide site licenses to eliminate firewalls and make effective linking and transclusion a present day reality.

Publishers get paid, readers get access. It’s really that simple. Just on a larger scale than is usually discussed.

PS: Before anyone raises the issue of the cost of nation-wide site licenses, remember that the United States has spent more than $1 trillion in a "war" on terrorism that has made no progress in making the United States or its citizens more secure.

If the United States had instead paid Springer Science+Business Media its total 2012 revenue of €866m ($1,113.31m), the cost of its "war" on terrorism would have bought a site license to all Springer Science+Business Media content for the entire United States for 898.47 years. (Check my math: 1,000,000,000,000 / 1,113,000,000 = 898.472.)
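The back-of-the-envelope check is easy to script, using the post's own figures (the dollar conversion of Springer's revenue is the post's, not mine):

```python
# Verify the post's arithmetic: a $1 trillion outlay divided by
# Springer's 2012 revenue (EUR 866m, ~$1,113m per the post).
war_cost_usd = 1_000_000_000_000
springer_revenue_usd = 1_113_000_000

years = war_cost_usd / springer_revenue_usd
print(round(years, 2))  # 898.47
```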

I first saw this in Nat Torkington’s Four short links: 15 September 2014.

Shanghai Library adds 2 million records to WorldCat…

Tuesday, September 16th, 2014

Shanghai Library adds 2 million records to WorldCat to share its collection with the world.

From the post:

Shanghai Library, the largest public library in China and one of the largest libraries in the world, has contributed 2 million holdings to WorldCat, including some 770,000 unique bibliographic records, to share its collection worldwide.

These records, which represent books and journals published between 1911 and 2013, were loaded in WorldCat earlier this year. The contribution from Shanghai Library, an OCLC member since 1996, enhances the richness and depth of Chinese materials in WorldCat as well as the discoverability of these collections around the world.

“We are pleased to add Shanghai Library’s holdings to WorldCat, which is the global union catalog of library collections,” said Dr. Jianzhong Wu, Director, Shanghai Library. “Shanghai is a renowned, global city, and the library should be as well. With WorldCat, we not only raise the visibility of our collection to a global level but we also share our national heritage and identity with other libraries and their users through the OCLC WorldShare Interlibrary Loan service.”

“The leadership of Shanghai Library has a bold global vision,” says Andrew H. Wang, Vice President, OCLC Asia Pacific. “The addition of Shanghai Library’s holdings and unique records enriches coverage of the Chinese collection in WorldCat for researchers everywhere.”

I don’t have a feel for how many unique Chinese bibliographic records are online but 770,000 sounds like a healthy addition.

You may also be interested in: Online Resources for Chinese Studies in North American Libraries, compiled by Ming POON, Josephine SCHE, and Mi Chu WIENS (November 2004).

Given the compilation date, 2004, I ran the W3C Link Checker on

You can review the results at:

Summary of results:

  • (N/A), 6 occurrences: The link was not checked due to robots exclusion rules. Check the link manually, and see also the link checker documentation on robots exclusion.
  • (N/A), 2 occurrences: The hostname could not be resolved. Check the link for typos.
  • 403, 1 occurrence: The link is forbidden! This needs fixing. Usual suspects: a missing index.html or Overview.html, or a missing ACL.
  • 404, 61 occurrences: The link is broken. Double-check that you have not made any typo, or mistake in copy-pasting. If the link points to a resource that no longer exists, you may want to remove or fix the link.
  • 500, 5 occurrences: This is a server side problem. Check the URI.

(emphasis added)

At a minimum, the broken links need to be corrected but updating the listing to include new resources would make a nice graduate student project.

I don’t have the background or language skills with Chinese resources to embark on such a project but would be happy to assist anyone who undertakes the task.

Cooper Hewitt, Color Interface

Tuesday, July 29th, 2014

From the about page:

Cooper Hewitt, Smithsonian Design Museum is the only museum in the nation devoted exclusively to historic and contemporary design. The Museum presents compelling perspectives on the impact of design on daily life through active educational and curatorial programming.

It is the mission of Cooper Hewitt’s staff and Board of Trustees to advance the public understanding of design across the thirty centuries of human creativity represented by the Museum’s collection. The Museum was founded in 1897 by Amy, Eleanor, and Sarah Hewitt—granddaughters of industrialist Peter Cooper—as part of The Cooper Union for the Advancement of Science and Art. A branch of the Smithsonian since 1967, Cooper-Hewitt is housed in the landmark Andrew Carnegie Mansion on Fifth Avenue in New York City.

I thought some background might be helpful because the Cooper Hewitt has a new interface:


Color, or colour, is one of the attributes we’re interested in exploring for collection browsing. Bearing in mind that only a fraction of our collection currently has images, here’s a first pass.

Objects with images now have up to five representative colors attached to them. The colors have been selected by our robotic eye machines who scour each image in small chunks to create color averages. These have then been harvested and “snapped” to the grid of 120 different colors — derived from the CSS3 palette and naming conventions — below to make navigation a little easier.

My initial reaction was to recall the old library joke where a patron comes to the circulation desk and doesn’t know a book’s title or author, but does remember it had a blue cover. At which point you wish Basil from Fawlty Towers was manning the circulation desk. 😉

It may be a good idea with physical artifacts because color/colour is a fixed attribute that may be associated with a particular artifact.

If you know the collection, you can amuse yourself by trying to guess what objects will be returned for particular colors.

BTW, the collection is interlinked by people, roles, periods, types, countries. Very impressive!

Don’t miss the resources for developers at: and their GitHub account.

I first saw this in a tweet by Lyn Marie B.

PS: The use of people, roles, objects, etc. for browsing has a topic map-like feel. Since their data and other resources are downloadable, more investigation will follow.
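The "averaged in chunks, then snapped to a palette" process Cooper Hewitt describes can be sketched in a few lines. Everything below is illustrative: the palette is a tiny hand-picked subset of CSS3 named colors, not Cooper Hewitt's actual 120-color grid, and real images would be read with an imaging library rather than hard-coded pixel lists.

```python
# Sketch of the color pipeline: average a region's pixels, then map the
# average to the nearest color in a small named palette.

PALETTE = {
    "red": (255, 0, 0),
    "navy": (0, 0, 128),
    "olive": (128, 128, 0),
    "silver": (192, 192, 192),
}

def average_color(pixels):
    """Integer average of a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))

def snap(rgb):
    """Nearest palette entry by squared Euclidean distance in RGB space."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )

chunk = [(250, 10, 5), (240, 20, 30), (255, 0, 0)]
print(snap(average_color(chunk)))  # red
```

Plain RGB distance is the simplest choice; a perceptual color space like CIELAB would snap more naturally, at the cost of a conversion step.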

Darwin’s ship library goes online

Wednesday, July 16th, 2014

Darwin’s ship library goes online by Dennis Normile.

From the post:

As Charles Darwin cruised the world on the HMS Beagle, he had access to an unusually well-stocked 400-volume library. That collection, which contained the observations of numerous other naturalists and explorers, has now been recreated online. As of today, all of more than 195,000 pages and 5000 illustrations from the works are available for the perusal of scholars and armchair naturalists alike, thanks to the Darwin Online project.

Perhaps it isn’t the amount of information you have available but how deeply you understand it that makes a difference.


Early Canadiana Online

Friday, May 23rd, 2014

Early Canadiana Online

From the webpage:

These collections contain over 80,000 rare books, magazines and government publications from the 1600s to the 1940s.

This rare collection of documentary heritage will be of interest to scholars, genealogists, history buffs and anyone who enjoys reading about Canada’s early days.

The Early Canadiana Online collection of rare books, magazines and government publications has over 80,000 titles (3,500,000 pages) and is growing. The collection includes material published from the time of the first European settlers to the first four decades of the 20th Century.

You will find books written in 21 languages including French, English, 10 First Nations languages and several European languages, Latin and Greek.

Every online collection such as this one, increases the volume of information that is accessible and also increases the difficulty of finding related information for any given subject. But the latter is such a nice problem to have!

I first saw this in a tweet from Lincoln Mullen.

APIs for Scholarly Resources

Friday, May 16th, 2014

APIs for Scholarly Resources

From the webpage:

APIs, short for application programming interface, are tools used to share content and data between software applications. APIs are used in a variety of contexts, but some examples include embedding content from one website into another, dynamically posting content from one application to display in another application, or extracting data from a database in a more programmatic way than a regular user interface might allow.

Many scholarly publishers, databases, and products offer APIs to allow users with programming skills to more powerfully extract data to serve a variety of research purposes. With an API, users might create programmatic searches of a citation database, extract statistical data, or dynamically query and post blog content.

Below is a list of commonly used scholarly resources at MIT that make their APIs available for use. If you have programming skills and would like to use APIs in your research, use the table below to get an overview of some available APIs.

If you have any questions or know of an API you would like to see included in this list, please contact Mark Clemente, Library Fellow for Scholarly Publishing and Licensing in the MIT Libraries (contact information at the bottom of this page).

A nice listing of scholarly resources with public APIs and your opportunity to contribute back to this listing with APIs that you discover.
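As a taste of what such APIs look like, here is a hedged sketch against Crossref's public REST API, one widely used scholarly metadata source. The response shape below is reconstructed from memory and canned inline so the example runs offline; verify the actual fields against Crossref's documentation before relying on them.

```python
# Build a query URL for a scholarly API, then parse a response in
# (approximately) Crossref's shape. The canned JSON stands in for a
# live HTTP call so this runs without a network connection.
import json
from urllib.parse import urlencode

base = "https://api.crossref.org/works"
url = base + "?" + urlencode({"query": "topic maps", "rows": 2})

canned = json.loads("""
{"message": {"items": [
  {"DOI": "10.1000/example.1", "title": ["An Example Article"]},
  {"DOI": "10.1000/example.2", "title": ["Another Example"]}
]}}
""")

for item in canned["message"]["items"]:
    print(item["DOI"], "-", item["title"][0])
```

In a live script, `urllib.request.urlopen(url)` (or a library like `requests`) would replace the canned string.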

Sadly, as far as I know (subject to your corrections), the ACM Digital Library has no public API.

Not all that surprising considering the other shortcomings of the ACM Digital Library interface. For example, you can only save items (their citations) to a binder one item at a time. Customer service will opine that they have had this request before, but no, you can’t contact the committee that makes decisions about Digital Library features. Nor will they tell you who is on that committee. Sounds like the current White House, doesn’t it?

I first saw this in a tweet by Scott Chamberlain.

Codex Sinaiticus Added to Digitised Manuscripts

Tuesday, March 25th, 2014

Codex Sinaiticus Added to Digitised Manuscripts by Julian Harrison.

From the post (I have omitted the images, see the original post for those):

Codex Sinaiticus is one of the great treasures of the British Library. Written in the mid-4th century in the Eastern Mediterranean (possibly at Caesarea), it is one of the two oldest surviving copies of the Greek Bible, along with Codex Vaticanus, in Rome. Written in four narrow columns to the page (aside from in the Poetic books, in two columns), its visual appearance is particularly striking.

The significance of Codex Sinaiticus for the text of the New Testament is incalculable, not least because of the many thousands of corrections made to the manuscript between the 4th and 12th centuries.

The manuscript itself is now distributed between four institutions: the British Library, the Universitäts-Bibliothek at Leipzig, the National Library of Russia in St Petersburg, and the Monastery of St Catherine at Mt Sinai. Several years ago, these four institutions came together to collaborate on the Codex Sinaiticus Project, which resulted in full digital coverage and transcription of all extant parts of the manuscript. The fruits of these labours, along with many additional essays and scholarly resources, can be found on the Codex Sinaiticus website.

The British Library owns the vast majority of Codex Sinaiticus and only the British Library portion is being released as part of the Digitised Manuscripts project.

The world in which biblical scholarship is done has changed radically over the last 20 years.

This effort by the British Library should be applauded and supported.

Google Search Appliance and Libraries

Monday, March 24th, 2014

Using Google Search Appliance (GSA) to Search Digital Library Collections: A Case Study of the INIS Collection Search by Dobrica Savic.

From the post:

In February 2014, I gave a presentation at the conference on Faster, Smarter and Richer: Reshaping the library catalogue (FSR 2014), which was organized by the Associazione Italiana Biblioteche (AIB) and Biblioteca Apostolica Vaticana in Rome, Italy. My presentation focused on the experience of the International Nuclear Information System (INIS) in using Google Search Appliance (GSA) to search digital library collections at the International Atomic Energy Agency (IAEA). 

Libraries are facing many challenges today. In addition to diminished funding and increased user expectations, the use of classic library catalogues is becoming an additional challenge. Library users require fast and easy access to information resources, regardless of whether the format is paper or electronic. Google Search, with its speed and simplicity, has established a new standard for information retrieval which did not exist with previous generations of library search facilities. Put in a position of David versus Goliath, many small, and even larger libraries, are losing the battle to Google, letting many of its users utilize it rather than library catalogues.

The International Nuclear Information System (INIS)

The International Nuclear Information System (INIS) hosts one of the world's largest collections of published information on the peaceful uses of nuclear science and technology. It offers on-line access to a unique collection of 3.6 million bibliographic records and 483,000 full texts of non-conventional (grey) literature. This large digital library collection suffered from most of the well-known shortcomings of the classic library catalogue. Searching was complex and complicated, it required training in Boolean logic, full-text searching was not an option, and response time was slow. An opportune moment to improve the system came with the retirement of the previous catalogue software and the adoption of Google Search Appliance (GSA) as an organization-wide search engine standard.

To be completely honest, my first reaction wasn’t a favorable one.

But even the complete blog post does not do justice to the project in question.

Take a look at the slides, which include screen shots of the new interface before reaching an opinion.

Take this as a lesson on what your search interface should be offering by default.

There are always other screens you can fill with advanced features.

Cataloguing projects

Tuesday, March 11th, 2014

Cataloguing projects (UK National Archive)

From the webpage:

The National Archives’ Cataloguing Strategy

The overall objective of our cataloguing work is to deliver more comprehensive and searchable catalogues, thus improving access to public records. To make online searches work well we need to provide adequate data and prioritise cataloguing work that tackles less adequate descriptions. For example, we regard ranges of abbreviated names or file numbers as inadequate.

I was led to this delightful resource by a tweet from David Underdown, advising that his presentation from National Catalogue Day in 2013 was now online.

His presentation along with several others and reports about projects in prior years are available at this projects page.

I thought the presentation titled: Opening up of Litigation: 1385-1875 by Amanda Bevan and David Foster, was quite interesting in light of various projects that want to create new “public” citation systems for law and litigation.

I haven’t seen such a proposal yet that gives sufficient consideration to the enormity of what to do with old legal materials.

The litigation presentation could be a poster child for topic maps.

I am looking forward to reading the other presentations as well.

UX Crash Course: 31 Fundamentals

Monday, February 3rd, 2014

UX Crash Course: 31 Fundamentals by Joel Marsh.

From the post:

Basic UX Principles: How to get started

The following list isn’t everything you can learn in UX. It’s a quick overview, so you can go from zero-to-hero as quickly as possible. You will get a practical taste of all the big parts of UX, and a sense of where you need to learn more. The order of the lessons follows a real-life UX process (more or less) so you can apply these ideas as-you-go. Each lesson also stands alone, so feel free to bookmark them as a reference!

Main topics:

Introduction & Key Ideas

How to Understand Users

Information Architecture

Visual Design Principles

Functional Layout Design

User Psychology

Designing with Data

Users who interact with designers (librarians and library students come to mind) would do well to review these posts. If nothing else, it will give users better questions to ask vendors about their web interface design process.

Wellcome Images

Tuesday, January 21st, 2014

Thousands of years of visual culture made free through Wellcome Images

From the post:

We are delighted to announce that over 100,000 high resolution images including manuscripts, paintings, etchings, early photography and advertisements are now freely available through Wellcome Images.

Drawn from our vast historical holdings, the images are being released under the Creative Commons Attribution (CC-BY) licence.

This means that they can be used for commercial or personal purposes, with an acknowledgement of the original source (Wellcome Library, London). All of the images from our historical collections can be used free of charge.

The images can be downloaded in high-resolution directly from the Wellcome Images website for users to freely copy, distribute, edit, manipulate, and build upon as you wish, for personal or commercial use. The images range from ancient medical manuscripts to etchings by artists such as Vincent Van Gogh and Francisco Goya.

The earliest item is an Egyptian prescription on papyrus, and treasures include exquisite medieval illuminated manuscripts and anatomical drawings, from delicate 16th century fugitive sheets, whose hinged paper flaps reveal hidden viscera to Paolo Mascagni’s vibrantly coloured etching of an ‘exploded’ torso.

Other treasures include a beautiful Persian horoscope for the 15th-century prince Iskandar, sharply sketched satires by Rowlandson, Gillray and Cruikshank, as well as photography from Eadweard Muybridge’s studies of motion. John Thomson’s remarkable nineteenth century portraits from his travels in China can be downloaded, as well as a newly added series of photographs of hysteric and epileptic patients at the famous Salpêtrière Hospital.

Semantics, or should I say semantic confusion, is never far away. While viewing an image of Gladstone as Scrooge:


When “search by keyword” offered “colonies,” I assumed it meant the colonies of the UK at the time.

Imagine my surprise when among other images, Wellcome Images offered:

petri dish

The search by keywords had found fourteen petri dish images, three images of Batavia, seven maps of India (salt, leprosy), one half-naked woman being held down, and the Gladstone image from earlier.

About what one expects from search these days but we could do better. Much better.

I first saw this in a tweet by Neil Saunders.

Library of Congress RSS Feeds

Thursday, January 16th, 2014

Library of Congress RSS Feeds

Quite by accident I stumbled upon a list of Library of Congress RSS feeds and email subscriptions in the following categories:

  • Collections Preservation
  • Copyright
  • Digital Preservation
  • Events
  • Folklife
  • For Librarians
  • For Teachers
  • General News
  • Hispanic Division
  • Legal
  • Music Division
  • Journalism
  • Poetry & Literature
  • Science
  • Site Updates
  • Veterans History
  • Visual Resources

If you think about it, libraries are aggregations of diverse semantics from across many domains.

Quite at odds with any particular cultural monotone of the day.

Subversive places. That must be why I like them so much!
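Consuming one of these feeds programmatically takes only the standard library. The feed XML below is an invented sample in ordinary RSS 2.0 shape; a real URL from the LC list would be fetched with `urllib.request` and parsed the same way.

```python
# Parse an RSS 2.0 feed and pull out the item titles.
import xml.etree.ElementTree as ET

sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Library of Congress: General News</title>
    <item><title>New Collection Online</title><link>https://example.loc.gov/1</link></item>
    <item><title>Exhibit Opens</title><link>https://example.loc.gov/2</link></item>
  </channel>
</rss>"""

root = ET.fromstring(sample_rss)
titles = [item.findtext("title") for item in root.iter("item")]
print(titles)  # ['New Collection Online', 'Exhibit Opens']
```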

…Digital Asset Sustainability…

Thursday, January 16th, 2014

A National Agenda Bibliography for Digital Asset Sustainability and Preservation Cost Modeling by Butch Lazorchak.

From the post:

The 2014 National Digital Stewardship Agenda, released in July 2013, is still a must-read (have you read it yet?). It integrates the perspective of dozens of experts to provide funders and decision-makers with insight into emerging technological trends, gaps in digital stewardship capacity and key areas for development.

The Agenda suggests a number of important research areas for the digital stewardship community to consider, but the need for more coordinated applied research in cost modeling and sustainability is high on the list of areas prime for research and scholarship.

The section in the Agenda on “Applied Research for Cost Modeling and Audit Modeling” suggests some areas for exploration:

“Currently there are limited models for cost estimation for ongoing storage of digital content; cost estimation models need to be robust and flexible. Furthermore, as discussed below…there are virtually no models available to systematically and reliably predict the future value of preserved content. Different approaches to cost estimation should be explored and compared to existing models with emphasis on reproducibility of results. The development of a cost calculator would benefit organizations in making estimates of the long‐term storage costs for their digital content.”

In June of 2012 I put together a bibliography of resources touching on the economic sustainability of digital resources. I’m pleasantly surprised at all the new work that’s been done in the meantime, but as the Agenda suggests, there’s more room for directed research in this area. Or perhaps, as Paul Wheatley suggests in this blog post, what’s really needed are coordinated responses to sustainability challenges that build directly on this rich body of work, and that effectively communicate the results out to a wide audience.

I’ve updated the bibliography, hoping that researchers and funders will explore the existing body of projects, approaches and research, note the gaps in coverage suggested by the Agenda and make efforts to address the gaps in the near future through new research or funding.

I count some seventy-one (71) items in this bibliography.

Digital preservation is an area where topic maps can help maintain access over changing customs and vocabularies, but just like migrating from one form of media to another, it doesn’t happen by itself.

Nor is there any “free lunch,” because the data is culturally important, rare, etc. Someone has to pay the bill for it being preserved.
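A toy version of the "cost calculator" the Agenda calls for is easy to sketch. Every number below is an illustrative assumption (collection size, today's per-terabyte cost, a fixed annual cost decline), not output of any real preservation cost model:

```python
# Project long-term storage cost for a collection, assuming the
# per-TB annual cost declines at a fixed rate each year.
def projected_cost(tb, cost_per_tb_year, annual_decline, years):
    total = 0.0
    for year in range(years):
        total += tb * cost_per_tb_year * (1 - annual_decline) ** year
    return total

# 50 TB, $100/TB/year today, costs falling 15% a year, 20-year horizon.
print(round(projected_cost(50, 100.0, 0.15, 20), 2))
```

Real models (and the Agenda's point) add far more: staff time, format migration, refresh cycles, and the hard problem of predicting the future value of the content.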

Having the cost of semantic access included in digital preservation would not hurt the cause of topic maps.