Archive for the ‘Digital Library’ Category

Humanities Digital Library [A Ray of Hope]

Friday, January 13th, 2017

Humanities Digital Library (Launch Event)

From the webpage:

17 Jan 2017, 18:00 to 17 Jan 2017, 19:00


IHR Wolfson Conference Suite, NB01/NB02, Basement, IHR, Senate House, Malet Street, London WC1E 7HU


6-7pm, Tuesday 17 January 2017

Wolfson Conference Suite, Institute of Historical Research

Senate House, Malet Street, London, WC1E 7HU

About the Humanities Digital Library

The Humanities Digital Library is a new Open Access platform for peer reviewed scholarly books in the humanities.

The Library is a joint initiative of the School of Advanced Study, University of London, and two of the School’s institutes—the Institute of Historical Research and the Institute of Advanced Legal Studies.

From launch, the Humanities Digital Library offers scholarly titles in history, law and classics. Over time, the Library will grow to include books from other humanities disciplines studied and researched at the School of Advanced Study. Partner organisations include the Royal Historical Society whose ‘New Historical Perspectives’ series will appear in the Library, published by the Institute of Historical Research.

Each title is published as an open access PDF, with copies also available to purchase in print and EPUB formats. Scholarly titles come in several formats—including monographs, edited collections and longer and shorter form works.
(emphasis in the original)

Timely evidence that not everyone in the UK is barking mad! “Barking mad” being the only explanation I can offer for the Investigatory Powers Bill.

I won’t be attending, but if you can, do, and support the Humanities Digital Library after it opens.

Digital Humanities In the Library

Sunday, July 31st, 2016

Digital Humanities In the Library / Of the Library: A dh+lib Special Issue

A special issue of dh+lib introduced by Sarah Potvin, Thomas Padilla and Caitlin Christian-Lamb in their essay: Digital Humanities In the Library / Of the Library, saying:

What are the points of contact between digital humanities and libraries? What is at stake, and what issues arise when the two meet? Where are we, and where might we be going? Who are “we”? By posing these questions in the CFP for a new dh+lib special issue, the editors hoped for sharp, provocative meditations on the state of the field. We are proud to present the result, ten wide-ranging contributions by twenty-two authors, collectively titled “Digital Humanities In the Library / Of the Library.”

We make the in/of distinction pointedly. Like the Digital Humanities (DH), definitions of library community are typically prefigured by “inter-” and “multi-” frames, rendered as work and values that are interprofessional, interdisciplinary, and multidisciplinary. Ideally, these characterizations attest to diversified yet unified purpose, predicated on the application of disciplinary expertise and metaknowledge to address questions that resist resolution from a single perspective. Yet we might question how a combinatorial impulse obscures the distinct nature of our contributions and, consequently, our ability to understand and respect individual agency. Working across the similarly encompassing and amorphous contours of the Digital Humanities compels the library community to reckon with its composite nature.

All of the contributions merit your attention, but I was especially taken by When Metadata Becomes Outreach: Indexing, Describing, and Encoding For DH by Emma Annette Wilson and Mary Alexander, which has this gem that will resonate with topic map fans:

DH projects require high-quality metadata in order to thrive, and the bigger the project, the more important that metadata becomes to make data discoverable, navigable, and open to computational analysis. The functions of all metadata are to allow our users to identify and discover resources through records acting as surrogates of resources, and to discover similarities, distinctions, and other nuances within single texts or across a corpus. High quality metadata brings standardization to the project by recording elements’ definitions, obligations, repeatability, rules for hierarchical structure, and attributes. Input guidelines and the use of controlled vocabularies bring consistencies that promote findability for researchers and users alike.

Modulo my reservations about the data/metadata distinction depending upon a point of view and all of them being subjects in any event, it’s hard to think of a clearer statement of the value that a topic map could bring to a DH project.

Consistencies can peacefully co-exist with historical or present-day inconsistencies, at least so long as you are using a topic map.

I commend the entire issue to you for reading!

How do you skim through a digital book?

Sunday, June 19th, 2016

How do you skim through a digital book? by Chloe Roberts.

From the post:

We’ve had a couple of digitised books that proved really popular with online audiences. Perhaps partly reflecting the interests of the global population, they’ve been about prostitutes and demons.

I’ve been especially interested in how people have interacted with these popular digitised books. Imagine how you’d pick up a book to look at in a library or bookshop. Would you start from page one, laboriously working through page by page, or would you flip through it, checking for interesting bits? Should we expect any different behaviour when people use a digital book?

We collect data on aggregate (nothing personal or trackable to our users) about what’s being asked of our digitised items in the viewer. With such a large number of views of these two popular books, I’ve got a big enough dataset to get an interesting idea of how readers might be using our digitised books.

Focusing on ‘Compendium rarissimum totius Artis Magicae sistematisatae per celeberrimos Artis hujus Magistros. Anno 1057. Noli me tangere’ (the 18th century one about demons) I’ve mapped the number of page views (horizontal axis) against page number (vertical axis, with front cover at the top), and added coloured bands to represent what’s on those pages.

Chloe captured and then analyzed the reading behavior of readers on two very popular electronic titles.

She explains her second observation:

Observation 2: People like looking at pictures more than text

by suggesting the text being in Latin and German may explain the fondness for the pictures.

Perhaps, but I have heard the same observation made about Playboy magazine. 😉

From a documentation/training perspective, Chloe’s technique, applied to digital training materials, could provide guidance on:

  • Length of materials
  • Use of illustrations
  • Organization of materials
  • What material is habitually unread?

If critical material isn’t being read, exhorting newcomers to read more carefully is not the answer.

If security and/or on-boarding reading isn’t happening, as shown by reader behavior, that’s your fault, not the readers’.

Your call: successful staff and customers, or failing staff and customers you can blame for security faults and declining sales.

Choose carefully.
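As a rough illustration of applying the same kind of page-view counting to training materials, here is a minimal Python sketch. It assumes a hypothetical aggregate log that holds nothing but one page number per view, and the 25% cutoff for "habitually unread" is an arbitrary assumption for the example, not anything taken from Chloe's post.

```python
from collections import Counter


def views_per_page(view_events, total_pages):
    """Count views for every page, including pages never viewed."""
    counts = Counter(view_events)
    return {page: counts.get(page, 0) for page in range(1, total_pages + 1)}


def habitually_unread(views, fraction=0.25):
    """Pages whose view count is below `fraction` of the busiest page's count.

    The fraction is an assumed cutoff; tune it to your own materials.
    """
    busiest = max(views.values(), default=0)
    cutoff = busiest * fraction
    return sorted(page for page, count in views.items() if count < cutoff)


if __name__ == "__main__":
    # Hypothetical aggregate log: one page number per view, no user data.
    events = [1, 1, 1, 1, 2, 2, 3, 5, 5, 5, 5, 5, 5, 5, 5]
    views = views_per_page(events, total_pages=6)
    print(views)
    print(habitually_unread(views))
```

If the pages flagged this way include your security or on-boarding essentials, that is a signal to reorganize or shorten the material.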

Almost a Topic Map? Or Just a Mashup?

Thursday, April 9th, 2015

WikipeDPLA by Eric Phetteplace.

From the webpage:

See relevant results from the Digital Public Library of America on any Wikipedia article. This extension queries the DPLA each time you visit a Wikipedia article, using the article’s title, redirects, and categories to find relevant items. If you click a link at the top of the article, it loads in a series of links to the items. The original code behind WikiDPLA was written at LibHack, a hackathon at the American Library Association’s 2014 Midwinter Meeting in Philadelphia:

Google Chrome App Home Page

GitHub page

Wikipedia:The Wikipedia Library/WikipeDPLA

How you resolve the topic map versus mashup question depends on how much precision you expect from a topic map. While knowing additional places to search is useful, I never have a problem with assembling more materials than can be read in the time allowed. On the other hand, some people may need more prompting than others, so I can’t say that general references are out of bounds.

Assuming you were maintaining data sets with locally unique identifiers, using a modification of this script to query an index of all local scripts (say Pig scripts) to discover other scripts using the same data could be quite useful.
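A minimal Python sketch of that idea follows. It is not based on the WikipeDPLA code; it assumes script texts are already loaded into a dictionary and that local identifiers look like "DS-0042", a purely hypothetical format. The script names and contents are invented for illustration.

```python
import re
from collections import defaultdict

# Assumed identifier shape, for illustration only (e.g. "DS-0042").
ID_PATTERN = re.compile(r"\bDS-\d{4}\b")


def build_index(scripts):
    """Map each data set identifier to the set of script names that mention it."""
    index = defaultdict(set)
    for name, text in scripts.items():
        for identifier in ID_PATTERN.findall(text):
            index[identifier].add(name)
    return index


def related_scripts(index, script_name, scripts):
    """Other scripts sharing at least one identifier with `script_name`."""
    related = set()
    for identifier in ID_PATTERN.findall(scripts[script_name]):
        related |= index.get(identifier, set())
    related.discard(script_name)
    return sorted(related)


if __name__ == "__main__":
    # Hypothetical Pig scripts, reduced to the lines that load data.
    scripts = {
        "clean.pig": "raw = LOAD 'DS-0042' USING PigStorage(',');",
        "report.pig": "a = LOAD 'DS-0042'; b = LOAD 'DS-0107';",
        "audit.pig": "x = LOAD 'DS-0107';",
    }
    index = build_index(scripts)
    print(related_scripts(index, "clean.pig", scripts))
    print(related_scripts(index, "report.pig", scripts))
```

Matching on the identifier alone is the mashup end of the scale: it tells you where else to look, not whether two scripts mean the same thing by the data.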

BTW, you need to have a Wikipedia account and be logged in for the extension to work. Or at least that was my experience.


Qatar Digital Library

Tuesday, October 28th, 2014

New Qatar Digital Library Offers Readers Unrivalled Collection of Precious Heritage Material

From the post:

The Qatar Digital Library which provides new public access to over half a million pages of precious historic archive and manuscript material has been launched today thanks to the British Library-Qatar Foundation Partnership project. This incredible resource makes documents and other items relating to the modern history of Qatar, the Gulf region and beyond, fully accessible and free of charge to researchers and the general public through a state-of-the-art online portal.

In line with the principles of the Qatar National Vision 2030, which aims to preserve the nation’s heritage and enhance Arab and Islamic values and identity, the launch of the Qatar Digital Library supports QF’s aim of unlocking human potential for the benefit of Qatar and the world.

Qatar National Library, a member of Qatar Foundation, has a firm commitment to preserving and showcasing Qatar’s heritage and promoting education and community development by sharing knowledge and providing resources to students, researchers, and the wider community.

With Qatar Foundation’s support, an expert, technical team has been preserving and digitising materials from the UK’s India Office Records archives over the past two years in order to be shared publicly on the portal owned and managed by Qatar National Library.

The Qatar Digital Library provides online access to over 475,000 pages from the India Office Records that date from the mid-18th century to 1951, and relate to modern historic events in Qatar, the Gulf and the Middle East region.

In addition, the Qatar Digital Library shares 25,000 pages of medieval Arab Islamic sciences manuscripts, historical maps, photographs and sound recordings.

These precious materials are being made available online for the first time. The Qatar Digital Library provides clear descriptions of the digitised materials in Arabic and English, and can be accessed for personal and research use from anywhere free of charge.

The Qatar Digital Library (homepage).

Simply awesome!

A great step towards unlocking the riches of Arab scholarship.

I first saw this in British Library Launches Qatar Digital Library by Africa S. Hands.

Data Blog Aggregation – Coffeehouse

Thursday, October 2nd, 2014


From the about page:

Coffeehouse aggregates posts about data management from around the internet.

The idea for this site draws inspiration from other aggregators such as Ecobloggers and R-Bloggers.

Coffeehouse is a project of DataONE, the Data Observation Network for Earth.

Posts are lightly curated. That is, all posts are brought in, but if we see posts that aren’t on topic, we take them down from this blog. They are not of course taken down from the original poster, just this blog.

Recently added data blogs:

Archive and Data Management Training Center

We believe that the character and structure of the social science research environment determines attitudes to re-use.

We also believe a healthy research environment gives researchers incentives to confidently create re-usable data, and for data archives and repositories to commit to supporting data discovery and re-use through data enhancement and long-term preservation.

The purpose of our center is to ensure excellence in the creation, management, and long-term preservation of research data. We promote the adoption of standards in research data management and archiving to support data availability, re-use, and the repurposing of archived data.

Our desire is to see the European research area producing quality data with wide and multipurpose re-use value. By supporting multipurpose re-use, we want to help researchers, archives and repositories realize the intellectual value of public investment in academic research. (From the “about” page for the Archive and Data Management Training Center website but representative of the blog as well)

Data Ab Initio

My name is Kristin Briney and I am interested in all things relating to scientific research data.

I have been in love with research data since working on my PhD in Physical Chemistry, when I preferred modeling and manipulating my data to actually collecting it in the lab (or, heaven forbid, doing actual chemistry). This interest in research data led me to a Master’s degree in Information Studies where I focused on the management of digital data.

This blog is something I wish I had when I was a practicing scientist: a resource to help me manage my data and navigate the changing landscape of research dissemination.

Digital Library Blog (Stanford)

The latest news and milestones in the development of Stanford’s digital library–including content, new services, and infrastructure development.

Dryad News and Views

Welcome to Dryad news and views, a blog about news and events related to the Dryad digital repository. Subscribe, comment, contribute– and be sure to Publish Your Data!

Dryad is a curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable, and citable. Any journal or publisher that wishes to encourage data archiving may refer authors to Dryad. Dryad welcomes data submissions related to any published, or accepted, peer reviewed scientific and medical literature, particularly data for which no specialized repository exists.

Journals can support and facilitate their authors’ data archiving by implementing “submission integration,” by which the journal manuscript submission system interfaces with Dryad. In a nutshell: the journal sends automated notifications to Dryad of new manuscripts, which enables Dryad to create a provisional record for the article’s data, thereby streamlining the author’s data upload process. The published article includes a link to the data in Dryad, and Dryad links to the published article.

The Dryad documentation site provides complete information about Dryad and the submission integration process.

Dryad staff welcome all inquiries. Thank you.


The data deluge refers to the increasingly large and complex data sets generated by researchers that must be managed by their creators with “industrial-scale data centres and cutting-edge networking technology” (Nature 455) in order to provide for use and re-use of the data.

The lack of standards and infrastructure to appropriately manage this (often tax-payer funded) data requires data creators, data scientists, data managers, and data librarians to collaborate in order to create and acquire the technology required to provide for data use and re-use.

This blog is my way of sorting through the technology, management, research and development that have come together to successfully solve the data deluge. I will post and discuss both current and past R&D in this area. I welcome any comments.

There are fourteen (14) data blogs to date feeding into Coffeehouse. Unlike some data blog aggregations, ads do not overwhelm content at Coffeehouse.

If you have a data blog, please consider adding it to Coffeehouse. Suggest that other data bloggers do the same.

Digital Libraries For Musicology

Thursday, May 15th, 2014

The 1st International Digital Libraries for Musicology workshop (DLfM 2014)

12th September 2014 (full day), London, UK

in conjunction with the ACM/IEEE Digital Libraries conference 2014

From the call for papers:


Many Digital Libraries have long offered facilities to provide multimedia content, including music. However there is now an ever more urgent need to specifically support the distinct multiple forms of music, the links between them, and the surrounding scholarly context, as required by the transformed and extended methods being applied to musicology and the wider Digital Humanities.

The Digital Libraries for Musicology (DLfM) workshop presents a venue specifically for those working on, and with, Digital Library systems and content in the domain of music and musicology. This includes Music Digital Library systems, their application and use in musicology, technologies for enhanced access and organisation of musics in Digital Libraries, bibliographic and metadata for music, intersections with music Linked Data, and the challenges of working with the multiple representations of music across large-scale digital collections such as the Internet Archive and HathiTrust.


Paper submission deadline: 27th June 2014 (23:59 UTC-11)
Notification of acceptance: 30th July 2014
Registration deadline for one author per paper: 11th August 2014 (14:00 UTC)
Camera ready submission deadline: 11th August 2014 (14:00 UTC)

If you want a feel for the complexity of music as a retrieval subject, consult the various proposals at: Music markup languages, which are only some of the possible music encoding languages.

It is hard to say which domains are more “complex” than others in terms of encoding and subject identity, but it is safe to say that music falls towards the complex end of the scale. (sorry)

I first saw this in a tweet by Misanderasaurus Rex.

Yet Another Giant List of Digitised Manuscript Hyperlinks

Tuesday, January 21st, 2014

Yet Another Giant List of Digitised Manuscript Hyperlinks

From the post:

A new year, a newly-updated list of digitised manuscript hyperlinks! This master list contains everything that has been digitised up to this point by the Medieval and Earlier Manuscripts department, complete with hyperlinks to each record on our Digitised Manuscripts site. We’ll have another list for you in three months; you can download the current version here: Download BL Medieval and Earlier Digitised Manuscripts Master List 14.01.13. Have fun!

I count 921 digitized manuscripts, with more on the way!

A highly selective sampling:

That leaves 917 manuscripts for you to explore! With more on the way!

CAUTION! When I try to use Chrome on Ubuntu to access these links, I get: “This webpage has a redirect loop.” The same links work fine in Firefox. I have posted a comment about this issue to the post. Will update when I have more news. If your experience is the same or different, let me know. Just curious.



Vote by midnight January 26, 2014 to promote the Medieval Manuscripts Blog.

Vote for Medieval Manuscripts Blog in the UK Blog Awards

…Digital Asset Sustainability…

Thursday, January 16th, 2014

A National Agenda Bibliography for Digital Asset Sustainability and Preservation Cost Modeling by Butch Lazorchak.

From the post:

The 2014 National Digital Stewardship Agenda, released in July 2013, is still a must-read (have you read it yet?). It integrates the perspective of dozens of experts to provide funders and decision-makers with insight into emerging technological trends, gaps in digital stewardship capacity and key areas for development.

The Agenda suggests a number of important research areas for the digital stewardship community to consider, but the need for more coordinated applied research in cost modeling and sustainability is high on the list of areas prime for research and scholarship.

The section in the Agenda on “Applied Research for Cost Modeling and Audit Modeling” suggests some areas for exploration:

“Currently there are limited models for cost estimation for ongoing storage of digital content; cost estimation models need to be robust and flexible. Furthermore, as discussed below…there are virtually no models available to systematically and reliably predict the future value of preserved content. Different approaches to cost estimation should be explored and compared to existing models with emphasis on reproducibility of results. The development of a cost calculator would benefit organizations in making estimates of the long‐term storage costs for their digital content.”

In June of 2012 I put together a bibliography of resources touching on the economic sustainability of digital resources. I’m pleasantly surprised at all the new work that’s been done in the meantime, but as the Agenda suggests, there’s more room for directed research in this area. Or perhaps, as Paul Wheatley suggests in this blog post, what’s really needed are coordinated responses to sustainability challenges that build directly on this rich body of work, and that effectively communicate the results out to a wide audience.

I’ve updated the bibliography, hoping that researchers and funders will explore the existing body of projects, approaches and research, note the gaps in coverage suggested by the Agenda and make efforts to address the gaps in the near future through new research or funding.

I count some seventy-one (71) items in this bibliography.

Digital preservation is an area where topic maps can help maintain access over changing customs and vocabularies, but just like migrating from one form of media to another, it doesn’t happen by itself.

Nor is there any “free lunch” just because the data is culturally important, rare, etc. Someone has to pay the bill for preserving it.

Having the cost of semantic access included in digital preservation would not hurt the cause of topic maps.
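To make the "cost calculator" idea concrete, here is a deliberately simple Python sketch. The model is my own assumption, not one taken from the Agenda or the bibliography: a fixed collection size, a per-terabyte storage price that declines by a constant fraction each year, and an optional flat yearly amount standing in for semantic access work (the hypothetical `annual_curation` parameter).

```python
def storage_cost(size_tb, cost_per_tb_year, annual_decline, years,
                 annual_curation=0.0):
    """Total cost of keeping `size_tb` terabytes for `years` years.

    Assumptions (illustrative only): the collection does not grow, the
    unit price falls by `annual_decline` (0.0 to 1.0) each year, and
    `annual_curation` is a flat yearly cost for semantic access work.
    """
    total = 0.0
    for year in range(years):
        unit_cost = cost_per_tb_year * (1 - annual_decline) ** year
        total += size_tb * unit_cost + annual_curation
    return total


if __name__ == "__main__":
    # Hypothetical figures: 10 TB, 100 per TB-year, prices falling 10% a year.
    print(storage_cost(10, 100.0, 0.10, years=20))
    # The same collection with a flat yearly budget for semantic access.
    print(storage_cost(10, 100.0, 0.10, years=20, annual_curation=500.0))
```

Even a toy like this makes the point that the semantic access line item has to be in the estimate from the start, or it will not be funded.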


German Digital Library releases API

Thursday, December 5th, 2013

German Digital Library releases API by Lieke Ploeger.

From the post:

Last month the German Digital Library (Deutsche Digitale Bibliothek – DDB) made a promising step forward toward further opening up their data by releasing its API (Application Programming Interface) to the public. This API provides access to all the metadata of the DDB released under a CC0 license, which is the predominant share. The release of this API opens up a wide range of possibilities for users to build applications, create combinations with other data or include the German digitised cultural heritage on other platforms. In the future, the DDB also plans to organize a programming competition for API applications as well as a series of workshops for developers.

The official press release.

Technical documentation on the API (German).

A good excuse for you to brush up on your German. Besides, not all of it is in German.

Texas Conference on Digital Libraries 2013

Wednesday, June 5th, 2013

Texas Conference on Digital Libraries 2013

Abstracts and in many cases presentations from the Texas Conference on Digital Libraries 2013.

A real treasure trove on digital libraries projects and issues.

Library: A place where IR isn’t limited by software.

Interpreting the knowledge map of digital library research (1990–2010)

Tuesday, May 14th, 2013

Interpreting the knowledge map of digital library research (1990–2010) by Son Hoang Nguyen and Gobinda Chowdhury. (Nguyen, S. H. and Chowdhury, G. (2013), Interpreting the knowledge map of digital library research (1990–2010). J. Am. Soc. Inf. Sci., 64: 1235–1258. doi: 10.1002/asi.22830)


A knowledge map of digital library (DL) research shows the semantic organization of DL research topics and also the evolution of the field. The research reported in this article aims to find the core topics and subtopics of DL research in order to build a knowledge map of the DL domain. The methodology is comprised of a four-step research process, and two knowledge organization methods (classification and thesaurus building) were used. A knowledge map covering 21 core topics and 1,015 subtopics of DL research was created and provides a systematic overview of DL research during the last two decades (1990–2010). We argue that the map can work as a knowledge platform to guide, evaluate, and improve the activities of DL research, education, and practices. Moreover, it can be transformed into a DL ontology for various applications. The research methodology can be used to map any human knowledge domain; it is a novel and scientific method for producing comprehensive and systematic knowledge maps based on literary warrant.

This is an impressive piece of work and likely to be read by librarians, particularly digital librarians.

That restricted readership is unfortunate because anyone building a knowledge (topic) map will benefit from the research methodology detailed in this article.

… Preservation and Stewardship of Scholarly Works, 2012 Supplement

Tuesday, March 19th, 2013

Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement by Charles W. Bailey, Jr.

From the webpage:

In a rapidly changing technological environment, the difficult task of ensuring long-term access to digital information is increasingly important. The Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement presents over 130 English-language articles, books, and technical reports published in 2012 that are useful in understanding digital curation and preservation. This selective bibliography covers digital curation and preservation copyright issues, digital formats (e.g., media, e-journals, research data), metadata, models and policies, national and international efforts, projects and institutional implementations, research studies, services, strategies, and digital repository concerns.

It is a supplement to the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which covers over 650 works published from 2000 through 2011. All included works are in English. The bibliography does not cover conference papers, digital media works (such as MP3 files), editorials, e-mail messages, letters to the editor, presentation slides or transcripts, unpublished e-prints, or weblog postings.

The bibliography includes links to freely available versions of included works. If such versions are unavailable, italicized links to the publishers' descriptions are provided.

Links, even to publisher versions and versions in disciplinary archives and institutional repositories, are subject to change. URLs may alter without warning (or automatic forwarding) or they may disappear altogether. Inclusion of links to works on authors' personal websites is highly selective. Note that e-prints and published articles may not be identical.

The bibliography is available under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

Supplement to “the” starting point for research on digital curation.

International Conference on Theory and Practice of Digital Libraries (TPDL)

Monday, February 18th, 2013

International Conference on Theory and Practice of Digital Libraries (TPDL)

Valletta, Malta, September 22-26, 2013. I thought that would get your attention. Details follow.


Full and Short papers, Posters, Panels, and Demonstrations deadline: March 23, 2013

Workshops and Tutorials proposals deadline: March 4, 2013

Doctoral Consortium papers submission deadline: June 2, 2013

Notification of acceptance for Papers, Posters, and Demonstrations: May 20, 2013

Notification of acceptance for Panels, Workshops and Tutorials: April 22, 2013

Doctoral Consortium acceptance notification: June 24, 2013

Camera ready versions: June 9, 2013

End of early registration: July 31, 2013

Conference dates: September 22-26, 2013

The general theme of the conference is “Sharing meaningful information,” a theme reflected in the topics for conference submissions:

General areas of interests include, but are not limited to, the following topics, organized in four categories, according to a conceptualization that coincides with the four arms of the Maltese Cross:


  • Information models
  • Digital library conceptual models and formal issues
  • Digital library 2.0
  • Digital library education curricula
  • Economic and legal aspects (e.g. rights management) landscape for digital libraries
  • Theoretical models of information interaction and organization
  • Information policies
  • Studies of human factors in networked information
  • Scholarly primitives
  • Novel research tools and methods with emphasis on digital humanities
  • User behavior analysis and modeling
  • Social-technical perspectives of digital information


  • Digital library architectures
  • Cloud and grid deployments
  • Federation of repositories
  • Collaborative and participatory information environments
  • Data storage and indexing
  • Big data management
  • e-science, e-government, e-learning, cultural heritage infrastructures
  • Semi structured data
  • Semantic web issues in digital libraries
  • Ontologies and knowledge organization systems
  • Linked Data and its applications


  • Metadata schemas with emphasis to metadata for composite content (Multimedia, geographical, statistical data and other special content formats)
  • Interoperability and Information integration
  • Digital Curation and related workflows
  • Preservation, authenticity and provenance
  • Web archiving
  • Social media and dynamically generated content for particular uses/communities (education, science, public, etc.)
  • Crowdsourcing
  • 3D models indexing and retrieval
  • Authority management issues


  • Information Retrieval and browsing
  • Multilingual and Multimedia Information Retrieval
  • Personalization in digital libraries
  • Context awareness in information access
  • Semantic aware services
  • Technologies for delivering/accessing digital libraries, e.g. mobile devices
  • Visualization of large-scale information environments
  • Evaluation of online information environments
  • Quality metrics
  • Interfaces to digital libraries
  • Data mining/extraction of structure from networked information
  • Social networks analysis and virtual organizations
  • Traditional and alternative metrics of scholarly communication
  • Mashups of resources

Do you know if there are plans for recording presentations?

Given the location and diminishing travel funding, recordings would be an efficient way to increase the impact of the presentations.


Monday, November 19th, 2012


From the about page:

D-Lib Magazine is an electronic publication with a focus on digital library research and development, including new technologies, applications, and contextual social and economic issues. D-Lib Magazine appeals to a broad technical and professional audience. The primary goal of the magazine is timely and efficient information exchange for the digital library community to help digital libraries be a broad interdisciplinary field, and not a set of specialties that know little of each other.

I am about to post concerning an article in D-Lib and realized I don’t have a blog entry on D-Lib!

Not that it is topic map specific but it is digital library specific, with all the issues that entails. Remarkably similar to the issues any topic map author or software will face.

D-Lib has proven what many of us suspected:

The quality of content is not related to the medium of delivery.


…Comparing Digital Preservation Glossaries [Why Do We Need Common Vocabularies?]

Friday, August 10th, 2012

From AIP to Zettabyte: Comparing Digital Preservation Glossaries

Emily Reynolds (2012 Junior Fellow) writes:

As we mentioned in our introductory post last month, the OSI Junior Fellows are working on a project involving a draft digital preservation policy framework. One component of our work is revising a glossary that accompanies the framework. We’ve spent the last two weeks poring through more than two dozen glossaries relating to digital preservation concepts to locate and refine definitions to fit the terms used in the document.

We looked at dictionaries from well-established archival entities like the Society of American Archivists, as well as more strictly technical organizations like the Internet Engineering Task Force. While some glossaries take a traditional archival approach, others were more technical; we consulted documents primarily focusing on electronic records, archives, digital storage and other relevant fields. Because of influential frameworks like the OAIS Reference Model, some terms were defined similarly across the glossaries that we looked at. But the variety in the definitions for other terms points to the range of practitioners discussing digital preservation issues, and highlights the need for a common vocabulary. Based on what we found, that vocabulary will have to be broadly drawn and flexible to meet different kinds of requirements.

OSI = Office of Strategic Initiatives (Library of Congress)

Not to be overly critical, but I stumble over:

Because of influential frameworks like the OAIS Reference Model, some terms were defined similarly across the glossaries that we looked at. But the variety in the definitions for other terms points to the range of practitioners discussing digital preservation issues, and highlights the need for a common vocabulary.

Why does a “variety in the definitions for other terms” highlight “the need for a common vocabulary”?

I take it as a given that we have diverse vocabularies.

And that attempts at “common” vocabularies succeed in creating yet another “diverse” vocabulary.

So, why would anyone looking at “diverse” vocabularies jump to the conclusion that a “common” vocabulary is required?

Perhaps what is missing is the definition of the problem presented by “diverse” vocabularies.

Hard to solve a problem if you don’t know what it is. (Hasn’t stopped some people that I know but that is a story for another day.)

I put it to you (and in your absence I will answer, so answer quickly):

What is the problem (or problems) presented by diverse vocabularies? (Feel free to use examples.)

Or if you prefer, Why do we need common vocabularies?

Evolutionary Subject Tagging in the Humanities…

Saturday, December 3rd, 2011

Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes by Jack Ammerman, Vika Zafrin, Dan Benedetti, Garth W. Green.


In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.

Truly remarkable piece of work!

Just to entice you into reading the entire paper, the authors challenge the assumption that knowledge is analogue. Successfully in my view but I already held that position so I was an easy sell.

BTW, if you are in my topic maps class, this paper is required reading. Summarize what you think are the strong/weak points of the paper in 2 to 3 pages.


Friday, August 19th, 2011


From the Introduction:

The MONK Project provides access to the digitized texts described above along with tools to enable literary research through the discovery, exploration, and visualization of patterns. Users typically start a project with one of the toolsets that has been predefined by the MONK team. Each toolset is made up of individual tools (e.g. a search tool, a browsing tool, a rating tool, and a visualization), and these tools are applied to worksets of texts selected by the user from the MONK datastore. Worksets and results can be saved for later use or modification, and results can be exported in some standard formats (e.g., CSV files).

The public data set:

This instance of the MONK Project includes approximately 525 works of American literature from the 18th and 19th centuries, and 37 plays and 5 works of poetry by William Shakespeare provided by the scholars and libraries at Northwestern University, Indiana University, the University of North Carolina at Chapel Hill, and the University of Virginia. These texts are available to all users, regardless of institutional affiliation.

Digging a bit further:

Each of these texts is normalized (using Abbot, a complex XSL stylesheet) to a TEI schema designed for analytic purposes (TEI-A), and each text has been “adorned” (using Morphadorner) with tokenization, sentence boundaries, standard spellings, parts of speech and lemmata, before being ingested (using Prior) into a database that provides Java access methods for extracting data for many purposes, including searching for objects; direct presentation in end-user applications as tables, lists, concordances, or visualizations; getting feature counts and frequencies for analysis by data-mining and other analytic procedures; and getting tokenized streams of text for working with n-gram and other collocation analyses, repetition analyses, and corpus query-language pattern-matching operations. Finally, MONK’s quantitative analytics (naive Bayesian analysis, support vector machines, Dunning’s log likelihood, and raw frequency comparisons), are run through the SEASR environment.

Here’s my topic maps question: So, how do I reliably combine the results from a subfield that uses a different vocabulary than my own? For that matter, how do I discover it in the first place?

I think the MONK project is quite remarkable but lament the impending repetition of research across such a vast archive simply because it is unknown or expressed in a “foreign” tongue.
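For readers who have not met the log likelihood statistic named in the MONK description, here is a minimal sketch of the two-term form commonly used in corpus linguistics to compare one word's frequency across two worksets. It is not MONK's or SEASR's code; the function name, parameter names, and the counts in the usage line are all hypothetical, chosen only for illustration.

```python
import math


def log_likelihood(count_a, count_b, total_a, total_b):
    """Two-term log likelihood (G2) for one word seen in two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of tokens in corpus A and corpus B.
    All names are illustrative assumptions, not taken from MONK.
    """
    combined = count_a + count_b
    if combined == 0:
        return 0.0  # the word appears in neither corpus

    # Expected counts if the word were equally frequent in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)

    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: a word seen 120 times in a 50,000-token workset
# and 40 times in a 60,000-token workset.
score = log_likelihood(120, 40, 50_000, 60_000)
print(f"G2 = {score:.2f}")
```

A larger score means the word's frequency differs more between the two worksets than chance would suggest. Note that the score compares strings, so the same subject expressed in a different vocabulary is counted as a different word, which is exactly the problem raised above.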

Bridging the Gulf:…

Saturday, April 30th, 2011

Bridging the Gulf: Communication and Information in Society, Technology, and Work

October 9-13, 2011, New Orleans, Louisiana

From the website:

The ASIST Annual Meeting is the main venue for disseminating research centered on advances in the information sciences and related applications of information technology.

ASIST 2011 builds on the success of the 2010 conference structure and will have the integrated program that is an ASIST strength. This will be achieved using the six reviewing tracks pioneered in 2010, each with its own committee of respected reviewers to ensure that the conference meets your high expectations for standards and quality. These reviewers, experts in their fields, will assist with a rigorous peer-review process.

Important Dates:

  1. Papers, Panels, Workshops & Tutorials
    • Deadline for submissions: May 31
    • Notification to authors: June 28
    • Final copy: July 15
  2. Posters, Demos & Videos:
    • Deadline for submissions: July 1
    • Notification to authors: July 20
    • Final copy: July 27

One of the premier technical conferences for librarians and information professionals in the United States.

The track listings are:

  • Track 1 – Information Behaviour
  • Track 2 – Knowledge Organization
  • Track 3 – Interactive Information & Design
  • Track 4 – Information and Knowledge Management
  • Track 5 – Information Use
  • Track 6 – Economic, Social, and Political Issues

A number of opportunities for topic map based presentations.

The conference being located in New Orleans is yet another reason to attend! The food, music, and street life have to be experienced to be believed. No description would be adequate.

Journal of Digital Information

Friday, March 25th, 2011

Journal of Digital Information

Publishing papers on the management, presentation and uses of information in digital environments, JoDI is a peer-reviewed Web journal supported by Texas A&M University Libraries.

First publishing papers in 1997, the Journal of Digital Information is an electronic-only, peer-reviewed journal covering the broad topics related to digital libraries, hypertext and hypermedia systems, and the issues of digital information. JoDI is supported by the Texas A&M University Libraries through the Digital Initiatives, Research and Technology group, and hosted by the Texas Digital Library.

Looks like an interesting venue to explore for material on digital libraries.

ACM Digital Library for Computing Professionals

Wednesday, January 12th, 2011

ACM Digital Library for Computing Professionals

The ACM has released a new version of its digital library and is offering a free three-month trial.

From the announcement:

  • Reorganized author profile pages that present a snapshot of author contributions and metrics of author influence by monitoring publication and citation counts and download usage volume
  • Broadened citation pages for individual articles with tabs for metadata and links to facilitate exploration and discovery of the depth of content in the DL
  • Enhanced interactivity tools such as RSS feeds, bibliographic exports, and social network channels to retrieve data, promote user engagement, and introduce user content
  • Redesigned binders for creating personal, annotatable collections of bibliographies or reading lists, and sharing them with ACM and non-ACM members, or exporting them into standard authoring tools like self-generated virtual PDF publications
  • Expanded table-of-contents opt-in service for all publications in the DL—from ACM and other publishers—that alerts users via email and RSS feeds to new issues of journals, magazines, newsletters, and proceedings.

I mention it here for a couple of reasons:

1) For resources on computing, whether contemporary or older materials, I can’t think of a better starting place for research. I am here more often than not.

2) It sets a benchmark for what is available in terms of digital libraries. If you are going to use topic maps to build a digital library, what would you do better?

CASPAR (Cultural, Artistic, and Scientific Knowledge for Preservation, Access and Retrieval)

Saturday, October 23rd, 2010

CASPAR (Cultural, Artistic, and Scientific Knowledge for Preservation, Access and Retrieval).

From the website:

CASPAR methodological and technological solution:

  • is compliant to the OAIS Reference Model – the main standard of reference in digital preservation
  • is technology-neutral: the preservation environment could be implemented using any kind of emerging technology
  • adopts a distributed, asynchronous, loosely coupled architecture and each key component is self-contained and portable: it may be deployed without dependencies on different platform and framework
  • is domain independent: it could be applied with low additional effort to multiple domains/contexts.
  • preserves knowledge and intelligibility, not just the “bits”
  • guarantees the integrity and identity of the information preserved as well as the protection of digital rights

FYI: OAIS Reference Model

As a librarian, you will be confronted with claims similar to these in vendor literature, grant applications and other marketing materials.


  1. Pick one of these claims. What documentation/software produced by the project would you review to evaluate the claim you have chosen?
  2. What other materials do you think would be relevant to your review?
  3. Perform the actual review (10 – 15 pages, with citations, project)

Union Catalogs of Learning Objects: Why Not?

Friday, October 8th, 2010

Union Catalogs of Learning Objects: Why Not? Author(s): Ana M.B. Pavani Keywords: Metadata – Learning Objects – Digital Libraries – Union Catalogs – Open Archives Initiative Protocol for Metadata Harvesting


This work presents a combined view of digital libraries, union catalogs and digital learning materials; union catalogs of metadata of ETD – Electronic Theses and Dissertations are shown as a paradigm. From this integrated view, and based on the existing ETD solution, it suggests that union catalogs of learning objects (digital learning materials with independent identities) be implemented with the participation of institutions worldwide. Open and free software solutions, and training are part of the overall proposed strategy.

More of a call to action than a specific proposal.

Worth reading to be reminded how important it is to share resources.

Even if, like the first cataloging venture in the 13th century, the work of sharing will never be done.