Archive for the ‘Library software’ Category

Black Womxn Authors, Library of Congress and MarcXML (Part 2)

Thursday, April 20th, 2017

(After writing this post I got a message from Clifford Anderson on a completely different way to approach the Marc to XML problem. A very neat way. But, I thought the directions on installing MarcEdit on Ubuntu 16.04 would be helpful anyway. More on Clifford’s suggestion to follow.)

If you’re just joining, read Black Womxn Authors, Library of Congress and MarcXML (Part 1) for the background on why this flurry of installation is at all meaningful!

The goal is to get a working copy of MarcEdit installed on my Ubuntu 16.04 machine.

MarcEdit Linux Installation Instructions reads in part:

Installation Steps:

  1. Download the MarcEdit app bundle. This file has been zipped to reduce the download size.
  2. Unzip the file and open the MarcEdit folder. Find the Install.txt file and read it.
  3. Ensure that you have the Mono framework installed. What is Mono? Mono is an open source implementation of Microsoft’s .NET framework. The best way to describe it is that .NET is very Java-like; it’s a common runtime that can work across any platform in which the framework has been installed. There are a number of ways to get the Mono framework — for MarcEdit’s purposes, it is recommended that you download and install the official package available from the Mono Project’s website. You can find the Mac OSX download here:
  4. Run MarcEdit via the command line using mono MarcEdit.exe from within the MarcEdit directory.

Well, sort of. 😉

First, you need to go to the Mono Project Download page. From there, under Xamarin packages, follow Debian, Ubuntu, and derivatives.

There is a package for Ubuntu 16.10, but it is Mono 4.2.1. By installing the Xamarin packages, I am running Mono 4.7.0. Your call, but as a matter of habit I run the latest compatible packages.

Updating your package lists for Debian, Ubuntu, and derivatives:

Add the Mono Project GPG signing key and the package repository to your system (if you don’t use sudo, be sure to switch to root):

sudo apt-key adv --keyserver hkp:// --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF

echo "deb wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list

And for Ubuntu 16.10:

echo "deb wheezy-apache24-compat main" | sudo tee -a /etc/apt/sources.list.d/mono-xamarin.list

Now run:

sudo apt-get update

The Usage section suggests:

The package mono-devel should be installed to compile code.

The package mono-complete should be installed to install everything – this should cover most cases of “assembly not found” errors.

The package referenceassemblies-pcl should be installed for PCL compilation support – this will resolve most cases of “Framework not installed: .NETPortable” errors during software compilation.

The package ca-certificates-mono should be installed to get SSL certificates for HTTPS connections. Install this package if you run into trouble making HTTPS connections.

The package mono-xsp4 should be installed for running ASP.NET applications.

Find and select mono-complete first. Most decent package managers will show the dependencies that will be installed along with it. Add any of the packages listed above that were missed.

Do follow the hints here to verify that Mono is working correctly.

Are We There Yet?

Not quite. It was at this point that I unpacked the bundle and discovered there is no “Install.txt” file. Rather, there is a linux_install.txt, which reads:

a) Ensure that the dependencies have been installed
1) Dependency list:
i) MONO 3.4+ (Runtime plus the System.Windows.Forms library [these are sometimes separate])
ii) YAZ 5 + YAZ 5 develop Libraries + YAZ++ ZOOM bindings
iii) ZLIBC libraries
iV) libxml2/libxslt libraries
b) Unzip
c) On first run:
a) mono MarcEdit.exe
b) Preferences tab will open, click on other, and set the following two values:
i) Temp path: /tmp/
ii) MONO path: [to your full mono path]

** For Z39.50 Support
d) Yaz.Sharp.dll.config — ensure that the dllmap points to the correct version of the shared libyaz object.
e) main_icon.bmp can be used for a desktop icon

Oops! Without unzipping, you won’t see the dependencies:

ii) YAZ 5 + YAZ 5 develop Libraries + YAZ++ ZOOM bindings
iii) ZLIBC libraries
iV) libxml2/libxslt libraries

The YAZ site has a readme file for Ubuntu, but here is the very abbreviated version:

sudo apt-key add indexdata.asc

echo "deb xenial main" | sudo tee -a /etc/apt/sources.list
echo "deb-src xenial main" | sudo tee -a /etc/apt/sources.list

(That sequence only works for Ubuntu xenial. See the readme file for other versions.)

Of course:

sudo apt-get update

As of today, you are looking for yaz 5.21.0-1 and libyaz5-dev 5.21.0-1.

Check for and/or install ZLIBC and libxml2/libxslt libraries.

Personal taste, but I reboot at this point to make sure all the libraries reload at the correct versions. It should work without rebooting, but that’s up to you.

Fire it up with

mono MarcEdit.exe

Choose Locations (not Other) and confirm that “Set Temporary Path:” is /tmp/ and that MONO Path is the location of mono (try which mono and input the result), then select OK.

I did the install on Sunday evening, and so after all this, the software announced on loading that it had been upgraded! Yes, while I was installing all the dependencies, a new and improved version of MarcEdit was posted.

The XML extraction is a piece of cake, so I am working on the XQuery over the resulting MarcXML records for part 3.
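For the curious, the core of a Marc-to-MarcXML conversion can be sketched with nothing but the Python standard library. This is only an illustration of the MARC 21 binary layout (leader, directory, data fields), not a substitute for MarcEdit: it assumes clean UTF-8 records and skips all error handling.

```python
import xml.etree.ElementTree as ET

FT, SF = b"\x1e", b"\x1f"  # MARC field and subfield delimiters

def marc_to_xml(record: bytes) -> ET.Element:
    """Parse one binary MARC 21 record into a MARCXML <record> element."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])              # base address of data
    root = ET.Element("record", xmlns="http://www.loc.gov/MARC21/slim")
    ET.SubElement(root, "leader").text = leader
    directory = record[24:base - 1]        # 12-byte entries, trailing FT dropped
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = record[base + start:base + start + length].rstrip(FT)
        if tag < "010":                    # control fields: no indicators or subfields
            ET.SubElement(root, "controlfield", tag=tag).text = data.decode("utf-8")
        else:
            df = ET.SubElement(root, "datafield", tag=tag,
                               ind1=chr(data[0]), ind2=chr(data[1]))
            for sub in data[2:].split(SF)[1:]:
                sf = ET.SubElement(df, "subfield", code=chr(sub[0]))
                sf.text = sub[1:].decode("utf-8")
    return root
```

A .mrc file saved from the Library of Congress catalog is just such records concatenated, so splitting the file on the record terminator (0x1D) and feeding each piece to marc_to_xml would, in principle, yield the MarcXML; MarcEdit remains the safer tool for real batches.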

Black Womxn Authors, Library of Congress and MarcXML (Part 1)

Monday, April 17th, 2017

This adventure started innocently enough with the 2017 Womxn of Color Reading Challenge by Der Vang. As an “older” White male Southerner working in technology, I don’t encounter works by womxn of color unless it is intentional.

The first book, “A book that became a movie,” was easy. I read the deeply moving Beloved by Toni Morrison. I recommend reading a non-critical edition before you read a critical one. Let Morrison speak for herself before you read others offering their views on the story.

The second book, “A book that came out the year you were born,” has proven to be more difficult. Far more difficult. You see, I think Der Vang was assuming a reading audience younger than I am, for which womxn of color authors would not be difficult to find. That hasn’t proven to be the case for me.

I searched the usual places but likely collections did not denote an author’s gender or race. The Atlanta-Fulton Public Library reference service came riding to the rescue after I had exhausted my talents with this message:

‘Attached is a “List of Books Published by Negro Writers in 1954 and Late 1953” (pp. 10-12) by Blyden Jackson, IN “The Blithe Newcomers: Resume of Negro Literature in 1954: Part I,” Phylon v.16, no.1 (1st Quarter 1955): 5-12, which has been annotated with classifications (Biography) or subjects (Poetry). Thirteen are written by women; however, just two are fiction. The brief article preceding the list does not mention the books by the women novelists–Elsie Jordan (Strange Sinner) or Elizabeth West Wallace (Scandal at Daybreak). No Part II has been identified. And AARL does not own these two. Searching AARL holdings in Classic Catalog by year yields seventeen by women but no fiction. Most are biographies. Two is better than none but not exactly a list.

A Celebration of Women Writers – African American Writers ( ) seems to have numerous [More Information] links which would possibly allow the requestor to determine the 1954 novelists among them.’
(emphasis in original)

Using those two authors/titles as leads, I found in the Library of Congress online catalog:
Jordan, Elsie. Strange sinner / Elsie Jordan. 1st ed. New York : Pageant, c1954.
172 p. ; 21 cm.
PZ4.J818 St
Wallace, Elizabeth West. [from old catalog] Scandal at daybreak. [1st ed.] New York, Pageant Press [1954]
167 p. 21 cm.
PZ4.W187 Sc

Checking elsewhere, both titles are out of print, although I did see one (1) copy of Elsie Jordan’s Strange Sinner for $100. I think I have located a university with a digital scan but will have to report back on that later.

Since both Jordan and Wallace published with Pageant Press the same year, I reasoned that other womxn of color may have also published with them and that could lead me to more accessible works.

Experienced librarians are no doubt already grinning, because if you search for “Pageant Press” in the Library of Congress online catalog, you get 961 “hits,” displayed 25 “hits” at a time. Yes, you can set the page to return 100 “hits” at a time, but not while you have sort by date of publication selected. 🙁

That is, you can display 100 “hits” per page in no particular order, or you can display the “hits” in date of publication order, but only 25 “hits” at a time. (Or at least that was my experience; please correct me if that’s wrong.)

But with the 100 “hits” per page, you can “save as,” though only as Marc records, Unicode (UTF-8) or not. No MarcXML format.

In response to my query about this, the Library of Congress wrote:

At the moment we have no plans to provide an option to save search results as MARCXML. We will consider it for future development projects.

I can understand that in the current climate in Washington, but a way to convert Marc records to the easier (in my view) to manipulate MarcXML format would be a real benefit to readers and researchers alike.

Fortunately there is a solution, MarcEdit.

From the webpage:

This LibGuide attempts to document the features of MarcEdit, which was developed by Terry Reese. It is open source software designed to facilitate the harvesting, editing, and creation of MARC records. This LibGuide was adapted from a standalone document, and while the structure of the original document has been preserved in this LibGuide, it is also available in PDF form at the link below. The original documentation and this LibGuide were written with the idea that it would be consulted on an as-needed basis. As a result, the beginning steps of many processes may be repeated within the same page or across the LibGuide as a whole so that users would be able to understand the entire process of implementing a function within MarcEdit without having to consult other guides to know where to begin. There are also screenshots that are repeated throughout, which may provide a faster reference for users to understand what steps they may already be familiar with.

Of course, installing MarcEdit on Ubuntu isn’t a straightforward task. But I have 961 Marc records, and possibly more, that would be very useful in MarcXML. Tomorrow I will document the installation steps I followed with Ubuntu 16.04.

PS: I’m not ignoring the suggested A Celebration of Women Writers – African American Writers ( ). But I have gotten distracted by the technical issue of how to convert all the holdings at the Library of Congress for a publisher into MarcXML. Suggestions on how to best use this resource?

NCSU Offers Social Media Archives Toolkit for Libraries [Defeating Censors]

Sunday, February 28th, 2016

NCSU Offers Social Media Archives Toolkit for Libraries by Matt Enis.

From the post:

North Carolina State University (NCSU) Libraries recently debuted a free, web-based social media archives toolkit designed to help cultural heritage organizations develop social media collection strategies, gain knowledge of ways in which peer institutions are collecting similar content, understand current and potential uses of social media content by researchers, assess the legal and ethical implications of archiving this content, and develop techniques for enriching collections of social media content at minimal cost. Tools for building and enriching collections include NCSU’s Social Media Combine—which pre-assembles the open source Social Feed Manager, developed at George Washington University for Twitter data harvesting, and NCSU’s own open source Lentil program for Instagram—into a single package that can be deployed on Windows, OSX, and Linux computers.

“By harvesting social media data (such as Tweets and Instagram photos), based on tags, accounts, or locations, researchers and cultural heritage professionals are able to develop accurate historical assessments and democratize access to archival contributors, who would otherwise never be represented in the historical record,” NCSU explained in an announcement.

“A lot of activity that used to take place as paper correspondence is now taking place on social media—the establishment of academic and artistic communities, political organizing, activism, awareness raising, personal and professional interactions,” Jason Casden, interim associate head of digital library initiatives, told LJ. Historians and researchers will want to have access to this correspondence, but unlike traditional letters, this content is extremely ephemeral and can’t be collected retroactively like traditional paper-based collections.

“So we collect proactively—as these events are happening or shortly after,” Casden explained.

I saw this too late today to install but I’m sure I will be posting about it later this week!

Do you see the potential of such tooling for defeating would-be censors of Twitter and other social media?

More on that later this week as well.

Harvard Library adopts LibraryCloud

Wednesday, January 7th, 2015

Harvard Library adopts LibraryCloud by David Weinberger.

From the post:

According to a post by the Harvard Library, LibraryCloud is now officially a part of the Library toolset. It doesn’t even have the word “pilot” next to it. I’m very happy and a little proud about this.

LibraryCloud is two things at once. Internal to Harvard Library, it’s a metadata hub that lets lots of different data inputs be normalized, enriched, and distributed. As those inputs change, you can change LibraryCloud’s workflow process once, and all the apps and services that depend upon those data can continue to work without making any changes. That’s because LibraryCloud makes the data that’s been input available through an API which provides a stable interface to that data. (I am overstating the smoothness here. But that’s the idea.)

To the Harvard community and beyond, LibraryCloud provides open APIs to access tons of metadata gathered by Harvard Library. LibraryCloud already has metadata about 18M items in the Harvard Library collection — one of the great collections — including virtually all the books and other items in the catalog (nearly 13M), a couple of million of images in the VIA collection, and archives at the folder level in Harvard OASIS. New data can be added relatively easily, and because LibraryCloud is workflow based, that data can be updated, normalized and enriched automatically. (Note that we’re talking about metadata here, not the content. That’s a different kettle of copyrighted fish.)

LibraryCloud began as an idea of mine (yes, this is me taking credit for the idea) about 4.5 years ago. With the help of the Harvard Library Innovation Lab, which I co-directed until a few months ago, we invited in local libraries and had a great conversation about what could be done if there were an open API to metadata from multiple libraries. Over time, the Lab built an initial version of LibraryCloud primarily with Harvard data, but with scads of data from non-Harvard sources. (Paul Deschner, take many many bows. Matt Phillips, too.) This version of LibraryCloud — now called lilCloud — is still available and is still awesome.

Very impressive news from Harvard!

Plus, the LibraryCloud is open source!

Documentation. Well, that’s the future home of the documentation. For now, the current documentation is on Google Doc: LibraryCloud Item API


The LibraryCloud Item API provides access to metadata about items in the Harvard Library collections. For the purposes of this API, an “item” is the metadata describing a catalog record within the Harvard Library.
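For anyone who wants to script against the Item API before settled documentation lands, here is a hedged sketch. The host, path, and parameter names are my assumptions, so verify them against whatever documentation is current.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint -- confirm against the LibraryCloud Item API documentation.
BASE = "https://api.lib.harvard.edu/v2/items.json"

def item_query_url(**params) -> str:
    """Build an Item API request URL from keyword filters (sorted for stability)."""
    return BASE + "?" + urlencode(sorted(params.items()))

def fetch_items(**params) -> dict:
    """Fetch and decode one page of results (requires network access)."""
    with urlopen(item_query_url(**params)) as resp:
        return json.load(resp)
```

For example, item_query_url(title="beloved", limit=5) builds a single-page query; paging behavior and the response shape are whatever the API actually defines.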


Google Search Appliance and Libraries

Monday, March 24th, 2014

Using Google Search Appliance (GSA) to Search Digital Library Collections: A Case Study of the INIS Collection Search by Dobrica Savic.

From the post:

In February 2014, I gave a presentation at the conference on Faster, Smarter and Richer: Reshaping the library catalogue (FSR 2014), which was organized by the Associazione Italiana Biblioteche (AIB) and Biblioteca Apostolica Vaticana in Rome, Italy. My presentation focused on the experience of the International Nuclear Information System (INIS) in using Google Search Appliance (GSA) to search digital library collections at the International Atomic Energy Agency (IAEA). 

Libraries are facing many challenges today. In addition to diminished funding and increased user expectations, the use of classic library catalogues is becoming an additional challenge. Library users require fast and easy access to information resources, regardless of whether the format is paper or electronic. Google Search, with its speed and simplicity, has established a new standard for information retrieval which did not exist with previous generations of library search facilities. Put in a position of David versus Goliath, many small, and even larger libraries, are losing the battle to Google, letting many of its users utilize it rather than library catalogues.

The International Nuclear Information System (INIS)

The International Nuclear Information System (INIS) hosts one of the world's largest collections of published information on the peaceful uses of nuclear science and technology. It offers on-line access to a unique collection of 3.6 million bibliographic records and 483,000 full texts of non-conventional (grey) literature. This large digital library collection suffered from most of the well-known shortcomings of the classic library catalogue. Searching was complex and complicated, it required training in Boolean logic, full-text searching was not an option, and response time was slow. An opportune moment to improve the system came with the retirement of the previous catalogue software and the adoption of Google Search Appliance (GSA) as an organization-wide search engine standard.

To be completely honest, my first reaction wasn’t a favorable one.

But even the complete blog post does not do justice to the project in question.

Take a look at the slides, which include screen shots of the new interface before reaching an opinion.

Take this as a lesson on what your search interface should be offering by default.

There are always other screens you can fill with advanced features.

UX Crash Course: 31 Fundamentals

Monday, February 3rd, 2014

UX Crash Course: 31 Fundamentals by Joel Marsh.

From the post:

Basic UX Principles: How to get started

The following list isn’t everything you can learn in UX. It’s a quick overview, so you can go from zero-to-hero as quickly as possible. You will get a practical taste of all the big parts of UX, and a sense of where you need to learn more. The order of the lessons follows a real-life UX process (more or less) so you can apply these ideas as-you-go. Each lesson also stands alone, so feel free to bookmark them as a reference!

Main topics:

Introduction & Key Ideas

How to Understand Users

Information Architecture

Visual Design Principles

Functional Layout Design

User Psychology

Designing with Data

Users who interact with designers (librarians and library students come to mind) would do well to review these posts. If nothing else, it will give users better questions to ask vendors about their web interface design process.

British Library Labs – Competition 2013

Sunday, May 5th, 2013

British Library Labs – Competition 2013

Deadline for entry: Wednesday, 26 June 2013 (midnight GMT)

From the webpage:

We want you to propose an innovative and transformative project using the British Library’s digital collections and if your idea is chosen, the Labs team will work with you to make it happen and you could win a prize of up to £3,000.

From the digitisation of thousands of books, newspapers and manuscripts, the curation of UK websites, bird sounds or location data for our maps, over the last two decades we’ve been faithfully amassing a vast and wide-ranging number of digital collections for the nation. What remains elusive, however, is understanding what researchers need in place in order to unlock the potential for new discoveries within these fascinating and diverse sets of digital content.

The Labs competition is designed to attract scholars, explorers, trailblazers and software developers who see the potential for new and innovative research and development opportunities lurking within these immense digital collections. Through soliciting imaginative and transformative projects utilising this content you will be giving us a steer as to the types of new processes, platforms, arrangements, services and tools needed to make it more accessible. We’ll even throw the Library’s resources behind you to make your idea a reality.

Numerous ways to get support for developing your idea before submission.

In terms of PR for your solution (hopefully topic maps based) do note:


Winners will get direct curatorial and financial support for completing their project from the Labs team, which may involve an expenses paid residency at the British Library for a mutually agreed period of time (dependent on the winners’ circumstances, the winning ideas, access to resources and budget allowing).

  • Winners will receive £3000 for completing their project
  • Runners-up will receive £1000 for completing their project

The work will take place between Saturday, July 6 and Monday, November 4, 2013, with the completed projects being showcased during November 2013, when prizes will be awarded.

What happens to your ideas?

All ideas will be posted on the Labs website after they have been judged. All project ideas submitted for the competition can continue to be worked on and where possible the Labs team will provide support (time and resources permitting). Well developed projects will be showcased together with the competition winners during November 2013.

This is also a good excuse to spend more time at the British Library website. I don’t spend nearly enough time there myself.

NewGenLib FOSS Library Management System [March 15th, 2013 Webinar]

Monday, March 11th, 2013

NewGenLib FOSS Library Management System

From the post:

EIFL-FOSS is organising a free webinar on NewGenLib (NGL), an open-source Library Management System (ILS). The event will take place this coming Friday, March 15th, 2013 at 09.00-10.00 GMT / UK time (10.00-11.00 CET / Rome, Italy). The session is open to anyone to attend but places are limited, so registration is recommended.

NGL, an outcome of collaboration between Verus and Kesavan Institute of Information and Knowledge management, has been implemented in over 30 countries in at least 4 different languages supporting fully international library metadata standards. The software runs on Windows or Linux and is designed to work equally well in one single library as it does across a dispersed network of libraries.

URL for more info:

As you already know, there is no shortage of vendor-based and open source library information systems.

That diversity is an opportunity to show how topic maps can make distinct systems appear as one, while retaining their separate character.

NewGenLib Open Source…Update! [Library software]

Wednesday, January 9th, 2013

NewGenLib Open Source releases version 3.0.4 R1 Update 1

From the blog:

The NewGenLib Open Source team has announced the release of a new version, 3.0.4 R1 Update 1. NewGenLib is an integrated library management system developed by Verus Solutions in conjunction with the Kesavan Institute of Information and Knowledge Management in India. The software has modules for acquisitions, technical processing, serials management, circulation, administration, and MIS reports, plus an OPAC.

What’s new in the Update?

This new update comes with a basket of additional features and enhancements, these include:

  • Full text indexing and searching of digital attachments: NewGenLib now uses Apache Tika. With this new tool not only catalogue records but their digital attachments and URLs are indexed. Now you can also search based on the content of your digital attachments
  • Web statistics: The software facilitates the generation of statistics on OPAC usage by having an allowance for Google Analytics code.
  • User ratings of Catalogue Records: An enhancement for User reviews is provided in OPAC. Users can now rate a catalogue record on a scale of 5 (Most useful to not useful). Also, one level of approval is added for User reviews and ratings. 
  • Circulation history download: Users can now download their Circulation history as a PDF file in OPAC

NewGenLib supports MARC 21 bibliographic data, MARC authority files, and a Z39.50 client for federated searching. Bibliographic records can be exported in MODS 3.0 and AGRIS AP. The software is OAI-PMH compliant. NewGenLib has a user community with an online discussion forum.
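Since NewGenLib is OAI-PMH compliant, its catalogue can be harvested with any generic OAI-PMH client. Here is a minimal sketch of pulling identifiers and Dublin Core titles out of a ListRecords response; the endpoint URL of any given installation is something you would have to confirm.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_titles(response_xml: str):
    """Return (identifier, title) pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(response_xml)
    out = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")  # title inside the oai_dc metadata
        out.append((ident, title))
    return out
```

Real harvesting also means following resumptionTokens across pages, which this sketch omits.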

If you are looking for potential topic map markets, the country population rank graphic from Wikipedia may help:
World Population Graph

Population isn’t everything but it should not be ignored either.

Bibliographic Framework Transition Initiative

Tuesday, October 30th, 2012

Bibliographic Framework Transition Initiative

The original announcement for this project lists its requirements but the requirements are not listed on the homepage.

The requirements are found at: The Library of Congress issues its initial plan for its Bibliographic Framework Transition Initiative for dissemination, sharing, and feedback (October 31, 2011). Nothing in the link text says “requirements here” to me.

To effectively participate in discussions about this transition you need to know the requirements.

Requirements as of the original announcement:

Requirements for a New Bibliographic Framework Environment

Although the MARC-based infrastructure is extensive, and MARC has been adapted to changing technologies, a major effort to create a comparable exchange vehicle that is grounded in the current and expected future shape of data interchange is needed. To assure a new environment will allow reuse of valuable data and remain supportive of the current one, in addition to advancing it, the following requirements provide a basis for this work. Discussion with colleagues in the community has informed these requirements for beginning the transition to a "new bibliographic framework". Bibliographic framework is intended to indicate an environment rather than a "format".

  • Broad accommodation of content rules and data models. The new environment should be agnostic to cataloging rules, in recognition that different rules are used by different communities, for different aspects of a description, and for descriptions created in different eras, and that some metadata are not rule based. The accommodation of RDA (Resource Description and Access) will be a key factor in the development of elements, as will other mainstream library, archive, and cultural community rules such as Anglo-American Cataloguing Rules, 2nd edition (AACR2) and its predecessors, as well as DACS (Describing Archives, a Content Standard), VRA (Visual Resources Association) Core, CCO (Cataloging Cultural Objects).
  • Provision for types of data that logically accompany or support bibliographic description, such as holdings, authority, classification, preservation, technical, rights, and archival metadata. These may be accommodated through linking technological components in a modular way, standard extensions, and other techniques.
  • Accommodation of textual data, linked data with URIs instead of text, and both. It is recognized that a variety of environments and systems will exist with different capabilities for communicating and receiving and using textual data and links.
  • Consideration of the relationships between and recommendations for communications format tagging, record input conventions, and system storage/manipulation. While these environments tend to blur with today’s technology, a future bibliographic framework is likely to be seen less by catalogers than the current MARC format. Internal storage, displays from communicated data, and input screens are unlikely to have the close relationship to a communication format that they have had in the past.
  • Consideration of the needs of all sizes and types of libraries, from small public to large research. The library community is not homogeneous in the functionality needed to support its users in spite of the central role of bibliographic description of resources within cultural institutions. Although the MARC format became a key factor in the development of systems and services, libraries implement services according to the needs of their users and their available resources. The new bibliographic framework will continue to support simpler needs in addition to those of large research libraries.
  • Continuation of maintenance of MARC until no longer necessary. It is recognized that systems and services based on the MARC 21 communications record will be an important part of the infrastructure for many years. With library budgets already stretched to cover resource purchases, large system changes are difficult to implement because of the associated costs. With the migration in the near term of a large segment of the library community from AACR to RDA, we will need to have RDA-adapted MARC available. While that need is already being addressed, it is recognized that RDA is still evolving and additional changes may be required. Changes to MARC not associated with RDA should be minimal as the energy of the community focuses on the implementation of RDA and on this initiative.
  • Compatibility with MARC-based records. While a new schema for communications could be radically different, it will need to enable use of data currently found in MARC, since redescribing resources will not be feasible. Ideally there would be an option to preserve all data from a MARC record.
  • Provision of transformation from MARC 21 to a new bibliographic environment. A key requirement will be software that converts data to be moved from MARC to the new bibliographic framework and back, if possible, in order to enable experimentation, testing, and other activities related to evolution of the environment.

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data, and one that better accommodates the library community’s new cataloging rules, RDA. The effort will take place in parallel with the maintenance of MARC 21 as new models are tested. It is expected that new systems and services will be developed to help libraries and provide the same cost savings they do today. Sensitivity to the effect of rapid change enables gradual implementation by systems and infrastructures, and preserves compatibility with existing data.

Ongoing discussion at: Bibliographic Framework Transition Initiative Forum, BIBFRAME@LISTSERV.LOC.GOV.

The requirements recognize a future of semantic and technological heterogeneity.

Similar to the semantic and technological heterogeneity we have now and have had in the past.

A warning to those expecting a semantic and technological rapture of homogeneity.

(I first saw this initiative at: NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis.)


JournalTOCs

Wednesday, October 24th, 2012


Most publishers have TOC services for new issues of their journals.

JournalTOCs aggregates TOCs from publishers and maintains a searchable database of their TOC postings.

A database that is accessible via a free API, I should add.

The API should be a useful way to add journal articles to a topic map, particularly when you want to add selected articles and not entire issues.
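As a sketch of that selective use: JournalTOCs answers API calls with RSS, so picking out just the articles you want is a small filtering job. The endpoint pattern below is an assumption from memory of the JournalTOCs documentation and should be checked before use.

```python
import xml.etree.ElementTree as ET

def feed_url(issn: str) -> str:
    # Assumed endpoint pattern -- verify against the JournalTOCs API docs.
    return f"http://www.journaltocs.ac.uk/api/journals/{issn}?output=articles"

def select_articles(rss_xml: str, keyword: str):
    """Return titles of RSS <item>s whose title contains keyword (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    titles = (item.findtext("title", "") for item in root.iter("item"))
    return [t for t in titles if keyword.lower() in t.lower()]
```

Fetching the feed itself is one urlopen call on feed_url; the filtering above is where the “selected articles, not entire issues” part happens.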

I am looking forward to using and exploring JournalTOCs.

Suggest you do the same.
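To give a flavor of the API: a table-of-contents feed fetched from JournalTOCs can be reduced to a list of article titles in a few lines. The endpoint pattern in the comment reflects how JournalTOCs documents its API, but treat the exact URL and parameters as assumptions; the hand-written sample below assumes a plain RSS 2.0 response, and the live service may return RSS 1.0/RDF, which would need namespace-aware parsing:

```python
# Sketch of consuming a JournalTOCs-style TOC feed.
# Documented endpoint pattern (requires a registered email; treat as assumption):
#   http://www.journaltocs.ac.uk/api/journals/<ISSN>?output=articles&user=<email>

import xml.etree.ElementTree as ET

def article_titles(rss_xml):
    """Pull item titles out of an RSS 2.0 table-of-contents feed."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title") for item in root.iter("item")]

sample = """<rss version="2.0"><channel>
  <title>Example Journal</title>
  <item><title>On Topic Maps</title></item>
  <item><title>Merging Subject Identifiers</title></item>
</channel></rss>"""

print(article_titles(sample))  # ['On Topic Maps', 'Merging Subject Identifiers']
```

Selecting individual items from the feed, rather than whole issues, is what makes this a good fit for adding chosen articles to a topic map.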

What is Umlaut anyway?

Monday, April 30th, 2012

What is Umlaut anyway?

From the webpage:

Umlaut is software for libraries (you know, the kind with books), which deals with advertising services for specific known citations. It runs as a Ruby on Rails application via an engine gem.

Umlaut could be called an ‘open source front-end for a link resolver’ — Umlaut accepts requests in OpenURL format, but has no knowledge base of its own; it can be used as a front-end for an existing knowledge base. (Currently SFX, but other plugins can be written).

And that describes Umlaut’s historical origin and one of its prime use cases. But in using and further developing Umlaut, I’ve come to realize that it has a more general purpose, as a new kind of infrastructural component.

Better, although a bit buzzword laden:

Umlaut is a just-in-time aggregator of “last mile” specific citation services, taking input as OpenURL, and providing an HTML UI as well as an api suite for embedding Umlaut services in other applications.

(In truth, that’s just a generalization of what your OpenURL Link Resolver does now, but considered from a different, more flexible vantage).

Reading under Last Mile, Specific Citation I find:

Umlaut is not concerned with the search/discovery part of user research. Umlaut’s role begins when a particular item has been identified, with a citation in machine-accessible form (i.e., title, author, journal, page number, etc., all in separate elements).

Umlaut’s role is to provide the user with services that apply to the item of interest. Services provided by the hosting institution, licensed by the hosting institution, or free services the hosting institution wishes to advertise/recommend to its users.

Umlaut strives to supply links that take the user in as few clicks as possible to the service listed, without ever listing ‘blind links’ that you first have to click on to find out whether they are available. Umlaut pre-checks things when necessary to only list services, with any needed contextual info, such that the user knows what they get when they click on it. Save the time of the user.

Starts with a particular subject (née item) and maps known services to it.

Although links to subscriber services are unlikely to be interchangeable, links to public domain resources or those with public identifiers would be interchangeable. Potential for a mapping syntax? Or transmission of the “discovery” of such resources?
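That hand-off of a citation "in machine-accessible form, all in separate elements" is exactly what the OpenURL format encodes. A sketch of building such a request in Python: the KEV key names follow the NISO Z39.88-2004 journal format, while the resolver URL and the citation itself are made up for illustration:

```python
# Sketch: turning a citation split into separate elements into an
# OpenURL 1.0 (KEV) query string, the request format Umlaut accepts.

from urllib.parse import urlencode

def openurl_query(citation):
    # Key names per the Z39.88-2004 KEV journal format.
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": citation["title"],       # article title
        "rft.jtitle": citation["journal"],     # journal title
        "rft.aulast": citation["author_last"],
        "rft.spage": citation["start_page"],
    }
    return urlencode(params)

# A hypothetical citation, already in separate elements:
citation = {"title": "Linking the last mile",
            "journal": "Example Journal of Libraries",
            "author_last": "Doe",
            "start_page": "12"}

# A resolver front-end like Umlaut would receive something like:
print("https://resolver.example.edu/?" + openurl_query(citation))
```

The front-end's job then is the pre-checking described above: resolve the citation against its services and show only links that will actually work.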

The Library In Your Pocket

Friday, March 2nd, 2012

The Library In Your Pocket by Meredith Farkas.

A delightful slidedeck of suggestions for effective delivery of content to mobile devices for libraries.

Since topic maps deliver content as well, I thought at least some of the suggestions would be useful there as well.

The effective design of library websites for mobile access seems particularly appropriate for topic maps.

Do you have a separate interface for mobile access? Care to say a few words about it?


Saturday, November 26th, 2011

Invenio
From the webpage:

Invenio is a free software suite enabling you to run your own digital library or document repository on the web. The technology offered by the software covers all aspects of digital library management from document ingestion through classification, indexing, and curation to dissemination. Invenio complies with standards such as the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic format. The flexibility and performance of Invenio make it a comprehensive solution for management of document repositories of moderate to large sizes (several millions of records).

Invenio was originally developed at CERN to run the CERN document server, managing over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, and more. Invenio is co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL, and SLAC, and is used by about thirty scientific institutions worldwide (see demo).

If you would like to try it out yourself, please download our latest version. If you have any questions about the software or the support behind it, please join our mailing lists or contact us.

A stage where even modest improvements in results would be likely to attract attention.

Towards georeferencing archival collections

Friday, October 21st, 2011

Towards georeferencing archival collections

From the post:

One of the most effective ways to associate objects in archival collections with related objects is with controlled access terms: personal, corporate, and family names; places; subjects. These associations are meaningless if chosen arbitrarily. With respect to machine processing, Thomas Jefferson and Jefferson, Thomas are not seen as the same individual when judging by the textual string alone. While EADitor has incorporated authorized headings from LCSH and local vocabulary (scraped from terms found in EAD files currently in the eXist database) almost since its inception, it has not until recently interacted with other controlled vocabulary services. Interacting with EAC-CPF and geographical services is high on the development priority list.

Over the last week, I have been working on incorporating queries into the XForms application. Geonames provides stable URIs for more than 7.5 million place names internationally. XML representations of each place are accessible through various REST APIs. These XML datastreams also include the latitude and longitude, which will make it possible to georeference archival collections as a whole or individual items within collections (an item-level indexing strategy will be offered in EADitor as an alternative to traditional, collection-based indexing soon).

This looks very interesting.


EADitor project site (Google Code):
Installation instructions (specific for Ubuntu but broadly applies to all Unix-based systems):
Google Group:
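The Geonames lookup described in the post is straightforward to consume: a search call returns XML whose geoname entries carry the latitude and longitude needed for georeferencing. A sketch of extracting those coordinates (the sample response below is hand-written to mirror the api.geonames.org search response shape, and a live call requires a registered username parameter):

```python
# Sketch: pull (lat, lng) from a Geonames search response, e.g.
#   http://api.geonames.org/search?q=Monticello&maxRows=1&username=<you>

import xml.etree.ElementTree as ET

def first_hit_coords(geonames_xml):
    """Return (latitude, longitude) of the first <geoname> in the response."""
    root = ET.fromstring(geonames_xml)
    place = root.find("geoname")
    if place is None:
        return None
    return (float(place.findtext("lat")), float(place.findtext("lng")))

sample = """<geonames>
  <geoname>
    <name>Monticello</name>
    <lat>38.0087</lat>
    <lng>-78.4532</lng>
  </geoname>
</geonames>"""

print(first_hit_coords(sample))  # (38.0087, -78.4532)
```

Coordinates harvested this way could be attached to a collection as a whole or to individual items, matching the item-level indexing strategy EADitor plans to offer.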

International Conference on Theory and Practice of Digital Libraries

Tuesday, January 11th, 2011

International Conference on Theory and Practice of Digital Libraries – Call for papers in four general areas:

Foundations: Technology and Methodologies

  • Digital libraries: architectures and infrastructures
  • Metadata standards and protocols in digital library systems
  • Interoperability in digital libraries, data and information integration
  • Distributed and collaborative information spaces
  • Systems, algorithms, and models for digital preservation
  • Personalization in digital libraries
  • Information access: retrieval and browsing
  • Information organization
  • Information visualization
  • Multimedia information management and retrieval
  • Multilinguality in digital libraries
  • Knowledge organization and ontologies in digital libraries

Digital Humanities

  • Digital libraries in cultural heritage
  • Computational linguistics: text mining and retrieval
  • Organizational aspects of digital preservation
  • Information policy and legal aspects (e.g., copyright laws)
  • Social networks and networked information
  • Human factors in networked information
  • Scholarly primitives

Research Data

  • Architectures for large-scale data management (e.g., Grids, Clouds)
  • Cyberinfrastructures: architectures, operation and evolution
  • Collaborative information environments
  • Data mining and extraction of structure from networked information
  • Scientific data curation
  • Metadata for scientific data, data provenance
  • Services and workflows for scientific data
  • Data and knowledge management in virtual organizations

Applications and User Experience

  • Multi-national digital library federations (e.g., Europeana)
  • Digital Libraries in eGovernment, elearning, eHealth, eScience, ePublishing
  • Semantic Web and Linked Data
  • User studies for and evaluation of digital library systems and applications
  • Personal information management and personal digital libraries
  • Enterprise-scale knowledge and information management
  • User behavior and modeling
  • User mobility and context awareness in information access
  • User interfaces for digital libraries

Topic maps have a contribution to make in these areas. Don’t be shy!

Important dates

Abstract submission (full and short papers): March 21, 2011

Research paper submission: March 28, 2011 (midnight HAST, GMT -10hrs)

Notification of acceptance: May 23, 2011

Submission of final version: June 6, 2011

PS: Note the call for demos in all the same areas. Demo submission: March 28, 2011; notification of acceptance: May 23, 2011; submission of final version: June 6, 2011.

Moving Forward – Library Project Blog

Thursday, January 6th, 2011

Moving Forward is a blog I discovered via all things cataloged.

From the Forward blog:

Forward is a Resource Discovery experiment that builds a unified search interface for library data.

Today Forward covers 100% of the UW System Library catalogs and two UW digital collections. The project also experiments with additional search contextualization by using web service APIs.

Forward can be accessed at the URL:

Sounds like a great opportunity for topic map fans with an interest in library interfaces to make a contribution.

Invenio – Library Software

Tuesday, December 14th, 2010

Invenio (new release)

From the website:

Invenio is a free software suite enabling you to run your own digital library or document repository on the web. The technology offered by the software covers all aspects of digital library management from document ingestion through classification, indexing, and curation to dissemination. Invenio complies with standards such as the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic format. The flexibility and performance of Invenio make it a comprehensive solution for management of document repositories of moderate to large sizes (several millions of records).

Invenio was originally developed at CERN to run the CERN document server, managing over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, and more. Invenio is co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL, and SLAC, and is used by about thirty scientific institutions worldwide (see demo).

One of many open source library projects where topic maps are certainly relevant.


Choose one site for review and one for comparison from General/Demo – Invenio

  1. What features of the site you are reviewing could be enhanced by the use of topic maps? Give five (5) specific search results that could be improved and then say how they could be improved. (3-5 pages, include search results)
  2. Are your improvements domain specific? Use the comparison site in answering this question. (3-5 pages, no citations)
  3. How would you go about making the case for altering the current distribution? What is the payoff for the end user? (not the same as enhancement, asking about when end users would find easier/better/faster. Perhaps you should ask end users? How would you do that?) (3-5 pages, no citations)