Cataloguing projects

Tuesday, March 11th, 2014

Cataloguing projects (UK National Archive)

From the webpage:

The National Archives’ Cataloguing Strategy

The overall objective of our cataloguing work is to deliver more comprehensive and searchable catalogues, thus improving access to public records. To make online searches work well we need to provide adequate data and prioritise cataloguing work that tackles less adequate descriptions. For example, we regard ranges of abbreviated names or file numbers as inadequate.

I was led to this delightful resource by a tweet from David Underdown, advising that his presentation from National Catalogue Day in 2013 was now online.

His presentation along with several others and reports about projects in prior years are available at this projects page.

I thought the presentation titled: Opening up of Litigation: 1385-1875 by Amanda Bevan and David Foster, was quite interesting in light of various projects that want to create new “public” citation systems for law and litigation.

I haven’t seen such a proposal yet that gives sufficient consideration to the enormity of the question: what do you do with old legal materials?

The litigation presentation could be a poster child for topic maps.

I am looking forward to reading the other presentations as well.

Linked Data for Holdings and Cataloging

Monday, February 25th, 2013

From the ALA Midwinter Meeting:

Linked Data for Holdings and Cataloging: The First Step Is Always the Hardest! by Eric Miller (Zepheira) and Richard Wallis (OCLC). (Video + Slides)

Linked Data for Holdings and Cataloging: Interactive Session. (Audio)

Since linked data wasn’t designed for human users, the advantage for library catalogs isn’t clear.

Most users can’t use LCSH so perhaps the lack of utility will go unnoticed. (Subject Headings and the Semantic Web)

I first saw this at: Linked Data for Holdings and Cataloging – recordings now available!

Paper Machines: About Cards & Catalogs, 1548-1929

Sunday, January 27th, 2013

Paper Machines: About Cards & Catalogs, 1548-1929 by Markus Krajewski, translated by Peter Krapp.

From the webpage:

Today on almost every desk in every office sits a computer. Eighty years ago, desktops were equipped with a nonelectronic data processing machine: a card file. In Paper Machines, Markus Krajewski traces the evolution of this proto-computer of rearrangeable parts (file cards) that became ubiquitous in offices between the world wars.

The story begins with Konrad Gessner, a sixteenth-century Swiss polymath who described a new method of processing data: to cut up a sheet of handwritten notes into slips of paper, with one fact or topic per slip, and arrange as desired. In the late eighteenth century, the card catalog became the librarian’s answer to the threat of information overload. Then, at the turn of the twentieth century, business adopted the technology of the card catalog as a bookkeeping tool. Krajewski explores this conceptual development and casts the card file as a “universal paper machine” that accomplishes the basic operations of Turing’s universal discrete machine: storing, processing, and transferring data. In telling his story, Krajewski takes the reader on a number of illuminating detours, telling us, for example, that the card catalog and the numbered street address emerged at the same time in the same city (Vienna), and that Harvard University’s home-grown cataloging system grew out of a librarian’s laziness; and that Melvil Dewey (originator of the Dewey Decimal System) helped bring about the technology transfer of card files to business.

Before ordering a copy, you may want to read Alistair Black’s review. Despite an overall positive impression, Alistair records:

Be warned, Paper Machines is not an easy read. It is not just that in some sections the narrative jumps around, points already firmly made are needlessly repeated, the characters in the plot are not always introduced carefully enough, and a great deal seems to have been lost in translation. More serious than these difficulties, the book is written entirely in the present tense. This is both disconcerting and distracting. I’m surprised the editorial team (the book is part of a monograph series titled “History and Foundations of Information Science”) and a publisher as reputable as the MIT Press allowed this to happen; unless, that is, the original German version was itself written in the present tense, which for a historical discourse I would find baffling.

Alistair does conclude:

My final advice with respect to this book: it is a good addition to the emerging field of information history and the reader should persevere with it, despite its deficiencies in narrative style. The excellent illustrations will help in this regard.

Taking Alistair’s comments at face value, I would have to agree that correcting those problems would make the book an easier read.

On the other hand, working through Paper Machines and perhaps developing references in addition to those given, will give many hours of delight.

Constructing a true LCSH tree of a science and engineering collection

Monday, November 19th, 2012

Constructing a true LCSH tree of a science and engineering collection by Charles-Antoine Julien, Pierre Tirilly, John E. Leide and Catherine Guastavino.


The Library of Congress Subject Headings (LCSH) is a subject structure used to index large library collections throughout the world. Browsing a collection through LCSH is difficult using current online tools in part because users cannot explore the structure using their existing experience navigating file hierarchies on their hard drives. This is due to inconsistencies in the LCSH structure, which does not adhere to the specific rules defining tree structures. This article proposes a method to adapt the LCSH structure to reflect a real-world collection from the domain of science and engineering. This structure is transformed into a valid tree structure using an automatic process. The analysis of the resulting LCSH tree shows a large and complex structure. The analysis of the distribution of information within the LCSH tree reveals a power law distribution where the vast majority of subjects contain few information items and a few subjects contain the vast majority of the collection.

After a detailed analysis of records from the McGill University Libraries (204,430 topical authority records) and 130,940 bibliographic records (Schulich Science and Engineering Library), the authors conclude in part:

This revealed that the structure was large, highly redundant due to multiple inheritances, very deep, and unbalanced. The complexity of the LCSH tree is a likely usability barrier for subject browsing and navigation of the information collection.

For me the most compelling part of this research was the focus on LCSH as used and not as it imagines itself. Very interesting reading. A slow walk through the bibliography will interest those researching LCSH or classification more generally.

Demonstration of the power law with the use of LCSH makes one wonder about other classification systems as used.
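
To make the “multiple inheritances” point concrete, here is a minimal sketch of one way to turn a subject structure with shared narrower terms into a strict tree: copy a heading once under each of its broader terms. This is my own illustration, not the authors’ process (the abstract does not spell theirs out), and the headings, the NARROWER table and the expand_to_tree function are all made up for the example.

```python
from collections import defaultdict

# Hypothetical broader-term -> narrower-term links (not real LCSH data).
# "Robotics" has two broader terms, so this structure is not a tree.
NARROWER = {
    "Science": ["Engineering", "Computer science"],
    "Engineering": ["Robotics"],
    "Computer science": ["Robotics"],
    "Robotics": [],
}


def expand_to_tree(heading, narrower, path=()):
    """Return every root-to-node path, duplicating shared headings.

    A heading with several broader terms is copied once per broader
    term, so each node (identified by its path) has exactly one parent.
    """
    path = path + (heading,)
    paths = [path]
    for child in narrower.get(heading, []):
        if child in path:  # guard against cycles in messy data
            continue
        paths.extend(expand_to_tree(child, narrower, path))
    return paths


tree_paths = expand_to_tree("Science", NARROWER)

# Count how many copies of each heading the tree now contains.
copies = defaultdict(int)
for p in tree_paths:
    copies[p[-1]] += 1

for p in tree_paths:
    print(" > ".join(p))
```

Even in this toy case one heading appears twice. With hundreds of thousands of authority records, that kind of duplication is a plausible source of the redundancy and depth the authors report.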

Dragsters, Drag Cars & Drag Racing Cars

Friday, February 10th, 2012

I still remember the cover of Hot Rod magazine that announced (from memory) “The 6’s are here!” Don “The Snake” Prudhomme had broken the 200 mph barrier in a drag race. Other memories follow on from that one but I mention it to explain my interest in a recent Subject Authority Cooperative Program decision to not have a cross-reference from dragster (the term I would have used) to more recent terms, drag cars or drag racing cars.

The expected search (in this order) due to this decision is:

Cars (Automobiles) -> redirect to Automobiles -> Automobiles -> narrower term -> Automobiles, racing -> narrower term -> Dragsters

Adam L. Schiff, proposer of drag cars & drag racing cars, says below: “This just is not likely to happen.”

Question: Is there a relationship between users “work[ing] their way up and down hierarchies” and display of relationships methods? Who chooses which items will be the starting point to lead to other items? How do you integrate a keyword search into such a system?

Question: And what of the full phrase/sentence AI systems where keywords work less well? How does that work with relationship display systems?

Question: I wonder if the relationship display methods are closer to the up and down hierarchies, but with less guidance?

Adam’s Dragster proposal post in full:


Automobiles has a UF Cars (Automobiles). Since the UF already exists on the basic heading, it is not necessary to add it to Dragsters. The proposal was not approved.

Our proposal was to add two additional cross-references to Dragsters: Drag cars, and Drag racing cars. While I understand, in principle, the reasoning behind the rejection of these additional references, I do not see how it serves users. A user coming to a catalog to search for the subject “Drag cars” will now get nothing, no redirection to the established heading. I don’t see how the presence of a reference from Cars (Automobiles) to Automobiles helps any user who starts a search with “Drag cars”. Only if they begin their search with Cars would they get led to Automobiles, and then only if they pursue narrower terms under that heading would they find Automobiles, Racing, which they would then have to follow further down to Dragsters. This just is not likely to happen. Instead they will probably start with a keyword search on “Drag cars” and find nothing, or if lucky, find one or two resources and think they have it all. And if they are astute enough to look at the subject headings on one of the records and see “Dragsters”, perhaps they will then redo their search.

Since the proposed cross-refs do not begin with the word Cars, I do not at all see how a decision like this is in the service of users of our catalogs. I think that LCSH rules for references were developed when it was expected that users would consult the big red books and work their way up and down hierarchies. While some online systems do provide for such navigation, it is doubtful that many users take this approach. Keyword searching is predominant in our catalogs and on the Web. Providing as many cross-refs to established headings as we can would be desirable. If the worry is that the printed red books will grow to too many volumes if we add more variant forms that weren’t made in the card environment, then perhaps there needs to be a way to include some references in authority records but mark them as not suitable for printing in printed products.

PS: ODLIS: Online Dictionary for Library and Information Science, by Joan M. Reitz, defines UF as follows:

used for (UF)

A phrase indicating a term (or terms) synonymous with an authorized subject heading or descriptor, not used in cataloging or indexing to avoid scatter. In a subject headings list or thesaurus of controlled vocabulary, synonyms are given immediately following the official heading. In the alphabetical list of indexing terms, they are included as lead-in vocabulary followed by a see or USE cross-reference directing the user to the correct heading. See also: syndetic structure.

I did not attempt to reproduce the extremely rich cross-linking in this entry but commend the entire resource to your attention, particularly if you are a library science student.
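
For readers who think better in code, here is a rough sketch of what a UF reference does for a keyword search. It is an illustration under my own assumptions, not how any particular catalog works: the ESTABLISHED set, the two UF tables and the resolve function are hypothetical, with values taken from the headings discussed in this post.

```python
# Hypothetical authority data, loosely following the headings in the post.
ESTABLISHED = {"Automobiles", "Automobiles, Racing", "Dragsters"}

# UF ("used for") references: variant term -> established heading.
APPROVED_UF = {"Cars (Automobiles)": "Automobiles"}
PROPOSED_UF = {"Drag cars": "Dragsters", "Drag racing cars": "Dragsters"}


def resolve(term, used_for):
    """Return the established heading for a search term, or None."""
    by_key = {h.casefold(): h for h in ESTABLISHED}
    by_key.update({variant.casefold(): h for variant, h in used_for.items()})
    return by_key.get(term.casefold())


# With only the approved reference, "Drag cars" leads nowhere.
without_refs = resolve("Drag cars", APPROVED_UF)

# With the rejected proposal added, the same search finds the heading.
with_refs = resolve("Drag cars", {**APPROVED_UF, **PROPOSED_UF})

print(without_refs, with_refs)
```

The reference from Cars (Automobiles) only helps a user who happens to start with that exact term, which is Adam’s point.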

RDA: Resource Description and Access

Wednesday, November 17th, 2010

RDA: Resource Description and Access

From the website:

RDA: Resource Description and Access is the new standard for resource description and access designed for the digital world. Built on the foundations established by AACR2, RDA provides a comprehensive set of guidelines and instructions on resource description and access covering all types of content and media. (emphasis in original)

In case you are interested in the 2008 draft version, just to get the flavor of it, see:

More to follow on RDA and topic maps.

VoxPopuLII – Blog

Friday, October 29th, 2010


From the blog:

VoxPopuLII is a guest-blogging project sponsored by the Legal Information Institute at the Cornell Law School. It presents the insights of a very diverse group of people working on legal informatics issues and government information, all around the world. It emphasizes new voices and big ideas.

Not your average blog.

I first encountered: LexML Brazil Project

Questions (about LexML):

  1. What do you think about the strategy to deal with semantic diversity? Pluses? Minuses?
  2. The project says they are following: “Ranganathan’s ‘stratification planes’ classification system…” Your evaluation?
  3. Identify 3 instances of equivalents to the “stratification planes” classification system.
  4. How would you map those 3 instances to Ranganathan’s “stratification planes?”

CASPAR (Cultural, Artistic, and Scientific Knowledge for Preservation, Access and Retrieval)

Saturday, October 23rd, 2010

CASPAR (Cultural, Artistic, and Scientific Knowledge for Preservation, Access and Retrieval).

From the website:

CASPAR methodological and technological solution:

  • is compliant to the OAIS Reference Model – the main standard of reference in digital preservation
  • is technology-neutral: the preservation environment could be implemented using any kind of emerging technology
  • adopts a distributed, asynchronous, loosely coupled architecture and each key component is self-contained and portable: it may be deployed without dependencies on different platform and framework
  • is domain independent: it could be applied with low additional effort to multiple domains/contexts.
  • preserves knowledge and intelligibility, not just the “bits”
  • guarantees the integrity and identity of the information preserved as well as the protection of digital rights

FYI: OAIS Reference Model

As a librarian, you will be confronted with claims similar to these in vendor literature, grant applications and other marketing materials.


  1. Pick one of these claims. What documentation/software produced by the project would you review to evaluate the claim you have chosen?
  2. What other materials do you think would be relevant to your review?
  3. Perform the actual review (10 – 15 pages, with citations, project)

Rethinking Library Linking: Breathing New Life into OpenURL

Friday, October 22nd, 2010

Rethinking Library Linking: Breathing New Life into OpenURL Authors: Cindi Trainor and Jason Price


OpenURL was devised to solve the “appropriate copy problem.” As online content proliferated, it became possible for libraries to obtain the same content from multiple locales: directly from publishers and subscription agents; indirectly through licensing citation databases that contain full text; and, increasingly, from free online sources. Before the advent of OpenURL, the only way to know whether a journal was held by the library was to search multiple resources. An OpenURL link resolver accepts links from library citation databases (sources) and returns to the user a menu of choices (targets) that may include links to full text, the library catalog, and other related services (figure 1). Key to understanding OpenURL is the concept of “context sensitive” linking: links to the same item will be different for users of different libraries, and are dependent on the library’s collections. This issue of Library Technology Reports provides practicing librarians with real-world examples and strategies for improving resolver usability and functionality in their own institutions.
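
A small sketch may help with the “context sensitive” idea: the citation metadata stays the same and only the resolver it is sent to changes. The key/value names below (url_ver, rft_val_fmt, rft.jtitle and so on) follow the Z39.88-2004 KEV journal format as I understand it. The two resolver addresses, the citation and the build_openurl helper are invented for illustration.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, citation):
    """Build a KEV-style OpenURL for a journal citation.

    resolver_base is the library's own link resolver (hypothetical here);
    citation maps journal-format field names to values.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    params.update({f"rft.{key}": value for key, value in citation.items()})
    return f"{resolver_base}?{urlencode(params)}"


# A made-up citation, not a real article.
citation = {"jtitle": "Example Journal", "volume": "1", "spage": "10"}

# Two hypothetical libraries, each with its own resolver.
link_a = build_openurl("https://resolver.library-a.example/openurl", citation)
link_b = build_openurl("https://resolver.library-b.example/openurl", citation)

print(link_a)
print(link_b)
```

The query strings are identical. What each user sees after clicking depends on which resolver answers and what that library holds.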


OpenURL (ANSI/NISO Z39.88-2004) archives


  1. OCLC says of OpenURL

    Remember the card catalog? Everything in a library was represented in the card catalog with one or more cards carrying bibliographic information. OpenURL is the internet equivalent of those index cards.

  2. True? 3-5 pages, no citations, or
  3. False? 3-5 pages, no citations.

Catalogue & Index Blog

Monday, September 20th, 2010

Catalogue & Index Blog.

Blog from the Chartered Institute of Library and Information Professionals (CILIP) Cataloguing and Indexing Group.

News about cataloging, indexing and Cataloguing and Indexing Group activities.

Planet Cataloging

Friday, September 17th, 2010

Planet Cataloging

Aggregation of > 60 blogs on cataloging.

Read to improve your topic mapping (and cataloging) skills.

Almost A Topic Map

Thursday, September 16th, 2010

Ann Arbor District Library, a very cool library that has added a topic map-like characteristic to its catalog.

User tags are stored separately but displayed alongside the controlled vocabulary of the library.

Some subject identifications are more equal than others.

A legitimate choice that enhances both the formal vocabulary as well as the user supplied “tags.”

One small step towards topic maps, ….

Supplemental: 17 September 2010

More than one reader reported that my post was unclear. Here is a somewhat fuller explanation.

Follow the link Catalog. Next to the search catalog text box you will see a drop-down menu. Select that and see “Tags” as one of the options. Those “tags” are supplied by users of the catalog. In other words, you can search by the controlled vocabulary of the library or by user tags. Both are associated with particular items in the collection.
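
In data terms, the arrangement might look something like the sketch below. I have no knowledge of how the library’s system is actually built, so the CatalogItem class, the search function and the sample record are purely hypothetical. The point is only that the two vocabularies live in separate fields on the same item and are searched separately.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    title: str
    subject_headings: set = field(default_factory=set)  # assigned by the library
    user_tags: set = field(default_factory=set)  # supplied by catalog users


def search(items, term, index="subjects"):
    """Return titles whose chosen index ("subjects" or "tags") has the term."""
    attr = "subject_headings" if index == "subjects" else "user_tags"
    wanted = term.casefold()
    return [
        item.title
        for item in items
        if wanted in {value.casefold() for value in getattr(item, attr)}
    ]


# A made-up record for illustration.
items = [
    CatalogItem(
        title="Moby-Dick",
        subject_headings={"Whaling -- Fiction"},
        user_tags={"whales", "classics"},
    )
]

print(search(items, "whales", index="tags"))      # found through user tags
print(search(items, "whales", index="subjects"))  # not a controlled heading
```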

…Library of Congress Subject Heading for Social Tags

Monday, August 2nd, 2010

“A Semantic Similarity Approach for Predicting Library of Congress Subject Headings for Social Tags,” by Kwan Yi, appears in JASIST, 61(8):1658-1672, 2010. This is an important article for library students to read. Carefully.

The author recognizes that linking social tags to controlled vocabularies may help with the organization of information that is only socially tagged. And the article is a good review of the application of five popular semantic similarity measures.

The interesting step for the article would be the reverse of what the author suggests: “The study of introducing the LCSH to give a control to social tags…” (p. 1670).

Why not introduce “social tags” to enrich the finding experience of users in LCSH settings?

A substantial body of users find information with “social tags,” so why not offer that option?

The user experience with “social tags” alongside LCSH headings in a library setting awaits future research.

From Moby-Dick To Mashups: Thinking About Bibliographic Networks

Monday, July 26th, 2010

From Moby-Dick To Mashups: Thinking About Bibliographic Networks was reported by The FRBR Blog with the following summary:

Summary: Traditional and contemporary attempts to identify and describe simple and complex bibliographic resources have overlooked useful and powerful possibilities, due to the insufficient modeling of “bibliographic things of interest.” The presentation will introduce a resource description approach that remodels and strengthens FRBR by borrowing key concepts from Information Science and the History of Science. The presentation will reveal portions of a network of bibliographic (and other useful) relationships between printings of Melville’s novel dating from 1851-1975 into the present. In addition, structural similarities between the print publication network and the multimedia “mash-ups” seen on YouTube and other websites will be demonstrated and discussed.

Anyone creating a topic map for library resources needs to review these slides.

Semantic Compression

Saturday, June 26th, 2010

It isn’t difficult to find indexing terms to represent documents.

But, whatever indexing terms are used, a large portion of relevant documents will go unfound. As much as 80% of the relevant documents. See Size Really Does Matter… (A study of full text searching but the underlying problem is the same: “What term was used?”)

You read a document, are familiar with its author, concepts, literature it cites, the relationships of that literature to the document and the relationships between the ideas in the document. Now you have to choose one or more terms to represent all the semantics and semantic relationships in the document. The exercise you are engaged in is compressing the semantics in a document into one or more terms.

Unlike data compression, a la Shannon, the semantic compression algorithm used by any user is unknown. We know it isn’t possible to decompress an indexing term to recover all the semantics of a document it purports to represent. Since a term is used to represent several documents, the problem is even worse. We would have to decompress the term to recover the semantics of all the documents it represents.

Even without the algorithm used to assign indexing (or tagging) terms, investigation of semantic compression could be useful. For example, encoding the semantics of a set of documents (to a set depth) and then asking groups of users to assign those documents indexing or tagging terms. By varying the semantics in the documents, it may, emphasis on may, be possible to experimentally derive partial semantic decompression for some terms and classes of users.

Subject World

Tuesday, May 11th, 2010

Subject World (Japanese only)

Subject World is a project to visualize heterogeneous terminology, including catalogs, for use with library catalogs. Uses BSH4 subject headings (Basic Subject Headings) and NDC9 index terms (Nippon Decimal Classification) to visualize and retrieve information from the Osaka City University OPAC.

English language resources:

Subject World: A System for Visualizing OPAC (paper)

Slides with the same title (but different publication from the paper):

Subject World: A System for Visualizing OPAC (slides)

See also: Murakami Harumi Laboratory, in particular its research and publication pages.

Subject Headings and Topic Maps

Monday, May 10th, 2010

Leveraging prior work should be part of any topic map project.

Building topic maps with subject headings? See: Making topic maps from Subject Headings, a slide pack from Motomu Naito, a regular contributor in the topic maps community.

The project uses NDLSH 2008 (National Diet Library Subject Headings; 17,953 subject headings), BSH4 (Basic Subject Headings, Japanese Library Association; 7,847 subject headings), and LCSH (Library of Congress Subject Headings; 372,399 subject headings).

Slides describe organizing Wikipedia using subject headings, merging subjects with subject headings, and using LCSH subjects as bridges to map between subject headings in different languages.
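
The bridging idea can be sketched in a few lines. This is my own toy version, not the method in the slides: the two mapping tables, their placeholder headings (romanized and simplified) and the bridge function are invented for illustration. Two vocabularies that each point at LCSH can be joined through the LCSH heading they share.

```python
from collections import defaultdict

# Invented mappings from two vocabularies to LCSH (placeholder headings).
SCHEME_A_TO_LCSH = {"Toshokan": "Libraries", "Jidosha": "Automobiles"}
SCHEME_B_TO_LCSH = {"Bibliotheques": "Libraries"}


def bridge(source_to_lcsh, target_to_lcsh):
    """Map each source heading to target headings sharing an LCSH heading."""
    lcsh_to_target = defaultdict(set)
    for heading, lcsh in target_to_lcsh.items():
        lcsh_to_target[lcsh].add(heading)
    return {
        heading: set(lcsh_to_target.get(lcsh, set()))
        for heading, lcsh in source_to_lcsh.items()
    }


a_to_b = bridge(SCHEME_A_TO_LCSH, SCHEME_B_TO_LCSH)
for heading, matches in a_to_b.items():
    print(heading, "->", sorted(matches) or "no counterpart via LCSH")
```

A heading with no LCSH counterpart on the other side simply maps to nothing, which is where the real work in such a project would begin.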

Forward to your local library researcher.

Kilroy Was Here

Monday, March 15th, 2010

Have you ever had one of those “Kilroy Was Here” sort of moments? You think that you are exploring some new idea, only to turn the corner and there you see: “Kilroy was here” in bright bold letters? Except that most of the time for me, it doesn’t read “Kilroy was here,” but rather “Librarians were here.”

I was reading Lois Mai Chan’s Cataloging and Classification: An Introduction when I ran across the concept of access points. Or in Chan’s words, “…the ways a given item may be retrieved.” (page 9) If you broaden that out to say the “…ways a given subject may be retrieved from a topic map…” then it sounds very much like useful information for anyone who wants to build a topic map.

Librarians have spent years researching, implementing, testing and improving ways of accessing information. I think the smart money is going to be on using that knowledge and experience in building topic maps. Look for me in the periodical shelves with library journals. I will try to post short notices of anything that looks particularly interesting. Suggestions more than welcome.