Archive for the ‘MARC’ Category

2015 Medical Subject Headings (MeSH) Now Available

Thursday, September 18th, 2014

2015 Medical Subject Headings (MeSH) Now Available

From the post:

Introduction to MeSH 2015
The Introduction to MeSH 2015 is now available, including information on its use and structure, as well as recent updates and availability of data.

MeSH Browser
The default year in the MeSH Browser remains 2014 MeSH for now, but the alternate link provides access to 2015 MeSH. The MeSH Section will continue to provide access via the MeSH Browser for two years of the vocabulary: the current year and an alternate year. Sometime in November or December, the default year will change to 2015 MeSH and the alternate link will provide access to the 2014 MeSH.

Download MeSH
Download 2015 MeSH in XML and ASCII formats. Also available for 2015 from the same MeSH download page are:

  • Pharmacologic Actions (Forthcoming)
  • New Headings with Scope Notes
  • MeSH Replaced Headings
  • MeSH MN (tree number) changes
  • 2015 MeSH in MARC format


Isochronic Passage Chart for Travelers

Tuesday, June 24th, 2014

isochronic map

From the blog of Arthur Charpentier, Somewhere Else, part 142

(departing from London, ht ) by Francis Galton, 1881

A much larger image that is easier to read.

Although not on such a grand scale, an isochronic passage map for data could be interesting for your enterprise.

How much time elapses between your request and a response from another department or team?

Presented visually, with this map as a reference for the technique, your evidence of data bottlenecks could be persuasive!
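
If you want to put numbers behind such a chart, a minimal sketch of gathering the raw turnaround data might look like this. The departments and timestamps below are invented for illustration:

```python
from datetime import datetime

# Invented sample data: (department, requested, answered) timestamps.
requests = [
    ("Accounting", "2014-06-02 09:00", "2014-06-04 16:30"),
    ("Legal",      "2014-06-02 09:00", "2014-06-10 11:00"),
    ("IT",         "2014-06-02 09:00", "2014-06-02 13:45"),
]

def turnaround_hours(rows):
    """Hours elapsed between request and response, per department."""
    fmt = "%Y-%m-%d %H:%M"
    return {
        dept: (datetime.strptime(done, fmt)
               - datetime.strptime(start, fmt)).total_seconds() / 3600
        for dept, start, done in rows
    }

# Slowest responders first: the candidates for your "isochronic" shading.
for dept, hours in sorted(turnaround_hours(requests).items(),
                          key=lambda kv: -kv[1]):
    print(f"{dept}: {hours:.1f} h")
```

Feed the per-department figures into whatever visualization you prefer; the point is that the underlying data collection is trivial.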

Aligning Controlled Vocabularies

Tuesday, March 18th, 2014

Tutorial on the use of SILK for aligning controlled vocabularies

From the post:

A tutorial on the use of SILK has been published. The SILK framework is a tool for discovering relationships between data items within different Linked Data sources. This tutorial explains how SILK can be used to discover links between concepts in controlled vocabularies.

Example used in this Tutorial

The tutorial uses an example where SILK is used to create a mapping between the Named Authority Lists (NALs) of the Publications Office of the EU and the MARC countries list of the US Library of Congress. Both controlled vocabularies (NALs & MARC countries list) use URIs to identify countries; compare, for example, the following URIs for the country of Luxembourg.

SILK represents mappings between NALs using the SKOS language (skos:exactMatch). In the case of the URIs for Luxembourg this is expressed as N-Triples:
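
The tutorial's N-Triples example is not reproduced here, but the shape of such a link is easy to sketch. The two URIs below are, as best I can tell, the EU NAL and Library of Congress identifiers for Luxembourg; treat them as assumptions, not as the tutorial's exact data:

```python
# Sketch (not from the tutorial) of the kind of N-Triple SILK emits.
# Both URIs are assumptions about the two vocabularies' identifiers.
eu_nal = "http://publications.europa.eu/resource/authority/country/LUX"
loc_marc = "http://id.loc.gov/vocabulary/countries/lu"

def exact_match_triple(subject, obj):
    """Format a skos:exactMatch link as one N-Triples line."""
    pred = "http://www.w3.org/2004/02/skos/core#exactMatch"
    return f"<{subject}> <{pred}> <{obj}> ."

print(exact_match_triple(eu_nal, loc_marc))
```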

The tutorial is here.

If you bother to look up the documentation on skos:exactMatch:

The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.

Are you happy with “…a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications?”

I’m not really sure what that means.

Not to mention that if 97% of the people in a geographic region want a new government, some will say the region can join a new country; but if the United States disagrees (for reasons best known to itself), then the will of 97% of the people is a violation of international law.

What? Too much democracy? I didn’t know that was a violation of international law.

If SKOS statements had some content, properties I suppose, along with authorship (and properties there as well), you could make an argument for skos:exactMatch being useful.

So far as I can see, it is not even a skos:closeMatch to “useful.”

Data Mining the Internet Archive Collection [Librarians Take Note]

Wednesday, March 12th, 2014

Data Mining the Internet Archive Collection by Caleb McDaniel.

From the “Lesson Goals:”

The collections of the Internet Archive (IA) include many digitized sources of interest to historians, including early JSTOR journal content, John Adams’s personal library, and the Haiti collection at the John Carter Brown Library. In short, to quote Programming Historian Ian Milligan, “The Internet Archive rocks.”

In this lesson, you’ll learn how to download files from such collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records, a widely used standard for formatting bibliographic metadata.

For demonstration purposes, this lesson will focus on working with the digitized version of the Anti-Slavery Collection at the Boston Public Library in Copley Square. We will first download a large collection of MARC records from this collection, and then use Python to retrieve and analyze bibliographic information about items in the collection. For example, by the end of this lesson, you will be able to create a list of every named place from which a letter in the antislavery collection was written, which you could then use for a mapping project or some other kind of analysis.
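
The lesson itself relies on the internetarchive and pymarc modules for downloading and parsing. As a stdlib-only sketch of the parsing step, here is how you might pull place-of-publication values (MARC field 260, subfield a) out of a MARC XML record; the toy record below is invented for illustration:

```python
import xml.etree.ElementTree as ET

# A toy MARC XML record (invented); real records come from the
# Internet Archive download step described in the lesson.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Boston, [Mass.] :</subfield>
    <subfield code="c">1840.</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def places_of_publication(xml_text):
    """Return the 260 $a (place of publication) values in a record."""
    root = ET.fromstring(xml_text)
    return [
        sf.text
        for df in root.findall('marc:datafield[@tag="260"]', NS)
        for sf in df.findall('marc:subfield[@code="a"]', NS)
    ]

print(places_of_publication(MARCXML))  # → ['Boston, [Mass.] :']
```

Run over a whole collection of records, a function like this gives you exactly the list of named places the lesson promises.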

This rocks!

In particular for librarians and library students who will already be familiar with MARC records.

Some 7,000 items from the Boston Public Library’s anti-slavery collection at Copley Square are the focus of this lesson.

That means historians have access to rich metadata, full images, and partial descriptions for thousands of antislavery letters, manuscripts, and publications.

Would original anti-slavery materials, written by actual participants, have interested you as a student? Do you think such materials would interest students now?

I first saw this in a tweet by Gregory Piatetsky.

Bibliographic Framework Transition Initiative

Tuesday, October 30th, 2012

Bibliographic Framework Transition Initiative

The original announcement for this project lists its requirements but the requirements are not listed on the homepage.

The requirements are found at: The Library of Congress issues its initial plan for its Bibliographic Framework Transition Initiative for dissemination, sharing, and feedback (October 31, 2011). Nothing in the link text says “requirements here” to me.

To effectively participate in discussions about this transition you need to know the requirements.

Requirements as of the original announcement:

Requirements for a New Bibliographic Framework Environment

Although the MARC-based infrastructure is extensive, and MARC has been adapted to changing technologies, a major effort to create a comparable exchange vehicle that is grounded in the current and expected future shape of data interchange is needed. To assure a new environment will allow reuse of valuable data and remain supportive of the current one, in addition to advancing it, the following requirements provide a basis for this work. Discussion with colleagues in the community has informed these requirements for beginning the transition to a "new bibliographic framework". Bibliographic framework is intended to indicate an environment rather than a "format".

  • Broad accommodation of content rules and data models. The new environment should be agnostic to cataloging rules, in recognition that different rules are used by different communities, for different aspects of a description, and for descriptions created in different eras, and that some metadata are not rule based. The accommodation of RDA (Resource Description and Access) will be a key factor in the development of elements, as will other mainstream library, archive, and cultural community rules such as Anglo-American Cataloguing Rules, 2nd edition (AACR2) and its predecessors, as well as DACS (Describing Archives, a Content Standard), VRA (Visual Resources Association) Core, CCO (Cataloging Cultural Objects).
  • Provision for types of data that logically accompany or support bibliographic description, such as holdings, authority, classification, preservation, technical, rights, and archival metadata. These may be accommodated through linking technological components in a modular way, standard extensions, and other techniques.
  • Accommodation of textual data, linked data with URIs instead of text, and both. It is recognized that a variety of environments and systems will exist with different capabilities for communicating and receiving and using textual data and links.
  • Consideration of the relationships between and recommendations for communications format tagging, record input conventions, and system storage/manipulation. While these environments tend to blur with today’s technology, a future bibliographic framework is likely to be seen less by catalogers than the current MARC format. Internal storage, displays from communicated data, and input screens are unlikely to have the close relationship to a communication format that they have had in the past.
  • Consideration of the needs of all sizes and types of libraries, from small public to large research. The library community is not homogeneous in the functionality needed to support its users in spite of the central role of bibliographic description of resources within cultural institutions. Although the MARC format became a key factor in the development of systems and services, libraries implement services according to the needs of their users and their available resources. The new bibliographic framework will continue to support simpler needs in addition to those of large research libraries.
  • Continuation of maintenance of MARC until no longer necessary. It is recognized that systems and services based on the MARC 21 communications record will be an important part of the infrastructure for many years. With library budgets already stretched to cover resource purchases, large system changes are difficult to implement because of the associated costs. With the migration in the near term of a large segment of the library community from AACR to RDA, we will need to have RDA-adapted MARC available. While that need is already being addressed, it is recognized that RDA is still evolving and additional changes may be required. Changes to MARC not associated with RDA should be minimal as the energy of the community focuses on the implementation of RDA and on this initiative.
  • Compatibility with MARC-based records. While a new schema for communications could be radically different, it will need to enable use of data currently found in MARC, since redescribing resources will not be feasible. Ideally there would be an option to preserve all data from a MARC record.
  • Provision of transformation from MARC 21 to a new bibliographic environment. A key requirement will be software that converts data to be moved from MARC to the new bibliographic framework and back, if possible, in order to enable experimentation, testing, and other activities related to evolution of the environment.

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data, and one that better accommodates the library community’s new cataloging rules, RDA. The effort will take place in parallel with the maintenance of MARC 21 as new models are tested. It is expected that new systems and services will be developed to help libraries and provide the same cost savings they do today. Sensitivity to the effect of rapid change enables gradual implementation by systems and infrastructures, and preserves compatibility with existing data.

Ongoing discussion at: Bibliographic Framework Transition Initiative Forum, BIBFRAME@LISTSERV.LOC.GOV.

The requirements recognize a future of semantic and technological heterogeneity.

Similar to the semantic and technological heterogeneity we have now and have had in the past.

A warning to those expecting a semantic and technological rapture of homogeneity.

(I first saw this initiative at: NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis.)

NoSQL Bibliographic Records:…

Tuesday, October 30th, 2012

NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis by Jeremy Nelson.

From the background:

Using the Library of Congress Bibliographic Framework for the Digital Age as the starting point for software development requirements; the FRBR-Redis-Datastore project is a proof-of-concept for a next-generation bibliographic NoSQL system within the context of improving upon the current MARC catalog and digital repository of a small academic library at a top-tier liberal arts college.

The FRBR-Redis-Datastore project starts with a basic understanding of the MARC, MODS, and FRBR implemented using a NoSQL technology called Redis.

This presentation guides you through the theories and technologies behind one such proof-of-concept bibliographic framework for the 21st century.

I found the answer to “Well, Why Not Hadoop?”

Hadoop was just too complicated compared to the simple three-step Redis server set-up.


Simply because a technology is popular doesn’t mean it meets your requirements. Such as administration by non-full time technical experts.

An Oracle database could manage a garden club’s finances, but it would be a poor choice under most circumstances.

The Redis part of the presentation is apparently not working (I get Python errors) as of today and I have sent a note with the error messages.

A “proof-of-concept” that merits your attention!

MaRC and SolrMaRC

Sunday, July 8th, 2012

MaRC and SolrMaRC by Owen Stephens.

From the post:

At the recent Mashcat event I volunteered to do a session called ‘making the most of MARC’. What I wanted to do was demonstrate how some of the current ‘resource discovery’ software is based on technology that can really extract value from bibliographic data held in MARC format, and how this creates opportunities both for creating tools for users and for library staff.

One of the triggers for the session was seeing, over a period of time, a number of complaints about the limitations of ‘resource discovery’ solutions – I wanted to show that many of the perceived limitations were not about the software, but about the implementation. I also wanted to show that while some technical knowledge is needed, some of these solutions can be run on standard PCs and this puts the tools, and the ability to experiment and play with MARC records, in the grasp of any tech-savvy librarian or user.

Many of the current ‘resource discovery’ solutions available are based on a search technology called Solr – part of a project at the Apache software foundation. Solr provides a powerful set of indexing and search facilities, but what makes it especially interesting for libraries is that there has been some significant work already carried out to use Solr to index MARC data – by the SolrMARC project. SolrMARC delivers a set of pre-configured indexes, and the ability to extract data from MARC records (gracefully handling ‘bad’ MARC data – such as badly encoded characters etc. – as well). While Solr is powerful, it is SolrMARC that makes it easy to implement and exploit in a library context.

SolrMARC is used by two open source resource discovery products – VuFind and Blacklight. Although VuFind and Blacklight have differences, and are written in different languages (VuFind is PHP while Blacklight is Ruby), since they both use Solr and specifically SolrMARC to index MARC records the indexing and search capabilities underneath are essentially the same. What makes the difference between implementations is not the underlying technology but the configuration. The configuration allows you to define what data, from which part of the MARC records, goes into which index in Solr.
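
As a concrete illustration of that configuration, a minimal SolrMARC index specification might look like the following. This is a sketch assuming the stock index.properties syntax (Solr field name = MARC tag plus subfield codes), not a file taken from any particular installation:

```properties
# Hypothetical SolrMARC index.properties fragment:
# map MARC fields/subfields to Solr index fields.
id = 001, first
title = 245ab
author = 100abcd
publish_date = 260c
```

Changing a file like this and reindexing, rather than changing the software, is what makes two VuFind or Blacklight installations behave so differently.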

Owen explains his excitement over these tools as:

These tools excite me for a couple of reasons:

  1. A shared platform for MARC indexing, with a standard way of programming extensions, gives the opportunity to share techniques and scripts across platforms – if I write a clever set of BeanShell scripts to calculate page counts from the 300 field (along the lines demonstrated by Tom Meehan in another Mashcat session), you can use the same scripts with no effort in your SolrMARC installation
  2. The ability to run powerful, but easy to configure, search tools on standard computers. I can get Blacklight or VuFind running on a laptop (Windows, Mac or Linux) with very little effort, and I can have a few hundred thousand MARC records indexed using my own custom routines and searchable via an interface I have complete control over
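
The page-count idea in #1 can be sketched in Python as well. This is a rough heuristic of my own, not Tom Meehan’s actual script: take the number immediately before “p.” or “pages” in the 300 $a string:

```python
import re

def page_count(field_300a):
    """Extract a page count from a MARC 300 $a string like 'xii, 356 p.'.
    A rough heuristic: the number immediately preceding 'p.'/'pages'."""
    m = re.search(r"(\d+)\s*p(?:ages)?\b", field_300a)
    return int(m.group(1)) if m else None

print(page_count("xii, 356 p. : ill. ; 24 cm."))  # → 356
print(page_count("1 score (48 p.)"))              # → 48
```

The same logic, written as a SolrMARC extension script, would run identically in any SolrMARC-based installation, which is exactly the point being made above.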

I like the “geek” appeal of #2, but creating value-add interfaces for the casual user is more likely to attract positive PR for a library.

As far as #1, how uniform are the semantics of MARC fields?

I suspect physical data, page count, etc., are fairly stable/common, but what about more subjective fields? How would you test that proposition?


Tuesday, October 4th, 2011


From the webpage:

Solrmarc can index your marc records into apache solr. It also comes with an improved version of marc4j that improves handling of UTF-8 characters, is more forgiving of malformed marc data, and can recover from data errors gracefully. This indexer is used by blacklight and vufind but it can also be used as a standalone project.

Nice if short discussion of custom indexing with SolrMarc.

Springer MARC Records

Saturday, October 1st, 2011

Springer Marc Records

From the webpage:

Springer offers two options for MARC records for Springer eBook collections:

1. Free Springer MARC records, SpringerProtocols MARC records & eBook Title Lists

  • Available free of charge
  • Generated using Springer metadata containing most common fields
  • Pick, download and install Springer MARC records in 4 easy steps

2. Free OCLC MARC records

  • Available free of charge
  • More enhanced MARC records
  • Available through OCLC WORLDCAT service

This looks like very good topic map fodder.

I saw this at all things cataloged.

MARCXML to Topic Map – Sneak Preview

Wednesday, July 21st, 2010

Wandora – Sneak Preview offers support for converting MARCXML into a topic map. This link will go away when the official Wandora release supports this feature.

Aki Kivelä’s posted details at: [topicmapmail] MARCXML to Topic Maps implementation!

Aki also created an example if you don’t want to install Wandora to see this feature: Example MARCXML to topic map conversion.

As Aki would be the first to admit, this isn’t a finished solution. It is an important step on the way towards one possible solution.

Another important step is for members of this list to use, evaluate, and test the software and give constructive feedback. Feedback can be negative, but try to offer a solution for any problem you uncover.