## Archive for the ‘MARC’ Category

### Black Womxn Authors, Library of Congress and MarcXML (Part 2)

Thursday, April 20th, 2017

(After writing this post I got a message from Clifford Anderson on a completely different way to approach the Marc to XML problem. A very neat way. But, I thought the directions on installing MarcEdit on Ubuntu 16.04 would be helpful anyway. More on Clifford’s suggestion to follow.)

If you’re just joining, read Black Womxn Authors, Library of Congress and MarcXML (Part 1) for the background on why this flurry of installation is at all meaningful!

The goal is to get a working copy of MarcEdit installed on my Ubuntu 16.04 machine.

The MarcEdit Linux Installation Instructions read, in part:

Installation Steps:

2. Unzip the file and open the MarcEdit folder. Find the Install.txt file and read it.
3. Ensure that you have the Mono framework installed. What is Mono? Mono is an open source implementation of Microsoft’s .NET framework. The best way to describe it is that .NET is very Java-like; it’s a common runtime that can work across any platform in which the framework has been installed. There are a number of ways to get the Mono framework — for MarcEdit’s purposes, it is recommended that you download and install the official package available from the Mono Project’s website. You can find the Mac OSX download here: http://www.go-mono.com/mono-downloads/download.html
4. Run MarcEdit via the command line using mono MarcEdit.exe from within the MarcEdit directory.

Well, sort of. 😉

First, you need to go to the Mono Project Download page. From there, under Xamarin packages, follow the link for Debian, Ubuntu, and derivatives.

There is a package for Ubuntu 16.10, but it’s Mono 4.2.1. By installing the Xamarin packages, I am running Mono 4.7.0. Your call, but as a matter of habit I run the latest compatible packages.

Updating your package lists for Debian, Ubuntu, and derivatives:

Add the Mono Project GPG signing key and the package repository to your system (if you don’t use sudo, be sure to switch to root):

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF

echo "deb http://download.mono-project.com/repo/debian wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list

And for Ubuntu 16.10:

echo "deb http://download.mono-project.com/repo/debian wheezy-apache24-compat main" | sudo tee -a /etc/apt/sources.list.d/mono-xamarin.list

Now run:

sudo apt-get update

The Usage section suggests:

The package mono-devel should be installed to compile code.

The package mono-complete should be installed to install everything – this should cover most cases of “assembly not found” errors.

The package referenceassemblies-pcl should be installed for PCL compilation support – this will resolve most cases of “Framework not installed: .NETPortable” errors during software compilation.

The package ca-certificates-mono should be installed to get SSL certificates for HTTPS connections. Install this package if you run into trouble making HTTPS connections.

The package mono-xsp4 should be installed for running ASP.NET applications.

Find and select mono-complete first. Most decent package managers will show the dependencies that will be installed. Add any of the packages above that were missed.

Do follow the hints on the Mono Project site to verify that Mono is working correctly.
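Before launching MarcEdit, a script can also confirm that mono is on your PATH and meets the MONO 3.4+ minimum that MarcEdit’s linux_install.txt lists. The sketch below is a minimal Python example, not part of MarcEdit or Mono; it assumes mono --version prints a first line like “Mono JIT compiler version 4.8.1 (...)”, and the function names are my own.

```python
import re
import shutil
import subprocess

MIN_MONO = (3, 4)  # linux_install.txt asks for MONO 3.4+


def parse_mono_version(banner: str):
    """Pull (major, minor, patch) out of the first line of `mono --version`."""
    match = re.search(r"version (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def check_mono():
    """Return (path, version, ok); path is also what MarcEdit wants as MONO Path."""
    path = shutil.which("mono")
    if path is None:
        return None, None, False
    lines = subprocess.run([path, "--version"], capture_output=True,
                           text=True).stdout.splitlines()
    version = parse_mono_version(lines[0] if lines else "")
    return path, version, version is not None and version[:2] >= MIN_MONO


if __name__ == "__main__":
    print(check_mono())
```

The path it returns is the same value you would get from which mono, so it doubles as the MONO Path setting on MarcEdit’s first run.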

Are We There Yet?

Not quite. It was at this point that I unpacked http://marcedit.reeset.net/software/marcedit.bin.zip and discovered there is no “Install.txt” file. Rather, there is a linux_install.txt, which reads:

a) Ensure that the dependencies have been installed
1) Dependency list:
i) MONO 3.4+ (Runtime plus the System.Windows.Forms library [these are sometimes separate])
ii) YAZ 5 + YAZ 5 develop Libraries + YAZ++ ZOOM bindings
iii) ZLIBC libraries
iV) libxml2/libxslt libraries
b) Unzip marcedit.zip
c) On first run:
a) mono MarcEdit.exe
b) Preferences tab will open, click on other, and set the following two values:
i) Temp path: /tmp/
ii) MONO path: [to your full mono path]

** For Z39.50 Support
d) Yaz.Sharp.dll.config — ensure that the dllmap points to the correct version of the shared libyaz object.
e) main_icon.bmp can be used for a desktop icon

Oops! Without unzipping marcedit.zip, you won’t see the dependencies:

ii) YAZ 5 + YAZ 5 develop Libraries + YAZ++ ZOOM bindings
iii) ZLIBC libraries
iV) libxml2/libxslt libraries

The YAZ site has a readme file for Ubuntu, but here is the very abbreviated version:

wget http://ftp.indexdata.dk/debian/indexdata.asc
sudo apt-key add indexdata.asc

echo "deb http://ftp.indexdata.dk/ubuntu xenial main" | sudo tee -a /etc/apt/sources.list
echo "deb-src http://ftp.indexdata.dk/ubuntu xenial main" | sudo tee -a /etc/apt/sources.list

(That sequence only works for Ubuntu xenial. See the readme file for other versions.)

Of course:

sudo apt-get update

As of today, you are looking for yaz 5.21.0-1 and libyaz5-dev 5.21.0-1.

Check for and/or install the ZLIBC and libxml2/libxslt libraries.
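One way to check is to ask the dynamic loader. The Python sketch below uses ctypes.util.find_library to see which of the listed libraries resolve, and reads a Mono-style dllmap file so you can compare its target with the libyaz version actually installed (step d in linux_install.txt). The lookup names are my assumptions (in particular, I read “ZLIBC” as zlib), and the helpers are hypothetical, not part of MarcEdit.

```python
import ctypes.util
import xml.etree.ElementTree as ET

# Assumed lookup names: find_library wants the bare name ("xml2" for
# libxml2.so.2), and "ZLIBC" is read here as zlib ("z").
LIBRARIES = {"YAZ": "yaz", "ZLIBC (zlib)": "z", "libxml2": "xml2", "libxslt": "xslt"}


def missing_libraries(names=LIBRARIES, finder=ctypes.util.find_library):
    """Return the labels whose shared library the loader cannot find."""
    return [label for label, lib in names.items() if finder(lib) is None]


def dllmap_targets(config_text: str):
    """List (dll, target) pairs from a Mono .dll.config file."""
    root = ET.fromstring(config_text)
    return [(m.get("dll"), m.get("target")) for m in root.iter("dllmap")]


def stale_yaz_maps(config_text: str, installed: str):
    """dllmap entries pointing at some libyaz other than the installed one."""
    return [(dll, target) for dll, target in dllmap_targets(config_text)
            if "yaz" in (target or "") and target != installed]


if __name__ == "__main__":
    print("missing:", missing_libraries())
    print("installed libyaz:", ctypes.util.find_library("yaz"))
```

On a machine with YAZ 5 installed, find_library("yaz") should report something like libyaz.so.5; pass the text of your own Yaz.Sharp.dll.config and that name to stale_yaz_maps to spot an outdated mapping.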

Personal taste, but I reboot at this point to make sure all the libraries reload at the correct versions. It should work without rebooting, but that’s up to you.

Fire it up with

mono MarcEdit.exe

Choose Locations (not Other), confirm that “Set Temporary Path:” is /tmp/, and set MONO Path to the location of mono (try which mono and input the result), then select OK.

I did the install on Sunday evening, so after all this, the software announced on loading that it had been upgraded! Yes, while I was installing all the dependencies, a new and improved version of MarcEdit was posted.

The XML extraction is a piece of cake, so I am working on XQuery against the resulting MarcXML records for Part 3.
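As a stand-in for that XQuery (not the XQuery itself), here is a minimal Python sketch of the same kind of question, which records name a given publisher, in date order, using only the standard library. It assumes the publisher is recorded in 260 $b and the date in positions 07-10 of the 008 field; the function names are mine.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def first_subfield(record, tag, code):
    """Text of the first $code found in any `tag` datafield, or None."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, NS)
    return node.text.strip() if node is not None and node.text else None


def records_by_publisher(xml_text: str, publisher: str):
    """(year, title) for records whose 260 $b mentions `publisher`, oldest first."""
    root = ET.fromstring(xml_text)
    hits = []
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        imprint = first_subfield(record, "260", "b") or ""
        if publisher.lower() not in imprint.lower():
            continue
        fixed = record.find("marc:controlfield[@tag='008']", NS)
        # 008 positions 07-10 hold Date 1; "????" sorts after real years.
        year = fixed.text[7:11] if fixed is not None and fixed.text else "????"
        hits.append((year, first_subfield(record, "245", "a")))
    return sorted(hits, key=lambda hit: hit[0])
```

Matching on “Pageant” rather than “Pageant Press” catches both imprint forms seen in the catalog entries, “Pageant” and “Pageant Press.”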

### Black Womxn Authors, Library of Congress and MarcXML (Part 1)

Monday, April 17th, 2017

This adventure started innocently enough with the 2017 Womxn of Color Reading Challenge by Der Vang. As an “older” White male Southerner working in technology, I don’t encounter works by womxn of color unless it is intentional.

The first book, “A book that became a movie,” was easy. I read the deeply moving Beloved by Toni Morrison. I recommend reading a non-critical edition before you read a critical one. Let Morrison speak for herself before you read others offering their views on the story.

The second book, “A book that came out the year you were born,” has proven to be more difficult. Far more difficult. You see, I think Der Vang was assuming a reading audience younger than I am, for whom womxn of color authors would not be difficult to find. That hasn’t proven to be the case for me.

I searched the usual places, but likely collections did not note an author’s gender or race. After I had exhausted my talents, the Atlanta-Fulton Public Library reference service came riding to the rescue with this message:

‘Attached is a “List of Books Published by Negro Writers in 1954 and Late 1953” (pp. 10-12) by Blyden Jackson, IN “The Blithe Newcomers: Resume of Negro Literature in 1954: Part I,” Phylon v.16, no.1 (1st Quarter 1955): 5-12, which has been annotated with classifications (Biography) or subjects (Poetry). Thirteen are written by women; however, just two are fiction. The brief article preceding the list does not mention the books by the women novelists–Elsie Jordan (Strange Sinner) or Elizabeth West Wallace (Scandal at Daybreak). No Part II has been identified. And AARL does not own these two. Searching AARL holdings in Classic Catalog by year yields seventeen by women but no fiction. Most are biographies. Two is better than none but not exactly a list.

A Celebration of Women Writers – African American Writers (http://digital.library.upenn.edu/women/_generate/AFRICAN%20AMERICAN.html) seems to have numerous [More Information] links which would possibly allow the requestor to determine the 1954 novelists among them.’
(emphasis in original)

Using those two authors/titles as leads, I found in the Library of Congress online catalog:

https://lccn.loc.gov/54007603
Jordan, Elsie. Strange sinner / Elsie Jordan. 1st ed. New York : Pageant, c1954.
172 p. ; 21 cm.
PZ4.J818 St

https://lccn.loc.gov/54012342
Wallace, Elizabeth West. [from old catalog] Scandal at daybreak. [1st ed.] New York, Pageant Press [1954]
167 p. 21 cm.
PZ4.W187 Sc
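For single records, my understanding is that these LCCN permalinks also accept a format suffix, with /marcxml returning the record as MARCXML. Treat that as an assumption to verify against the LCCN Permalink documentation before relying on it. A minimal Python sketch, with hypothetical helper names:

```python
import urllib.request

# Assumption: lccn.loc.gov serves MARCXML when "/marcxml" follows the LCCN.
PERMALINK = "https://lccn.loc.gov/{lccn}/marcxml"


def marcxml_url(lccn: str) -> str:
    """Permalink for one record; assumes an already-normalized LCCN like "54007603"."""
    return PERMALINK.format(lccn="".join(lccn.split()))


def fetch_marcxml(lccn: str, timeout: float = 30.0) -> str:
    """Download one record's MARCXML (requires network access)."""
    with urllib.request.urlopen(marcxml_url(lccn), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for lccn in ["54007603", "54012342"]:  # Strange sinner; Scandal at daybreak
        print(marcxml_url(lccn))
```

Calling fetch_marcxml once per LCCN (politely, with a pause between requests) collects records one at a time; it is no substitute for a bulk export of search results.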

Checking elsewhere, both titles are out of print, although I did see one (1) copy of Elsie Jordan’s Strange Sinner for $100. I think I have located a university with a digital scan but will have to report back on that later.

Since both Jordan and Wallace published with Pageant Press the same year, I reasoned that other womxn of color may have also published with them and that could lead me to more accessible works.

Experienced librarians are no doubt already grinning, because if you search for “Pageant Press” in the Library of Congress online catalog, you get 961 “hits,” displayed 25 “hits” at a time. Yes, you can set the page to return 100 “hits” at a time, but not while you have sort by date of publication selected. 🙁

That is, you can display 100 “hits” per page in no particular order, or you can display the “hits” in date of publication order, but only 25 “hits” at a time. (Or at least that was my experience; please correct me if that’s wrong.)

With 100 “hits” per page, you can “save as,” but only as Marc records, Unicode (UTF-8) or not. No MarcXML format.
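Converting saved Marc records to MarcXML is mechanical enough to sketch. The Python below is a minimal, standard-library-only illustration of the ISO 2709 to MARCXML mapping, not MarcEdit’s code and not a substitute for it: it assumes the Unicode (UTF-8) save option, does not handle MARC-8 records, and does no recovery from malformed records. The function names are mine.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
FIELD_END, SUBFIELD = b"\x1e", "\x1f"
ET.register_namespace("", MARC_NS)  # serialize as <collection xmlns="...">


def q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def split_records(raw: bytes):
    """Walk a .mrc file using each record's length (leader positions 00-04)."""
    pos = 0
    while pos + 24 <= len(raw):
        length = int(raw[pos:pos + 5])
        if length < 25:
            raise ValueError(f"bad record length at byte {pos}")
        yield raw[pos:pos + length]
        pos += length


def parse_record(rec: bytes):
    """Return (leader, [(tag, field_text), ...]) for one ISO 2709 record."""
    leader = rec[:24].decode("ascii")
    base = int(leader[12:17])            # base address of data
    directory = rec[24:base - 1]         # 12-byte entries, then a field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = rec[base + start:base + start + length].rstrip(FIELD_END)
        fields.append((tag, data.decode("utf-8")))  # assumes UTF-8, not MARC-8
    return leader, fields


def record_to_xml(leader: str, fields) -> ET.Element:
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, data in fields:
        if tag < "010":                  # 001-009: control fields, no subfields
            ET.SubElement(record, q("controlfield"), {"tag": tag}).text = data
            continue
        field = ET.SubElement(record, q("datafield"),
                              {"tag": tag, "ind1": data[0], "ind2": data[1]})
        for chunk in data[2:].split(SUBFIELD)[1:]:
            if chunk:                    # first character is the subfield code
                ET.SubElement(field, q("subfield"), {"code": chunk[0]}).text = chunk[1:]
    return record


def mrc_to_marcxml(raw: bytes) -> str:
    collection = ET.Element(q("collection"))
    for rec in split_records(raw):
        collection.append(record_to_xml(*parse_record(rec)))
    return ET.tostring(collection, encoding="unicode")
```

Reading a saved file in binary mode (for example open("results.mrc", "rb").read(), with a hypothetical file name) and passing the bytes to mrc_to_marcxml yields one collection document holding every record.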

In response to my query about this, the Library of Congress replied:

At the moment we have no plans to provide an option to save search results as MARCXML. We will consider it for future development projects.

I can understand that in the current climate in Washington, but a way to convert Marc records to the MarcXML format (easier to manipulate, in my view) would be a real benefit to readers and researchers alike.

Fortunately, there is a solution: MarcEdit.

From the webpage:

This LibGuide attempts to document the features of MarcEdit, which was developed by Terry Reese. It is open source software designed to facilitate the harvesting, editing, and creation of MARC records. This LibGuide was adapted from a standalone document, and while the structure of the original document has been preserved in this LibGuide, it is also available in PDF form at the link below. The original documentation and this LibGuide were written with the idea that it would be consulted on an as-needed basis. As a result, the beginning steps of many processes may be repeated within the same page or across the LibGuide as a whole so that users would be able to understand the entire process of implementing a function within MarcEdit without having to consult other guides to know where to begin. There are also screenshots that are repeated throughout, which may provide a faster reference for users to understand what steps they may already be familiar with.

Of course, installing MarcEdit on Ubuntu isn’t a straightforward task. But I have 961 Marc records, and possibly more, that would be very useful in MarcXML. Tomorrow I will document the installation steps I followed with Ubuntu 16.04.

PS: I’m not ignoring the suggested resource, A Celebration of Women Writers – African American Writers (http://digital.library.upenn.edu/women/_generate/AFRICAN%20AMERICAN.html). But I have gotten distracted by the technical issue of how to convert all the holdings at the Library of Congress for a publisher into MarcXML. Suggestions on how best to use this resource?

### 2015 Medical Subject Headings (MeSH) Now Available

Thursday, September 18th, 2014

2015 Medical Subject Headings (MeSH) Now Available

From the post:

Introduction to MeSH 2015
The Introduction to MeSH 2015 is now available, including information on its use and structure, as well as recent updates and availability of data.

MeSH Browser
The default year in the MeSH Browser remains 2014 MeSH for now, but the alternate link provides access to 2015 MeSH. The MeSH Section will continue to provide access via the MeSH Browser for two years of the vocabulary: the current year and an alternate year. Sometime in November or December, the default year will change to 2015 MeSH and the alternate link will provide access to the 2014 MeSH.

• Pharmacologic Actions (Forthcoming)
• New Headings with Scope Notes
• MeSH MN (tree number) changes
• 2015 MeSH in MARC format

Enjoy!

### Isochronic Passage Chart for Travelers

Tuesday, June 24th, 2014

From the blog of Arthur Charpentier, Somewhere Else, part 142

(departing from London, ht http://mapsontheweb.zoom-maps.com/) by Francis Galton, 1881

A much larger image that is easier to read.

Although not on such a grand scale, an isochronic passage map for data could be interesting for your enterprise.

How much time elapses between your request and a response from another department or team?

Presented visually, with this map as a reference for the technique, your evidence of data bottlenecks could be persuasive!
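Assuming you can pull request and response timestamps per team out of a ticketing system, the data side of such a map could start as simply as the Python sketch below. The teams, timestamps, and band edges are hypothetical; Galton’s bands were days of travel, these are hours of waiting.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical bands, in hours, playing the role of Galton's day-bands.
BANDS = [(0, 24, "same day"), (24, 72, "1-3 days"), (72, 240, "3-10 days")]


def isochrone_bands(requests):
    """Map each team to (median wait in hours, band label).

    `requests` is an iterable of (team, requested_at, answered_at) datetimes.
    """
    waits = defaultdict(list)
    for team, asked, answered in requests:
        waits[team].append((answered - asked).total_seconds() / 3600)
    bands = {}
    for team, hours in waits.items():
        typical = median(hours)
        label = next((name for lo, hi, name in BANDS if lo <= typical < hi),
                     "over 10 days")
        bands[team] = (typical, label)
    return bands


if __name__ == "__main__":
    sample = [
        ("legal", datetime(2014, 6, 2, 9), datetime(2014, 6, 2, 15)),
        ("legal", datetime(2014, 6, 3, 9), datetime(2014, 6, 3, 11)),
        ("it", datetime(2014, 6, 2, 9), datetime(2014, 6, 4, 9)),
        ("archives", datetime(2014, 6, 2, 9), datetime(2014, 6, 20, 9)),
    ]
    for team, (hours, label) in sorted(isochrone_bands(sample).items()):
        print(f"{team:10} {hours:6.1f} h  {label}")
```

Using the median rather than the mean keeps one forgotten ticket from pushing a whole team into a slower band; the band labels are what you would color on the chart.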

### Aligning Controlled Vocabularies

Tuesday, March 18th, 2014

Tutorial on the use of SILK for aligning controlled vocabularies

From the post:

A tutorial on the use of SILK has been published. The SILK framework is a tool for discovering relationships between data items within different Linked Data sources. This tutorial explains how SILK can be used to discover links between concepts in controlled vocabularies.

Example used in this Tutorial

The tutorial uses an example where SILK is used to create a mapping between the Named Authority Lists (NALs) of the Publications Office of the EU and the MARC countries list of the US Library of Congress. Both controlled vocabularies (NALs & MARC Countries list) use URIs to identify countries; compare, for example, the following URIs for the country of Luxembourg

SILK represents mappings between NALs using the SKOS language (skos:exactMatch). In the case of the URIs for Luxembourg this is expressed as N-Triples:

The tutorial is here.
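SILK is configured through its own link specifications, so the following is not SILK. It is a deliberately crude Python stand-in that only shows the shape of the output: concepts from two vocabularies are paired when their normalized labels are identical, and each pair is written as an N-Triples skos:exactMatch statement. The example.org URIs and labels in the usage are placeholders, not the NAL or MARC URIs from the tutorial.

```python
import unicodedata

SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"


def normalize(label: str) -> str:
    """Case-fold and strip accents so "Côte d'Ivoire" matches "cote d'ivoire"."""
    decomposed = unicodedata.normalize("NFKD", label)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return bare.casefold().strip()


def exact_matches(source: dict, target: dict):
    """Yield N-Triples linking URIs whose normalized labels are identical.

    `source` and `target` map concept URI -> preferred label.
    """
    index = {}
    for uri, label in target.items():
        index.setdefault(normalize(label), []).append(uri)
    for uri, label in source.items():
        for match in index.get(normalize(label), []):
            yield f"<{uri}> <{SKOS_EXACT}> <{match}> ."


if __name__ == "__main__":
    nal = {"http://example.org/nal/LUX": "Luxembourg"}
    marc = {"http://example.org/marc/lu": "luxembourg"}
    for triple in exact_matches(nal, marc):
        print(triple)
```

Notice that each triple records only the two URIs: nothing about who asserted the match or on what evidence.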

If you bother to look up the documentation on skos:exactMatch:

The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.

Are you happy with “…a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications?”

I’m not really sure what that means.

Not to mention that if 97% of the people in a geographic region want a new government, some will say the region can join a new country, but if the United States disagrees (for reasons best known to itself), then the will of 97% of the people is a violation of international law.

What? Too much democracy? I didn’t know that was a violation of international law.

If SKOS statements had some content, properties I suppose, along with authorship (and properties there as well), you could make an argument for skos:exactMatch being useful.

So far as I can see, it is not even a skos:closeMatch to “useful.”

### Data Mining the Internet Archive Collection [Librarians Take Note]

Wednesday, March 12th, 2014

Data Mining the Internet Archive Collection by Caleb McDaniel.

From the “Lesson Goals:”

The collections of the Internet Archive (IA) include many digitized sources of interest to historians, including early JSTOR journal content, John Adams’s personal library, and the Haiti collection at the John Carter Brown Library. In short, to quote Programming Historian Ian Milligan, “The Internet Archive rocks.”

In this lesson, you’ll learn how to download files from such collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records, a widely used standard for formatting bibliographic metadata.

For demonstration purposes, this lesson will focus on working with the digitized version of the Anti-Slavery Collection at the Boston Public Library in Copley Square. We will first download a large collection of MARC records from this collection, and then use Python to retrieve and analyze bibliographic information about items in the collection. For example, by the end of this lesson, you will be able to create a list of every named place from which a letter in the antislavery collection was written, which you could then use for a mapping project or some other kind of analysis.
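The lesson does this with a Python MARC module. As a rough standard-library approximation (not the lesson’s code), the sketch below tallies place names across a MARCXML file. It assumes the place appears in 260 $a, which is my guess; check the lesson for the field it actually reads.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def places(xml_text: str, tag: str = "260", code: str = "a") -> Counter:
    """Count place names found in `tag` $`code` across a MARCXML document."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    path = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    for node in root.iterfind(path, NS):
        if node.text:
            # Drop ISBD punctuation and brackets, e.g. "Boston :" or "[Boston]".
            counts[node.text.strip(" :;,[]")] += 1
    return counts
```

places(xml_text).most_common() then gives the list of places, most frequent first, ready for a mapping project.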

This rocks!

Especially for librarians and library students, who will already be familiar with MARC records.

Some 7,000 items from the Boston Public Library’s anti-slavery collection at Copley Square are the focus of this lesson.

That means historians have access to rich metadata, full images, and partial descriptions for thousands of antislavery letters, manuscripts, and publications.

Would original anti-slavery materials, written by actual participants, have interested you as a student? Do you think such materials would interest students now?

I first saw this in a tweet by Gregory Piatetsky.

### Bibliographic Framework Transition Initiative

Tuesday, October 30th, 2012

Bibliographic Framework Transition Initiative

The original announcement for this project lists its requirements, but the requirements are not listed on the homepage.

The requirements are found at: The Library of Congress issues its initial plan for its Bibliographic Framework Transition Initiative for dissemination, sharing, and feedback (October 31, 2011). Nothing in the link text says “requirements here” to me.

Requirements as of the original announcement:

Requirements for a New Bibliographic Framework Environment

Although the MARC-based infrastructure is extensive, and MARC has been adapted to changing technologies, a major effort to create a comparable exchange vehicle that is grounded in the current and expected future shape of data interchange is needed. To assure a new environment will allow reuse of valuable data and remain supportive of the current one, in addition to advancing it, the following requirements provide a basis for this work. Discussion with colleagues in the community has informed these requirements for beginning the transition to a "new bibliographic framework". Bibliographic framework is intended to indicate an environment rather than a "format".

• Broad accommodation of content rules and data models. The new environment should be agnostic to cataloging rules, in recognition that different rules are used by different communities, for different aspects of a description, and for descriptions created in different eras, and that some metadata are not rule based. The accommodation of RDA (Resource Description and Access) will be a key factor in the development of elements, as will other mainstream library, archive, and cultural community rules such as Anglo-American Cataloguing Rules, 2nd edition (AACR2) and its predecessors, as well as DACS (Describing Archives, a Content Standard), VRA (Visual Resources Association) Core, CCO (Cataloging Cultural Objects).
• Provision for types of data that logically accompany or support bibliographic description, such as holdings, authority, classification, preservation, technical, rights, and archival metadata. These may be accommodated through linking technological components in a modular way, standard extensions, and other techniques.
• Accommodation of textual data, linked data with URIs instead of text, and both. It is recognized that a variety of environments and systems will exist with different capabilities for communicating and receiving and using textual data and links.
• Consideration of the relationships between and recommendations for communications format tagging, record input conventions, and system storage/manipulation. While these environments tend to blur with today’s technology, a future bibliographic framework is likely to be seen less by catalogers than the current MARC format. Internal storage, displays from communicated data, and input screens are unlikely to have the close relationship to a communication format that they have had in the past.
• Consideration of the needs of all sizes and types of libraries, from small public to large research. The library community is not homogeneous in the functionality needed to support its users in spite of the central role of bibliographic description of resources within cultural institutions. Although the MARC format became a key factor in the development of systems and services, libraries implement services according to the needs of their users and their available resources. The new bibliographic framework will continue to support simpler needs in addition to those of large research libraries.
• Continuation of maintenance of MARC until no longer necessary. It is recognized that systems and services based on the MARC 21 communications record will be an important part of the infrastructure for many years. With library budgets already stretched to cover resource purchases, large system changes are difficult to implement because of the associated costs. With the migration in the near term of a large segment of the library community from AACR to RDA, we will need to have RDA-adapted MARC available. While that need is already being addressed, it is recognized that RDA is still evolving and additional changes may be required. Changes to MARC not associated with RDA should be minimal as the energy of the community focuses on the implementation of RDA and on this initiative.
• Compatibility with MARC-based records. While a new schema for communications could be radically different, it will need to enable use of data currently found in MARC, since redescribing resources will not be feasible. Ideally there would be an option to preserve all data from a MARC record.
• Provision of transformation from MARC 21 to a new bibliographic environment. A key requirement will be software that converts data to be moved from MARC to the new bibliographic framework and back, if possible, in order to enable experimentation, testing, and other activities related to evolution of the environment.

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data, and one that better accommodates the library community’s new cataloging rules, RDA. The effort will take place in parallel with the maintenance of MARC 21 as new models are tested. It is expected that new systems and services will be developed to help libraries and provide the same cost savings they do today. Sensitivity to the effect of rapid change enables gradual implementation by systems and infrastructures, and preserves compatibility with existing data.

Ongoing discussion at: Bibliographic Framework Transition Initiative Forum, BIBFRAME@LISTSERV.LOC.GOV.

The requirements recognize a future of semantic and technological heterogeneity.

Similar to the semantic and technological heterogeneity we have now and have had in the past.

A warning to those expecting a semantic and technological rapture of homogeneity.

(I first saw this initiative at: NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis.)

### NoSQL Bibliographic Records:…

Tuesday, October 30th, 2012

From the background:

Using the Library of Congress Bibliographic Framework for the Digital Age as the starting point for software development requirements; the FRBR-Redis-Datastore project is a proof-of-concept for a next-generation bibliographic NoSQL system within the context of improving upon the current MARC catalog and digital repository of a small academic library at a top-tier liberal arts college.

The FRBR-Redis-Datastore project starts with a basic understanding of the MARC, MODS, and FRBR implemented using a NoSQL technology called Redis.

This presentation guides you through the theories and technologies behind one such proof-of-concept bibliographic framework for the 21st century.

Hadoop was just too complicated compared to the simple three-step Redis server set-up.

Refreshing.

Simply because a technology is popular doesn’t mean it meets your requirements, such as administration by people who aren’t full-time technical experts.

An Oracle database supports applications that could manage garden club finances, but it’s a poor choice under most circumstances.

The Redis part of the presentation is apparently not working (I get Python errors) as of today and I have sent a note with the error messages.

A “proof-of-concept” that merits your attention!

### MaRC and SolrMaRC

Sunday, July 8th, 2012

MaRC and SolrMaRC by Owen Stephens.

From the post:

At the recent Mashcat event I volunteered to do a session called ‘making the most of MARC’. What I wanted to do was demonstrate how some of the current ‘resource discovery’ software are based on technology that can really extract value from bibliographic data held in MARC format, and how this creates opportunities both in creating tools for users, and also for library staff.

One of the triggers for the session was seeing, over a period of time, a number of complaints about the limitations of ‘resource discovery’ solutions – I wanted to show that many of the perceived limitations were not about the software, but about the implementation. I also wanted to show that while some technical knowledge is needed, some of these solutions can be run on standard PCs and this puts the tools, and the ability to experiment and play with MARC records, in the grasp of any tech-savvy librarian or user.

Many of the current ‘resource discovery’ solutions available are based on a search technology called Solr – part of a project at the Apache software foundation. Solr provides a powerful set of indexing and search facilities, but what makes it especially interesting for libraries is that there has been some significant work already carried out to use Solr to index MARC data – by the SolrMARC project. SolrMARC delivers a set of pre-configured indexes, and the ability to extract data from MARC records (gracefully handling ‘bad’ MARC data – such as badly encoded characters etc. – as well). While Solr is powerful, it is SolrMARC that makes it easy to implement and exploit in a library context.

SolrMARC is used by two open source resource discovery products – VuFind and Blacklight. Although VuFind and Blacklight have differences, and are written in different languages (VuFind is PHP while Blacklight is Ruby), since they both use Solr and specifically SolrMARC to index MARC records the indexing and search capabilities underneath are essentially the same. What makes the difference between implementations is not the underlying technology but the configuration. The configuration allows you to define what data, from which part of the MARC records, goes into which index in Solr.
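To make “the configuration” concrete: SolrMARC’s index properties pair a Solr field name with a MARC tag and subfield codes (VuFind’s marc.properties has lines such as title = 245ab, first, if I recall the syntax correctly). The Python sketch below mimics only that idea; it is not SolrMARC’s parser, and the mapping and record shape are hypothetical.

```python
# Hypothetical mapping in the spirit of SolrMARC's "solr_field = TAGsubfields".
MAPPING = {
    "title": "245ab",
    "author": "100a",
    "publisher": "260b",
    "physical": "300ac",
}


def parse_spec(spec: str):
    """Split "245ab" into ("245", {"a", "b"})."""
    return spec[:3], set(spec[3:])


def index_document(record: dict, mapping=MAPPING) -> dict:
    """Build a flat Solr-style document from a record.

    `record` maps tag -> list of fields; each field is a list of (code, value).
    """
    doc = {}
    for solr_field, spec in mapping.items():
        tag, codes = parse_spec(spec)
        values = [" ".join(v for c, v in field if c in codes)
                  for field in record.get(tag, [])]
        values = [v for v in values if v]
        if values:
            doc[solr_field] = values
    return doc
```

Changing what lands in which index means editing MAPPING, not the code, which is Owen’s point about configuration versus underlying technology.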

Owen explains his excitement over these tools as:

These tools excite me for a couple of reasons:

1. A shared platform for MARC indexing, with a standard way of programming extensions gives the opportunity to share techniques and scripts across platforms – if I write a clever set of bean shell scripts to calculate page counts from the 300 field (along the lines demonstrated by Tom Meehan in another Mashcat session), you can use the same scripts with no effort in your SolrMARC installation
2. The ability to run powerful, but easy to configure, search tools on standard computers. I can get Blacklight or VuFind running on a laptop (Windows, Mac or Linux) with very little effort, and I can have a few hundred thousand MARC records indexed using my own custom routines and searchable via an interface I have complete control over
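Owen’s example is BeanShell; as a Python illustration of the idea only (not Tom Meehan’s method), here is a sketch that estimates a page count from 300 $a text such as “172 p.” or “xii, 310 p.”, adding roman-numeral preliminary pages to the main count. It is a heuristic and ignores multi-volume and unpaged extents.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one counts negative.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def page_count(extent: str):
    """Estimate pages from a 300 $a string; None if no "p." sequence is found."""
    match = re.search(r"(?:\b([ivxlcdm]+),\s*)?\[?(\d+)\]?\s*p\.", extent,
                      re.IGNORECASE)
    if match is None:
        return None
    prelims = roman_to_int(match.group(1)) if match.group(1) else 0
    return prelims + int(match.group(2))
```

Extents like “172 p. ; 21 cm.” from the Library of Congress entries for Strange Sinner come out as 172.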

I like the “geek” appeal of #2, but creating value-add interfaces for the casual user is more likely to attract positive PR for a library.

As far as #1, how uniform are the semantics of MARC fields?

I suspect physical data (page count, etc.) is fairly stable/common, but what about more subjective fields? How would you test that proposition?

### SolrMarc

Tuesday, October 4th, 2011

SolrMarc

From the webpage:

Solrmarc can index your marc records into apache solr. It also comes with an improved version of marc4j that improves handling of UTF-8 characters, is more forgiving of malformed marc data, and can recover from data errors gracefully. This indexer is used by blacklight (http://blacklight.rubyforge.org) and vufind (http://www.vufind.org/) but it can also be used as a standalone project.

A nice, if short, discussion of custom indexing with SolrMarc.

### Springer MARC Records

Saturday, October 1st, 2011

Springer Marc Records

From the webpage:

Springer offers two options for MARC records for Springer eBook collections:

1. Free Springer MARC records, SpringerProtocols MARC records & eBook Title Lists

• Available free of charge
• Generated using Springer metadata containing most common fields
• Pick, download and install Springer MARC records in 4 easy steps

2. Free OCLC MARC records

• Available free of charge
• More enhanced MARC records
• Available through OCLC WORLDCAT service

This looks like very good topic map fodder.

I saw this at all things cataloged.

### MARCXML to Topic Map – Sneak Preview

Wednesday, July 21st, 2010

Wandora – Sneak Preview offers support for converting MARCXML into a topic map. This link will go away when the official Wandora release supports this feature.

Aki Kivelä posted details at: [topicmapmail] MARCXML to Topic Maps implementation!

Aki also created an example if you don’t want to install Wandora to see this feature: Example MARCXML to topic map conversion.

As Aki would be the first to admit, this isn’t a finished solution. It is an important step on the way towards one possible solution.

Another important step is for members of this list to use, evaluate, and test the software and give constructive feedback. Feedback can be negative, but try to offer a solution for any problem you uncover.