Archive for the ‘MARCXML’ Category

Black Womxn Authors, Library of Congress and MarcXML (Part 2)

Thursday, April 20th, 2017

(After writing this post I got a message from Clifford Anderson on a completely different way to approach the Marc to XML problem. A very neat way. But, I thought the directions on installing MarcEdit on Ubuntu 16.04 would be helpful anyway. More on Clifford’s suggestion to follow.)

If your just joining, read Black Womxn Authors, Library of Congress and MarcXML (Part 1) for the background on why this flurry of installation is at all meaningful!

The goal is to get a working copy of MarcEdit installed on my Ubuntu 16.04 machine.

MarcEdit Linux Installation Instructions reads in part:

Installation Steps:

  1. Download the MarcEdit app bundle. This file has been zipped to reduce the download size. http://marcedit.reeset.net/software/marcedit.bin.zip
  2. Unzip the file and open the MarcEdit folder. Find the Install.txt file and read it.
  3. Ensure that you have the Mono framework installed. What is Mono? Mono is an open source implementation of Microsoft’s .NET framework. The best way to describe it is that .NET is very Java-like; it’s a common runtime that can work across any platform in which the framework has been installed. There are a number of ways to get the Mono framework — for MarcEdit’s purposes, it is recommended that you download and install the official package available from the Mono Project’s website. You can find the Mac OSX download here: http://www.go-mono.com/mono-downloads/download.html
  4. Run MarEdit via the command-line using mono MarcEdit.exe from within the MarcEdit directory.

Well, sort of. 😉

First, you need to go to the Mono Project Download page. From there, under Xamarin packages, follow Debian, Ubuntu, and derivatives.

There is a package for Ubuntu 16.10, but it’s Mono 4.2.1. By installing the Xamarin packages, I am running Mono 4.7.0. Your call but as a matter of habit, I run the latest compatible packages.

Updating your package lists for Debian, Ubuntu, and derivatives:

Add the Mono Project GPG signing key and the package repository to your system (if you don’t use sudo, be sure to switch to root):

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF

echo "deb http://download.mono-project.com/repo/debian wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list

And for Ubuntu 16.10:

echo "deb http://download.mono-project.com/repo/debian wheezy-apache24-compat main" | sudo tee -a /etc/apt/sources.list.d/mono-xamarin.list

Now run:

sudo apt-get update

The Usage section suggests:

The package mono-devel should be installed to compile code.

The package mono-complete should be installed to install everything – this should cover most cases of “assembly not found” errors.

The package referenceassemblies-pcl should be installed for PCL compilation support – this will resolve most cases of “Framework not installed: .NETPortable” errors during software compilation.

The package ca-certificates-mono should be installed to get SSL certificates for HTTPS connections. Install this package if you run into trouble making HTTPS connections.

The package mono-xsp4 should be installed for running ASP.NET applications.

Find and select mono-complete first. Most decent package managers will show dependencies that will be installed. Add any of these that were missed.

Do follow the hints here to verify that Mono is working correctly.

Are We There Yet?

Not quite. It was at this point that I unpacked http://marcedit.reeset.net/software/marcedit.bin.zip and discovered there is no “Install.txt file.” Rather there is a linux_install.txt, which reads:

a) Ensure that the dependencies have been installed
1) Dependency list:
i) MONO 3.4+ (Runtime plus the System.Windows.Forms library [these are sometimes separate])
ii) YAZ 5 + YAZ 5 develop Libraries + YAZ++ ZOOM bindings
iii) ZLIBC libraries
iV) libxml2/libxslt libraries
b) Unzip marcedit.zip
c) On first run:
a) mono MarcEdit.exe
b) Preferences tab will open, click on other, and set the following two values:
i) Temp path: /tmp/
ii) MONO path: [to your full mono path]

** For Z39.50 Support
d) Yaz.Sharp.dll.config — ensure that the dllmap points to the correct version of the shared libyaz object.
e) main_icon.bmp can be used for a desktop icon

Opps! Without unzipping marcedit.zip, you won’t see the dependencies:

ii) YAZ 5 + YAZ 5 develop Libraries + YAZ++ ZOOM bindings
iii) ZLIBC libraries
iV) libxml2/libxslt libraries

The YAZ site has a readme file for Ubuntu, but here is the very abbreviated version:


wget http://ftp.indexdata.dk/debian/indexdata.asc
sudo apt-key add indexdata.asc

echo "deb http://ftp.indexdata.dk/ubuntu xenial main" | sudo tee -a /etc/apt/sources.list
echo "deb-src http://ftp.indexdata.dk/ubuntu xenial main" | sudo tee -a /etc/apt/sources.list

(That sequence only works for Ubuntu xenial. See the readme file for other versions.)

Of course:

sudo apt-get update

As of of today, you are looking for yaz 5.21.0-1 and libyaz5-dev 5.21.0-1.

Check for and/or install ZLIBC and libxml2/libxslt libraries.

Personal taste but I reboot at this point to make sure all the libraries re-load to the correct versions, etc. Should work without rebooting but that’s up to you.

Fire it up with

mono MarcEdit.ext

Choose Locations (not Other) and confirm “Set Temporary Path:” is /tmp/ and MONO Path (the location of mono, try which mono, input the results and select OK.

I did the install on Sunday evening and so after all this, the software on loading announces it has been ungraded! Yes, while I was installing all the dependencies, a new and improved version of MarcEdit was posted.

The XML extraction is a piece of cake so I am working on the XQuery on the resulting MarcXML records for part 3.

Black Womxn Authors, Library of Congress and MarcXML (Part 1)

Monday, April 17th, 2017

This adventure started innocently enough with the 2017 Womxn of Color Reading Challenge by Der Vang. As an “older” White male Southerner working in technology, I don’t encounter works by womxn of color unless it is intentional.

The first book, “A book that became a movie,” was easy. I read the deeply moving Beloved by Toni Morrison. I recommend reading a non-critical edition before you read a critical one. Let Morrison speak for herself before you read others offering their views on the story.

The second book, “A book that came out the year you were born,” have proven to be more difficult. Far more difficult. You see I think Der Vang was assuming a reading audience younger than I am, for which womxn of color authors would not be difficult to find. That hasn’t proven to be the case for me.

I searched the usual places but likely collections did not denote an author’s gender or race. The Atlanta-Fulton Public Library reference service came riding to the rescue after I had exhausted my talents with this message:

‘Attached is a “List of Books Published by Negro Writers in 1954 and Late 1953” (pp. 10-12) by Blyden Jackson, IN “The Blithe Newcomers: Resume of Negro Literature in 1954: Part I,” Phylon v.16, no.1 (1st Quarter 1955): 5-12, which has been annotated with classifications (Biography) or subjects (Poetry). Thirteen are written by women; however, just two are fiction. The brief article preceding the list does not mention the books by the women novelists–Elsie Jordan (Strange Sinner) or Elizabeth West Wallace (Scandal at Daybreak). No Part II has been identified. And AARL does not own these two. Searching AARL holdings in Classic Catalog by year yields seventeen by women but no fiction. Most are biographies. Two is better than none but not exactly a list.

A Celebration of Women Writers – African American Writers (http://digital.library.upenn.edu/women/_generate/
AFRICAN%20AMERICAN.html
) seems to have numerous [More Information] links which would possibly allow the requestor to determine the 1954 novelists among them.’
(emphasis in original)

Using those two authors/titles as leads, I found in the Library of Congress online catalog:

https://lccn.loc.gov/54007603
Jordan, Elsie. Strange sinner / Elsie Jordan. 1st ed. New York : Pageant, c1954.
172 p. ; 21 cm.
PZ4.J818 St

https://lccn.loc.gov/54012342
Wallace, Elizabeth West. [from old catalog] Scandal at daybreak. [1st ed.] New York, Pageant Press [1954]
167 p. 21 cm.
PZ4.W187 Sc

Checking elsewhere, both titles are out of print, although I did see one (1) copy of Elise Jordan’s Strange Sinner for $100. I think I have located a university with a digital scan but will have to report back on that later.

Since both Jordan and Wallace published with Pageant Press the same year, I reasoned that other womxn of color may have also published with them and that could lead me to more accessible works.

Experienced librarians are no doubt already grinning because if you search for “Pageant Press,” with the Library of Congress online catalog, you get 961 “hits,” displayed 25 “hits” at a time. Yes, you can set the page to return 100 “hits at a time, but not while you have sort by date of publication selected. 🙁

That is you can display 100 “hits” per page in no particular order, or, you can display the “hits” in date of publication order, but only 25 “hits” at a time. (Or at least that was my experience, please correct me if that’s wrong.)

But, with the 100 “hits” per page, you can “save as,” but only as Marc records, Unicode (UTF-8) or not. No MarcXML format.

In the response to my query about the same, the response from the Library of Congress reads:

At the moment we have no plans to provide an option to save search results as MARCXML. We will consider it for future development projects.

I can understand that in the current climate in Washington but a way to convert Marc records to the easier (in my view) to manipulate MarcXMLformat, would be a real benefit to readers and researchers alike.

Fortunately there is a solution, MarcEdit.

From the webpage:

This LibGuide attempts to document the features of MarcEdit, which was developed by Terry Reese. It is open source software designed to facilitate the harvesting, editing, and creation of MARC records. This LibGuide was adapted from a standalone document, and while the structure of the original document has been preserved in this LibGuide, it is also available in PDF form at the link below. The original documentation and this LibGuide were written with the idea that it would be consulted on an as-needed basis. As a result, the beginning steps of many processes may be repeated within the same page or across the LibGuide as a whole so that users would be able to understand the entire process of implementing a function within MarcEdit without having to consult other guides to know where to begin. There are also screenshots that are repeated throughout, which may provide a faster reference for users to understand what steps they may already be familiar with.

Of course, installing MarcEdit on Ubuntu, isn’t a straightforward task. But I have 961 Marc records and possibly more that would be very useful in MarcXML. Tomorrow I will document the installation steps I followed with Ubuntu 16.04.

PS: I’m not ignoring the suggested A Celebration of Women Writers – African American Writers (http://digital.library.upenn.edu/women/_generate/
AFRICAN%20AMERICAN.html)
. But I have gotten distracted by the technical issue of how to convert all the holdings at the Library of Congress for a publisher into MarcXML. Suggestions on how to best use this resource?

Data Mining the Internet Archive Collection [Librarians Take Note]

Wednesday, March 12th, 2014

Data Mining the Internet Archive Collection by Caleb McDaniel.

From the “Lesson Goals:”

The collections of the Internet Archive (IA) include many digitized sources of interest to historians, including early JSTOR journal content, John Adams’s personal library, and the Haiti collection at the John Carter Brown Library. In short, to quote Programming Historian Ian Milligan, “The Internet Archive rocks.”

In this lesson, you’ll learn how to download files from such collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records, a widely used standard for formatting bibliographic metadata.

For demonstration purposes, this lesson will focus on working with the digitized version of the Anti-Slavery Collection at the Boston Public Library in Copley Square. We will first download a large collection of MARC records from this collection, and then use Python to retrieve and analyze bibliographic information about items in the collection. For example, by the end of this lesson, you will be able to create a list of every named place from which a letter in the antislavery collection was written, which you could then use for a mapping project or some other kind of analysis.

This rocks!

In particular for librarians and library students who will already be familiar with MARC records.

Some 7,000 items from the Boston Public Library’s anti-slavery collection at Copley Square are the focus of this lesson.

That means historians have access to rich metadata, full images, and partial descriptions for thousands of antislavery letters, manuscripts, and publications.

Would original anti-slavery materials, written by actual participants, have interested you as a student? Do you think such materials would interest students now?

I first saw this in a tweet by Gregory Piatetsky.

MARCXML to Topic Map – Sneak Preview

Wednesday, July 21st, 2010

Wandora – Sneak Preview offers support for converting MARCXML into a topic map. This link will go away when the official Wandora release supports this feature.

Aki Kivelä’s posted details at: [topicmapmail] MARCXML to Topic Maps implementation!

Aki also created an example if you don’t want to install Wandora to see this feature: Example MARCXML to topic map conversion.

As Aki would be the first to admit, this isn’t a finished solution. It is an important step on the way towards one possible solution.

Another important step is for members of this list t0 use, evaluate, test the software and give constructive feedback. Can be negative but try to offer a solution for any problem you uncover.