Data Mining the Internet Archive Collection [Librarians Take Note]

Wednesday, March 12th, 2014

Data Mining the Internet Archive Collection by Caleb McDaniel.

From the “Lesson Goals:”

The collections of the Internet Archive (IA) include many digitized sources of interest to historians, including early JSTOR journal content, John Adams’s personal library, and the Haiti collection at the John Carter Brown Library. In short, to quote Programming Historian Ian Milligan, “The Internet Archive rocks.”

In this lesson, you’ll learn how to download files from such collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records, a widely used standard for formatting bibliographic metadata.

For demonstration purposes, this lesson will focus on working with the digitized version of the Anti-Slavery Collection at the Boston Public Library in Copley Square. We will first download a large collection of MARC records from this collection, and then use Python to retrieve and analyze bibliographic information about items in the collection. For example, by the end of this lesson, you will be able to create a list of every named place from which a letter in the antislavery collection was written, which you could then use for a mapping project or some other kind of analysis.

This rocks!

It should be of particular interest to librarians and library students, who will already be familiar with MARC records.

Some 7,000 items from the Boston Public Library’s anti-slavery collection at Copley Square are the focus of this lesson.

That means historians have access to rich metadata, full images, and partial descriptions for thousands of antislavery letters, manuscripts, and publications.
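To give a flavor of the place-name exercise the lesson describes, here is a minimal sketch in Python. It is not the lesson's own code: the lesson relies on a dedicated MARC parsing module, while this sketch uses only the standard library so it runs without installing anything. It assumes the place of writing is recorded in MARC field 260, subfield a, which may not hold for every record. The sample records and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample records, for illustration only (not real BPL data).
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">[Letter to] Dear Friend</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">Boston,</subfield>
      <subfield code="c">1840.</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">[Philadelphia],</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">A record with no place field</subfield>
    </datafield>
  </record>
</collection>"""


def clean_place(text):
    """Trim whitespace, cataloging punctuation, and supplied-data brackets."""
    return text.strip().strip(" ,.;:[]")


def places_from_marcxml(xml_text):
    """Return the 260$a value of every record that has one, in document order."""
    root = ET.fromstring(xml_text)
    path = ".//marc:datafield[@tag='260']/marc:subfield[@code='a']"
    places = []
    for subfield in root.iterfind(path, MARC_NS):
        if subfield.text:
            place = clean_place(subfield.text)
            if place:
                places.append(place)
    return places


if __name__ == "__main__":
    for place in sorted(set(places_from_marcxml(SAMPLE))):
        print(place)
```

Records without a 260 field are simply skipped, so the resulting list can be shorter than the number of records. For a mapping project you would probably still need to normalize spelling variants by hand.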

Would original anti-slavery materials, written by actual participants, have interested you as a student? Do you think such materials would interest students now?

I first saw this in a tweet by Gregory Piatetsky.

MARCXML to Topic Map – Sneak Preview

Wednesday, July 21st, 2010

Wandora – Sneak Preview offers support for converting MARCXML into a topic map. This link will go away when the official Wandora release supports this feature.

Aki Kivelä has posted details at: [topicmapmail] MARCXML to Topic Maps implementation!

Aki also created an example if you don’t want to install Wandora to see this feature: Example MARCXML to topic map conversion.

As Aki would be the first to admit, this isn’t a finished solution. It is an important step on the way towards one possible solution.

Another important step is for members of this list to use, evaluate, and test the software and give constructive feedback. Feedback can be negative, but try to offer a solution for any problem you uncover.