Archive for the ‘Museums’ Category

Digitised Manuscripts hyperlinks Spring 2017

Thursday, June 1st, 2017

Digitised Manuscripts hyperlinks Spring 2017

From the post:

From ancient papyri to a manuscript given by the future Queen Elizabeth I to King Henry VIII for New Year’s Day, from books written entirely in gold to Leonardo da Vinci’s notebook, there is a wealth of material on the British Library’s Digitised Manuscripts site. At the time of writing, you can view on Digitised Manuscripts no fewer than 1,783 manuscripts made in Europe before 1600, and more are being added all the time. For a full list of what is currently available, please see this file: Download PDF of Digitised MSS Spring 2017. This is also available in the form of a spreadsheet (although this format can not be downloaded on all web browsers): Download Spreadsheet of Digitised MSS Spring 2017.

The post is replete with guidance on use of the Digitised Manuscripts and other aids for the reader.

These works won’t interest Washington illiterati, but I don’t read to please others, only myself.

So should you.

30,000 Getty Museum Images Published Online as IIIF

Thursday, June 1st, 2017

30,000 Getty Museum Images Published Online as IIIF by Rob Sanderson.

From the post:

Today we published more than 30,000 images from the Getty Museum’s collection online using IIIF. You can see and click on the red-and-blue logo underneath the main image of any of the Museum collections, such as Van Gogh’s Irises, to explore our content through any IIIF-compatible viewer.

We’re happy to join another IIIF partner, the Yale Center for British Art, which is also releasing images as IIIF today—you can read their announcement here and browse their collection here.

About IIIF

IIIF (pronounced “triple eye eff”) is the acronym for the International Image Interoperability Framework. This framework comes from a broad community of primarily cultural heritage organizations that are working together to come to practical consensus around the publishing of digital images. By adopting the framework, the public as well as scholars can bring together images from any of the participating organizations for comparison, manipulation, and annotation in a single user interface. This community has agreed upon, published, and implemented two major specifications. Representing the Getty in this community, and working toward implementation of IIIF here, has been one of my major roles since joining the Getty as semantic architect.

The images now available via IIIF are from the Open Content Program. These were selected as the first tranche of content, as the rights have already been cleared to make them openly available. Any new images added to the Open Content set will automatically be available via IIIF, and images from Getty Research Institute collections are expected to be available before the end of the year.

I could attempt to describe the visualization capabilities of IIIF, but it’s best that you explore Van Gogh’s Irises on your own.


Egyptological Museum Search

Tuesday, November 22nd, 2016

Egyptological Museum Search

From the post:

The Egyptological museum search is a PHP tool aimed to facilitate locating the descriptions and images of ancient Egyptian objects in online catalogues of major museums. Online catalogues (ranging from selections of highlights to complete digital inventories) are now offered by almost all major museums holding ancient Egyptian items and have become indispensable in research work. Yet the variety of web interfaces and of search rules may overstrain any person performing many searches in different online catalogues.

Egyptological museum search was made to provide a single search point for finding objects by their inventory numbers in major collections of Egyptian antiquities that have online catalogues. It tries to convert user input into search queries recognised by museums’ websites. (Thus, for example, stela Geneva D 50 is searched as “D 0050,” statue Vienna ÄS 5046 is searched as “AE_INV_5046,” and coffin Turin Suppl. 5217 is searched as “S. 05217.”) The following online catalogues are supported:

The search interface uses a short list of aliases for museums.
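The inventory-number rewriting described in the quote could be sketched roughly like this. The rules below are reverse-engineered from the three quoted examples alone, so treat the padding widths and museum keys as assumptions:

```python
# Hypothetical sketch of per-museum inventory-number rewriting,
# based only on the three examples quoted above.

def to_query(museum, number):
    """Convert a conventional inventory number into the string a
    museum's own online catalogue expects."""
    if museum == "geneva":            # stela Geneva D 50 -> "D 0050"
        prefix, digits = number.split()
        return f"{prefix} {int(digits):04d}"
    if museum == "vienna":            # statue Vienna ÄS 5046 -> "AE_INV_5046"
        digits = number.split()[-1]
        return f"AE_INV_{digits}"
    if museum == "turin":             # coffin Turin Suppl. 5217 -> "S. 05217"
        digits = number.split()[-1]
        return f"S. {int(digits):05d}"
    raise ValueError(f"no rule for museum {museum!r}")

assert to_query("geneva", "D 50") == "D 0050"
assert to_query("vienna", "ÄS 5046") == "AE_INV_5046"
assert to_query("turin", "Suppl. 5217") == "S. 05217"
```

The real tool presumably carries one such rule per supported catalogue, which is precisely the sort of table volunteers could help extend.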

Once you see and use the interface proper, I hope you will be interested in volunteering to improve it.

Download 422 Free Art Books from The Metropolitan Museum of Art

Tuesday, April 7th, 2015

Download 422 Free Art Books from The Metropolitan Museum of Art by Colin Marshall.

From the post:


You could pay $118 on Amazon for the Metropolitan Museum of Art’s catalog The Art of Illumination: The Limbourg Brothers and the Belles Heures of Jean de France, Duc de Berry. Or you could pay $0 to download it at MetPublications, the site offering “five decades of Met Museum publications on art history available to read, download, and/or search for free.” If that strikes you as an obvious choice, prepare to spend some serious time browsing MetPublications’ collection of free art books and catalogs.

Judging from the speed of my download today, this is a really popular announcement!

Stash this with your other links for art, artwork, etc. as resources for a topic map.

Galleries, Libraries, Archives, and Museums (GLAM CC Licensing)

Friday, March 6th, 2015

Galleries, Libraries, Archives, and Museums (GLAM CC Licensing)

A very extensive list of galleries, libraries, archives, and museums (GLAM) that are using CC licensing.

A good resource to have at hand if you need to argue for CC licensing with your gallery, library, archive, or museum.

I first saw this in a tweet by Adrianne Russell.

Update: Resource List for March 5 Open Licensing Online Program

Museums: The endangered dead [Physical Big Data]

Saturday, February 21st, 2015

Museums: The endangered dead by Christopher Kemp.

Ricardo Moratelli surveys several hundred dead bats — their wings neatly folded — in a room deep inside the Smithsonian Institution in Washington DC. He moves methodically among specimens arranged in ranks like a squadron of bombers on a mission. Attached to each animal’s right ankle is a tag that tells Moratelli where and when the creature was collected, and by whom. Some of the tags have yellowed with age — they mark bats that were collected more than a century ago. Moratelli selects a small, compact individual with dark wings and a luxurious golden pelage. It fits easily in his cupped palm.

To the untrained eye, this specimen looks identical to the rest. But Moratelli, a postdoctoral fellow at the Smithsonian’s National Museum of Natural History, has discovered that the bat in his hands is a new species. It was collected in February 1979 in an Ecuadorian forest on the western slopes of the Andes. A subadult male, it has been waiting for decades for someone such as Moratelli to recognize its uniqueness. He named it Myotis diminutus. Before Moratelli could take that step, however, he had to collect morphometric data — precise measurements of the skull and post-cranial skeleton — from other specimens. In all, he studied 3,000 other bats from 18 collections around the world.

Myotis diminutus is not alone. And neither is Ricardo Moratelli.

Across the world, natural-history collections hold thousands of species awaiting identification. In fact, researchers today find many more novel animals and plants by sifting through decades-old specimens than they do by surveying tropical forests and remote landscapes. An estimated three-quarters of newly named mammal species are already part of a natural-history collection at the time they are identified. They sometimes sit unrecognized for a century or longer, hidden in drawers, half-forgotten in jars, misidentified, unlabelled.

A reminder that not all “big data” is digital, at least not yet.

The specimens already collected number in the billions worldwide. As Chris makes clear, many are languishing for lack of curators and, in some cases, the collected specimens are the only evidence such creatures ever lived on the Earth.

Vint Cerf (“Father of the Internet,” not Al Gore) has warned of a “forgotten century” of digital data.

As bad as a lost century of digital data may sound, our neglect of natural history collections threatens the loss of millions of years of evolutionary history, forever.

PS: Read Chris’ post in full and push for greater funding for natural history collections. The history we save may turn out to be critically important.

Unsustainable Museum Data

Friday, January 30th, 2015

Unsustainable Museum Data by Matthew Lincoln.

From the post:

In which I ask museums to give less API, more KISS and LOCKSS, please.

“How can we ensure our [insert big digital project title here] is sustainable?” So goes the cry from many a nascent digital humanities project, and rightly so! We should be glad that many new ventures are starting out by asking this question, rather than waiting until the last minute to come up with a sustainability plan. But Adam Crymble asks whether an emphasis on web-based digital projects instead of producing and sharing static data files is needlessly worsening our sustainability problem. Rather than allowing users to download the underlying data files (a passel of data tables, or marked-up text files, or even serialized linked data), these web projects mediate those data with user interfaces and guided searching, essentially making the data accessible to the casual user. But serving data piecemeal to users has its drawbacks, notes Crymble. If and when the web server goes down, access to the data disappears:

When something does go wrong we quickly realise it wasn’t the website we needed. It was the data, or it was the functionality. The online element, which we so often see as an asset, has become a liability.

I would broaden the scope of this call to include library and other data as well. Yes, APIs can be very useful but so can a copy of the original data.

Matthew mentions “creative re-use” near the end of his post but I would highlight that as a major reason for providing the original data. No doubt museums and others work very hard at offering good APIs of data but any API is only one way to obtain and view data.

For data, any data, to reach its potential, it needs to be available for multiple views of the same data. Some you may think are better, some you may think are worse than the original. But it is the potential for a multiplicity of views that opens up those possibilities. Keeping data behind an API is an act of preventing data from reaching its potential.
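The “multiplicity of views” point can be made concrete with a small sketch. The data below is invented, standing in for any museum's downloadable dump; once the raw file is on disk, any grouping is a few lines away:

```python
import csv
import io
import collections

# Invented stand-in for a museum's downloadable dump; any CSV of
# object records would do.
dump = io.StringIO("""object,medium,year
Irises,painting,1889
Funerary stela,stone,-600
Birth girdle,parchment,1500
""")

rows = list(csv.DictReader(dump))

# View 1: objects grouped by medium.
by_medium = collections.Counter(r["medium"] for r in rows)

# View 2: the same rows grouped by century -- a view the hosting
# institution's own interface might never have thought to offer.
by_century = collections.Counter(int(r["year"]) // 100 for r in rows)
```

No API needed for either view, and the file keeps working even when the web server that once offered it does not.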

Likenesses Within the Reach of All

Tuesday, December 2nd, 2014

Likenesses Within the Reach of All

From the webpage:

The Southern Cartes de Visite Collection is a recently digitized group of 3,356 photographs from circa 1850 to 1900. The map below depicts the locations of the collection’s photographers, studios, and galleries between about 1850 and 1900. Users can browse the map and select locations to see information and examples of the cartes-de-visite taken there. Users can also filter the collection by photographer and zoom in to cities like Baltimore, Louisville, or New Orleans to see the individual studio addresses. By clicking on the locations, users can access an Acumen link to see the photographs and manipulate them as if they were in the archive.

Great resource for Southern history buffs who want to map between period resources that are online.

Then, like now, some people were more photogenic than others. 😉

I first saw this in a tweet by Stewart Varner.

National Museum of Denmark – Images

Friday, August 29th, 2014

Nationalmuseet frigiver tusindvis af historiske fotos (in English: “The National Museum releases thousands of historical photos”)

The National Museum of Denmark has released nearly 50,000 images, with a long-term goal of 750,000 images, under the Creative Commons BY-SA license for the photos where the museum owns the copyright.

Should have an interesting impact on object recognition in images. What objects are “common” in a particular period? What objects are associated with particular artists or themes?


I first saw this in a tweet by Michael Peter Edson.

Cooper Hewitt, Color Interface

Tuesday, July 29th, 2014

From the about page:

Cooper Hewitt, Smithsonian Design Museum is the only museum in the nation devoted exclusively to historic and contemporary design. The Museum presents compelling perspectives on the impact of design on daily life through active educational and curatorial programming.

It is the mission of Cooper Hewitt’s staff and Board of Trustees to advance the public understanding of design across the thirty centuries of human creativity represented by the Museum’s collection. The Museum was founded in 1897 by Amy, Eleanor, and Sarah Hewitt—granddaughters of industrialist Peter Cooper—as part of The Cooper Union for the Advancement of Science and Art. A branch of the Smithsonian since 1967, Cooper-Hewitt is housed in the landmark Andrew Carnegie Mansion on Fifth Avenue in New York City.

I thought some background might be helpful because the Cooper Hewitt has a new interface:


Color, or colour, is one of the attributes we’re interested in exploring for collection browsing. Bearing in mind that only a fraction of our collection currently has images, here’s a first pass.

Objects with images now have up to five representative colors attached to them. The colors have been selected by our robotic eye machines who scour each image in small chunks to create color averages. These have then been harvested and “snapped” to the grid of 120 different colors — derived from the CSS3 palette and naming conventions — below to make navigation a little easier.

My initial reaction was to recall the old library joke where a patron comes to the circulation desk and doesn’t know a book’s title or author, but does remember it had a blue cover. 😉 At which point you wish Basil from Fawlty Towers was manning the circulation desk. 😉

It may be a good idea with physical artifacts because color/colour is a fixed attribute that may be associated with a particular artifact.

If you know the collection, you can amuse yourself by trying to guess what objects will be returned for particular colors.
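The palette-snapping step described in the quote is easy to sketch: average colors are pulled to the nearest entry in a fixed named palette. The real interface uses a 120-color CSS3-derived grid; the five-color subset below is only illustrative:

```python
# A sketch of snapping an averaged color to a fixed named palette.
# The real Cooper Hewitt grid has 120 entries; this subset of CSS3
# named colors is illustrative only.
PALETTE = {
    "navy":      (0x00, 0x00, 0x80),
    "steelblue": (0x46, 0x82, 0xB4),
    "goldenrod": (0xDA, 0xA5, 0x20),
    "seagreen":  (0x2E, 0x8B, 0x57),
    "sienna":    (0xA0, 0x52, 0x2D),
}

def snap(rgb):
    """Return the palette name nearest the given (r, g, b) average."""
    def dist2(color):
        return sum((a - b) ** 2 for a, b in zip(rgb, color))
    return min(PALETTE, key=lambda name: dist2(PALETTE[name]))

print(snap((70, 130, 190)))  # a bluish average snaps to "steelblue"
```

Snapping to a small shared palette is what makes color browsable at all: exact pixel averages are almost never shared between two objects, but palette names are.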

BTW, the collection is interlinked by people, roles, periods, types, countries. Very impressive!

Don’t miss the resources for developers and their GitHub account.

I first saw this in a tweet by Lyn Marie B.

PS: The use of people, roles, objects, etc. for browsing has a topic map-like feel. Since their data and other resources are downloadable, more investigation will follow.

US Museums

Wednesday, June 18th, 2014

US Museums

From the webpage:

There are over 35,000 museums in the United States! Click on one below, browse around, or use the search form to find the next one you will visit.

Minimal information at present but certainly a starting place for collaboration and enrichment!

Such as harvesting the catalogs of the museums that have them online.


I first saw this in a tweet by Lincoln Mullen.

Getty Art & Architecture Thesaurus Now Available

Saturday, February 22nd, 2014

Art & Architecture Thesaurus Now Available as Linked Open Data by James Cuno.

From the post:

We’re delighted to announce that today, the Getty has released the Art & Architecture Thesaurus (AAT)® as Linked Open Data. The data set is available for download under an Open Data Commons Attribution License (ODC-BY 1.0).

The Art & Architecture Thesaurus is a reference of over 250,000 terms on art and architectural history, styles, and techniques. It’s one of the Getty Research Institute’s four Getty Vocabularies, a collection of databases that serves as the premier resource for cultural heritage terms, artists’ names, and geographical information, reflecting over 30 years of collaborative scholarship. The other three Getty Vocabularies will be released as Linked Open Data over the coming 18 months.

In recent months the Getty has launched the Open Content Program, which makes thousands of images of works of art available for download, and the Virtual Library, offering free online access to hundreds of Getty Publications backlist titles. Today’s release, another collaborative project between our scholars and technologists, is the next step in our goal to make our art and research resources as accessible as possible.

What’s Next

Over the next 18 months, the Research Institute’s other three Getty Vocabularies—The Getty Thesaurus of Geographic Names (TGN)®, The Union List of Artist Names®, and The Cultural Objects Name Authority (CONA)®—will all become available as Linked Open Data. To follow the progress of the Linked Open Data project at the Research Institute, see their page here.

A couple of points of particular interest:

Getty documentation says this is the first industrial application of ISO 25964 Information and documentation – Thesauri and interoperability with other vocabularies.

You will probably want to read AAT Semantic Representation rather carefully.

A great source of data and interesting reading on the infrastructure as well.

I first saw this in a tweet by Semantic Web Company.

CIDOC Conceptual Reference Model

Saturday, February 22nd, 2014

CIDOC Conceptual Reference Model (pdf)

From the “Definition of the CIDOC Conceptual Reference Model:”

This document is the formal definition of the CIDOC Conceptual Reference Model (“CRM”), a formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. The CRM is the culmination of more than a decade of standards development work by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). Work on the CRM itself began in 1996 under the auspices of the ICOM-CIDOC Documentation Standards Working Group. Since 2000, development of the CRM has been officially delegated by ICOM-CIDOC to the CIDOC CRM Special Interest Group, which collaborates with the ISO working group ISO/TC46/SC4/WG9 to bring the CRM to the form and status of an International Standard.

Objectives of the CIDOC CRM

The primary role of the CRM is to enable information exchange and integration between heterogeneous sources of cultural heritage information. It aims at providing the semantic definitions and clarifications needed to transform disparate, localised information sources into a coherent global resource, be it within a larger institution, in intranets or on the Internet. Its perspective is supra-institutional and abstracted from any specific local context. This goal determines the constructs and level of detail of the CRM.

More specifically, it defines and is restricted to the underlying semantics of database schemata and document structures used in cultural heritage and museum documentation in terms of a formal ontology. It does not define any of the terminology appearing typically as data in the respective data structures; however it foresees the characteristic relationships for its use. It does not aim at proposing what cultural institutions should document. Rather it explains the logic of what they actually currently document, and thereby enables semantic interoperability.

It intends to provide a model of the intellectual structure of cultural documentation in logical terms. As such, it is not optimised for implementation-specific storage and processing aspects. Implementations may lead to solutions where elements and links between relevant elements of our conceptualizations are no longer explicit in a database or other structured storage system. For instance the birth event that connects elements such as father, mother, birth date, birth place may not appear in the database, in order to save storage space or response time of the system. The CRM allows us to explain how such apparently disparate entities are intellectually interconnected, and how the ability of the database to answer certain intellectual questions is affected by the omission of such elements and links.

The CRM aims to support the following specific functionalities:

  • Inform developers of information systems as a guide to good practice in conceptual modelling, in order to effectively structure and relate information assets of cultural documentation.
  • Serve as a common language for domain experts and IT developers to formulate requirements and to agree on system functionalities with respect to the correct handling of cultural contents.
  • To serve as a formal language for the identification of common information contents in different data formats; in particular to support the implementation of automatic data transformation algorithms from local to global data structures without loss of meaning. The latter being useful for data exchange, data migration from legacy systems, data information integration and mediation of heterogeneous sources.
  • To support associative queries against integrated resources by providing a global model of the basic classes and their associations to formulate such queries.
  • It is further believed, that advanced natural language algorithms and case-specific heuristics can take significant advantage of the CRM to resolve free text information into a formal logical form, if that is regarded beneficial. The CRM is however not thought to be a means to replace scholarly text, rich in meaning, by logical forms, but only a means to identify related data.

(emphasis in original)

Apologies for the long quote but this covers a number of important topic map issues.

For example:

For instance the birth event that connects elements such as father, mother, birth date, birth place may not appear in the database, in order to save storage space or response time of the system. The CRM allows us to explain how such apparently disparate entities are intellectually interconnected, and how the ability of the database to answer certain intellectual questions is affected by the omission of such elements and links.

In topic map terms, I would say that the database omits a topic to represent the “birth event,” and therefore there is no role player to anchor an association among father, mother, birth date, and birth place. Which subjects will have representatives in a topic map is always a concern for topic map authors.
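The birth-event example can be sketched as explicit triples. The CIDOC CRM class and property identifiers below (E67 Birth; P96 by mother; P97 from father; P98 brought into life; P7 took place at) should be checked against the published definition, and the people and places are invented:

```python
# Making the implicit birth event explicit, using CIDOC CRM-style
# class and property identifiers. Check the exact property numbers
# against the published CRM definition; the individuals are invented.
birth = "birth_of_anna"
triples = [
    (birth, "rdf:type",                  "crm:E67_Birth"),
    (birth, "crm:P98_brought_into_life", "anna"),
    (birth, "crm:P96_by_mother",         "maria"),
    (birth, "crm:P97_from_father",       "peter"),
    (birth, "crm:P7_took_place_at",      "vienna"),
]

# With the event reified, "who is anna's mother?" is a join over an
# explicit link rather than a guess from column names:
mothers = {s: o for s, p, o in triples if p == "crm:P96_by_mother"}
print(mothers[birth])
```

A database that stores only father, mother, date, and place columns can answer the same question, but only because a human remembers that those columns jointly encode a birth; the reified event makes that knowledge machine-visible.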

Helpfully, CIDOC explicitly separates the semantics it documents from data structures.

Less helpfully:

Because the CRM’s primary role is the meaningful integration of information in an Open World, it aims to be monotonic in the sense of Domain Theory. That is, the existing CRM constructs and the deductions made from them must always remain valid and well-formed, even as new constructs are added by extensions to the CRM.

Which restricts integration using CRM to systems where CRM is the primary basis for integration, as opposed to being one way to integrate several data sets.

That may not seem important in “web time,” where 3 months equals 1 Internet year. But when you think of integrating data and integration practices as they evolve over decades if not centuries, the limitations of monotonic choices come to the fore.

To take one practical discussion under way: how to handle warnings about radioactive waste, which must endure anywhere from 10,000 to 1,000,000 years? A far simpler task than preserving semantics over centuries.

If you think that is easy, remember that lots of people saw the pyramids of Egypt being built. But it was such common knowledge that no one thought to write it down.

Preservation of semantics is a daunting task.

CIDOC merits a slow read by anyone interested in modeling, semantics, vocabularies, and preservation.

PS: CIDOC: Conceptual Reference Model as a Word file.

Wellcome Images

Tuesday, January 21st, 2014

Thousands of years of visual culture made free through Wellcome Images

From the post:

We are delighted to announce that over 100,000 high resolution images including manuscripts, paintings, etchings, early photography and advertisements are now freely available through Wellcome Images.

Drawn from our vast historical holdings, the images are being released under the Creative Commons Attribution (CC-BY) licence.

This means that they can be used for commercial or personal purposes, with an acknowledgement of the original source (Wellcome Library, London). All of the images from our historical collections can be used free of charge.

The images can be downloaded in high-resolution directly from the Wellcome Images website for users to freely copy, distribute, edit, manipulate, and build upon as you wish, for personal or commercial use. The images range from ancient medical manuscripts to etchings by artists such as Vincent Van Gogh and Francisco Goya.

The earliest item is an Egyptian prescription on papyrus, and treasures include exquisite medieval illuminated manuscripts and anatomical drawings, from delicate 16th century fugitive sheets, whose hinged paper flaps reveal hidden viscera, to Paolo Mascagni’s vibrantly coloured etching of an ‘exploded’ torso.

Other treasures include a beautiful Persian horoscope for the 15th-century prince Iskandar, sharply sketched satires by Rowlandson, Gillray and Cruikshank, as well as photography from Eadweard Muybridge’s studies of motion. John Thomson’s remarkable nineteenth century portraits from his travels in China can be downloaded, as well as a newly added series of photographs of hysteric and epileptic patients at the famous Salpêtrière Hospital.

Semantics, or should I say semantic confusion, is never far away. While viewing an image of Gladstone as Scrooge:


When “search by keyword” offered “colonies,” I assumed it meant the colonies of the UK at the time.

Imagine my surprise when among other images, Wellcome Images offered:

petri dish

The search by keywords had found fourteen petri dish images, three images of Batavia, seven maps of India (salt, leprosy), one half-naked woman being held down, and the Gladstone image from earlier.

About what one expects from search these days but we could do better. Much better.

I first saw this in a tweet by Neil Saunders.

Download Cooper-Hewitt Collections Data

Sunday, November 3rd, 2013

Download Cooper-Hewitt Collections Data

From the post:

Cooper-Hewitt is committed to making its collection data available for public access. To date, we have made public approximately 60% of the documented collection available online. Whilst we have a web interface for searching the collection, we are now also making the dataset available for free public download. By being able to see “everything” at once, new connections and understandings may emerge.

What is it?

The download contains only text metadata, or “tombstone” information—a brief object description that includes temporal, geographic, and provenance information—for over 120,000 objects.

Is it complete?

No. The data is only tombstone information. Tombstone information is the raw data that is created by museum staff at the time of acquisition for recording the basic ‘facts’ about an object. As such, it is unedited. Historically, museum staff have used this data only for identifying the object, tracking its whereabouts in storage or exhibition, and for internal report and label creation. Like most museums, Cooper-Hewitt had never predicted that the public might use technologies, such as the web, to explore museum collections in the way that they do now. As such, this data has not been created with a “public audience” in mind. Not every field is complete for each record, nor is there any consistency in the way in which data has been entered over the many years of its accumulation. Considerable additional information is available in research files that have not yet been digitized and, as the research work of the museum is ongoing, the records will continue to be updated and change over time.

Which all sounds great, if you know what the Cooper-Hewitt collection houses.

From the about page:

Smithsonian’s Cooper-Hewitt, National Design Museum is the only museum in the nation devoted exclusively to historic and contemporary design. The Museum presents compelling perspectives on the impact of design on daily life through active educational and curatorial programming.

It is the mission of Cooper-Hewitt’s staff and Board of Trustees to advance the public understanding of design across the thirty centuries of human creativity represented by the Museum’s collection. The Museum was founded in 1897 by Amy, Eleanor, and Sarah Hewitt—granddaughters of industrialist Peter Cooper—as part of The Cooper Union for the Advancement of Science and Art. A branch of the Smithsonian since 1967, Cooper-Hewitt is housed in the landmark Andrew Carnegie Mansion on Fifth Avenue in New York City.

The campus also includes two historic townhouses renovated with state-of-the-art conservation technology and a unique terrace and garden. Cooper-Hewitt’s collections include more than 217,000 design objects and a world-class design library. Its exhibitions, in-depth educational programs, and on-site, degree-granting master’s program explore the process of design, both historic and contemporary. As part of its mission, Cooper-Hewitt annually sponsors the National Design Awards, a prestigious program which honors innovation and excellence in American design. Together, these resources and programs reinforce Cooper-Hewitt’s position as the preeminent museum and educational authority for the study of design in the United States.

Even without images, I can imagine enhancing library catalog holdings with annotations about particular artifacts being located at the Cooper-Hewitt.
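Given the post's warning that tombstone records are unedited and uneven, a first pass over the download might simply measure how complete each field is. The file below is an invented stand-in; the real dump's field names will differ:

```python
import csv
import io

# Invented stand-in for the tombstone CSV; the real Cooper-Hewitt
# dump's field names will differ.
dump = io.StringIO("""id,title,date,medium,provenance
1,Chair,1925,wood,
2,Poster,,paper,Gift of X
3,Textile,1880,,
""")

rows = list(csv.DictReader(dump))

# Fraction of records with a value in each field -- useful triage
# before building anything on unedited acquisition records.
completeness = {
    field: sum(1 for r in rows if r[field]) / len(rows)
    for field in rows[0]
}
```

Knowing up front that, say, provenance is mostly empty tells you which annotations are safe to rely on and which must wait for the research files to be digitized.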


OpenGLAM

Monday, June 24th, 2013


From the FAQ:

What is OpenGLAM?

OpenGLAM (Galleries, Libraries, Archives and Museum) is an initiative coordinated by the Open Knowledge Foundation that is committed to building a global cultural commons for everyone to use, access and enjoy.

OpenGLAM helps cultural institutions to open up their content and data through hands-on workshops, documentation and guidance and it supports a network of open culture evangelists through its Working Group.

What do we mean by “open”?

“Open” is a term you hear a lot these days. We’ve tried to get some clarity around this important issue by developing a clear and succinct definition of openness – see Open Definition.

The Open Definition says that a piece of content or data is open if “anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”

There are a number of Open Definition compliant licenses that GLAMs are increasingly using to license digital content and data that they hold. Popular ones for data include CC-0 and for content CC-BY or CC-BY-SA are often used.

Open access to cultural heritage materials will grow the need for better indexing/organization. As if you needed another reason to support it. 😉

SPARQL end-point of Europeana

Friday, December 21st, 2012

SPARQL end-point of Europeana

From the webpage:

Welcome to the SPARQL end-point of Europeana! It currently contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana. Data follows the terms of the Creative Commons CC0 public domain dedication. Data is described in the Resource Description Framework (RDF) format, and structured using the Europeana Data Model (EDM). We give more detail on the EDM data we publish on the technical details page.

Please take the time to check out the list of collections currently included in the pilot.

The terms of use and external data sources are provided on the Europeana Data sources page.

Sample queries are available on the sparql page.

At first I wondered why this was news, because “Europeana opens up data on 20 million cultural items” appeared on 12 September 2012 in the Guardian.

I assume the data has been in use since its release last September.

If you have been using it, can you comment on how your use will change now that the data is available as a SPARQL end-point?
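For anyone who wants to try it, a SPARQL request can be assembled with nothing more than the standard library. The endpoint URL below is my assumption, so check it against the pilot's sparql page before relying on it:

```python
from urllib.parse import urlencode

# The endpoint URL is an assumption -- verify it against the
# pilot's sparql page before use.
ENDPOINT = ""

# Ten item titles, just to prove the pipe works.
query = """
PREFIX dc: <>
SELECT ?item ?title WHERE {
  ?item dc:title ?title .
} LIMIT 10
"""

# The GET request one would send (not executed here):
request_url = ENDPOINT + "?" + urlencode(
    {"query": query, "format": "application/sparql-results+json"})
print(request_url)
```

The point of a public endpoint is exactly this: any HTTP client, in any language, can ask questions the original interface never anticipated.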

Europeana opens up data on 20 million cultural items

Thursday, September 13th, 2012

Europeana opens up data on 20 million cultural items by Jonathan Gray (Open Knowledge Foundation):

From the post:

Europe‘s digital library Europeana has been described as the ‘jewel in the crown’ of the sprawling web estate of EU institutions.

It aggregates digitised books, paintings, photographs, recordings and films from over 2,200 contributing cultural heritage organisations across Europe – including major national bodies such as the British Library, the Louvre and the Rijksmuseum.

Today [Wednesday, 12 September 2012] Europeana is opening up data about all 20 million of the items it holds under the CC0 rights waiver. This means that anyone can reuse the data for any purpose – whether using it to build applications to bring cultural content to new audiences in new ways, or analysing it to improve our understanding of Europe’s cultural and intellectual history.

This is a coup for advocates of open cultural data. The data is being released after a grueling and unenviable internal negotiation process that has lasted over a year – involving countless meetings, workshops, and white papers presenting arguments and evidence for the benefits of openness.

That is good news!

A familiar issue that it overcomes:

To complicate things even further, many public institutions actively prohibit the redistribution of information in their catalogues (as they sell it to – or are locked into restrictive agreements with – third party companies). This means it is not easy to join the dots to see which items live where across multiple online and offline collections.

Oh, yeah! That was one of Google’s reasons for pulling the plug on the Open Knowledge Graph. Google had restrictive agreements so you can only connect the dots with Google products. (I think there is a name for that, let me think about it. Maybe an EU prosecutor might know it. You could always ask.)

What are you going to be mapping from this collection?

Linked Data in Libraries, Archives, and Museums

Tuesday, September 11th, 2012

Linked Data in Libraries, Archives, and Museums Information Standards Quarterly (ISQ) Spring/Summer 2012, Volume 24, no. 2/3

Interesting reading on linked data.

I have some comments on the “discovery” of the need to manage “diverse, heterogeneous metadata” but will save them for another post.

From the “flyer” that landed in my inbox:

The National Information Standards Organization (NISO) announces the publication of a special themed issue of the Information Standards Quarterly (ISQ) magazine on Linked Data for Libraries, Archives, and Museums. ISQ Guest Content Editor, Corey Harper, Metadata Services Librarian, New York University has pulled together a broad range of perspectives on what is happening today with linked data in cultural institutions. He states in his introductory letter, “As the Linked Data Web continues to expand, significant challenges remain around integrating such diverse data sources. As the variance of the data becomes increasingly clear, there is an emerging need for an infrastructure to manage the diverse vocabularies used throughout the Web-wide network of distributed metadata. Development and change in this area has been rapidly increasing; this is particularly exciting, as it gives a broad overview on the scope and breadth of developments happening in the world of Linked Open Data for Libraries, Archives, and Museums.”

The feature article by Gordon Dunsire, Corey Harper, Diane Hillmann, and Jon Phipps on Linked Data Vocabulary Management describes the shift in popular approaches to large-scale metadata management and interoperability to the increasing use of the Resource Description Framework to link bibliographic data into the larger web community. The authors also identify areas where best practices and standards are needed to ensure a common and effective linked data vocabulary infrastructure.

Four “in practice” articles illustrate the growth in the implementation of linked data in the cultural sector. Jane Stevenson in Linking Lives describes the work to enable structured and linked data from the Archives Hub in the UK. In Joining the Linked Data Cloud in a Cost-Effective Manner, Seth van Hooland, Ruben Verborgh, and Rik Van de Walle show how general purpose Interactive Data Transformation tools, such as Google Refine, can be used to efficiently perform the necessary task of data cleaning and reconciliation that precedes the opening up of linked data. Ted Fons, Jeff Penka, and Richard Wallis discuss OCLC’s Linked Data Initiative and the use of Schema.org in WorldCat to make library data relevant on the web. In Europeana: Moving to Linked Open Data, Antoine Isaac, Robina Clayphan, and Bernhard Haslhofer explain how the metadata for over 23 million objects are being converted to an RDF-based linked data model in the European Union’s flagship digital cultural heritage initiative.

Jon Voss provides a status report on the Linked Open Data for Libraries, Archives, and Museums (LODLAM) State of Affairs and the annual summit to advance this work. Thomas Elliott, Sebastian Heath, and John Muccigrosso report on the Linked Ancient World Data Institute, a workshop to further the availability of linked open data for creating reusable digital resources within the classical studies disciplines.

Kevin Ford wraps up the contributed articles with a standard spotlight article on LC’s Bibliographic Framework Initiative and the Attractiveness of Linked Data. This Library of Congress-led community effort aims to transition from MARC 21 to a linked data model. “The move to a linked data model in libraries and other cultural institutions represents one of the most profound changes that our community is confronting,” stated Todd Carpenter, NISO Executive Director. “While it completely alters the way we have always described and cataloged bibliographic information, it offers tremendous opportunities for making this data accessible and usable in the larger, global web community. This special issue of ISQ demonstrates the great strides that libraries, archives, and museums have already made in this arena and illustrates the future world that awaits us.”

Shadow-Activated QR Code Actually Useful and Cool

Tuesday, May 1st, 2012

Shadow-Activated QR Code Actually Useful and Cool Retailer’s sign scannable only at lunch by David Griner.

From the post:

For all the talk of mobile-marketing tech, there remains a pretty wide gap between the potential and the practicality of QR codes. That’s why it’s nice to see this case study from Korea, where a retailer increased lunchtime sales by 25 percent with a shadow-based QR code that’s only scannable in the middle of the day. Emart’s “Sunny Sale” codes are created with three-dimensional displays outside several dozen locations in Seoul. When the sun is at its zenith, the shadows line up, allowing the code to be scanned for access to coupons and online ordering. It’s a smart idea that, in the short term at least, has generated plenty of strong PR and sales. While the wow factor is sure to fade quickly, it’s still a great example of a marketer finding a way to turn QR codes into something actually worth scanning.

From Seoul. No surprise there. Heavy investment in education and technology infrastructure. Some soon-to-be-former technology leaders did the same thing but then lost their way.

If you think of a QR code as a cheap equivalent to a secure RFID tag (you have to “see” it to scan it), QR codes should be more popular than they are. Physical security is the first principle of network security, and a QR code must be physically visible to be scanned.

Museums could use QR codes (linking into topic maps) to provide information in multiple languages. With sponsors for coupons to local eateries. No expensive tags, networks, sensors, etc.
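The multilingual idea can be sketched as a tiny resolver: each QR code carries a stable exhibit identifier, and the visitor's language selects the description served back. Every identifier and text below is made up for illustration.

```python
# Hypothetical exhibit identifiers mapped to language-specific descriptions.
EXHIBIT_INFO = {
    "exhibit/42": {
        "en": "Bronze statuette, 2nd century BCE.",
        "ko": "청동 소상, 기원전 2세기.",
        "nl": "Bronzen beeldje, 2e eeuw v.Chr.",
    },
}

def describe(qr_payload: str, lang: str, fallback: str = "en") -> str:
    """Return the description for the scanned exhibit in the visitor's language,
    falling back to a default language when no translation exists."""
    texts = EXHIBIT_INFO.get(qr_payload, {})
    return texts.get(lang) or texts.get(fallback, "No description available.")

print(describe("exhibit/42", "nl"))
```

In a topic map backed version, the QR payload would be a subject identifier and the descriptions would be language-scoped names or occurrences on that topic, but the lookup shape stays the same.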

Using an RDF Data Pipeline to Implement Cross-Collection Search

Saturday, March 31st, 2012

Using an RDF Data Pipeline to Implement Cross-Collection Search by David Henry and Eric Brown.


This paper presents an approach to transforming data from many diverse sources in support of a semantic cross-collection search application. It describes the vision and goals for a semantic cross-collection search and examines the challenges of supporting search of that kind using very diverse data sources. The paper makes the case for supporting semantic cross-collection search using semantic web technologies and standards including Resource Description Framework (RDF), SPARQL Protocol and RDF Query Language (SPARQL), and an XML mapping language. The Missouri History Museum has developed a prototype method for transforming diverse data sources into a data repository and search index that can support a semantic cross-collection search. The method presented in this paper is a data pipeline that transforms diverse data into localized RDF; then transforms the localized RDF into more generalized RDF graphs using common vocabularies; and ultimately transforms generalized RDF graphs into a Solr search index to support a semantic cross-collection search. Limitations and challenges of this approach are detailed in the paper.

A great report on the issues you will face with diverse data resources. (And who doesn’t have those?)

The “practical considerations” section is particularly interesting and I am sure the project participants would appreciate any suggestions you may have.
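The pipeline's three stages (diverse records, a shared vocabulary, a Solr index) can be caricatured in a few lines of plain Python. A real implementation would use an RDF library; all field names and the vocabulary mapping below are invented for illustration.

```python
# Stage 1: diverse source records with collection-specific field names.
photo_record = {"photo_title": "Eads Bridge", "shot_year": "1874"}
object_record = {"obj_name": "Surveyor's compass", "made": "1850"}

# Stage 2: map local field names onto a shared vocabulary
# (Dublin Core terms here, as a stand-in for the paper's common vocabularies).
LOCAL_TO_COMMON = {
    "photo_title": "dc:title", "obj_name": "dc:title",
    "shot_year": "dc:date", "made": "dc:date",
}

def generalize(record: dict) -> dict:
    """Re-key a local record using the shared vocabulary, dropping unmapped fields."""
    return {LOCAL_TO_COMMON[k]: v for k, v in record.items() if k in LOCAL_TO_COMMON}

# Stage 3: flatten generalized records into Solr-style index documents.
def to_solr_doc(record: dict, doc_id: str) -> dict:
    doc = {"id": doc_id}
    doc.update({k.replace("dc:", "dc_"): v for k, v in record.items()})
    return doc

docs = [to_solr_doc(generalize(r), f"doc{i}")
        for i, r in enumerate([photo_record, object_record])]
print(docs)
```

Once both records share `dc_title` and `dc_date` fields, a single Solr query can search photographs and museum objects together, which is the point of the cross-collection design.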

High-Quality Images from the Amsterdam Rijksmuseum

Saturday, December 31st, 2011

High-Quality Images from the Amsterdam Rijksmuseum

From the post:

The Amsterdam Rijksmuseum has made images from its “basic collection” – a little over 103,000 objects – available under a Creative Commons BY 3.0 license which allows you to:

  • Share — to copy, distribute and transmit the work
  • Remix — to adapt the work
  • Make commercial use of the work

These images may be used not only for classroom study and research but also for publishing, as long as the museum receives proper attribution. The collections database, in Dutch, is available here. Over 70,000 objects are also cataloged using ICONCLASS subject headings in English; this interface is available here. Click here for an example of the scan quality.

Geertje Jacobs posted a response:

Geertje Jacobs says:
December 14, 2011 at 1:16 am

Thank you for the post on our new API service!

I’d like to add an extra link to the API page. On this page, you’ll find information about our service (very soon also in English). This is also the place to ask for the key to make use of our data and images!
If there are any questions please contact

Enjoy our collection!

A very promising resource for use in European history, historical theology and the intellectual history of Europe studies. Coupled with a topic map, geographic, written and other resources can be combined together with the visual resources from the Amsterdam Rijksmuseum.
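As a sketch, a collection search against the museum's API might be assembled like this. The endpoint path and parameter names are assumptions for illustration only, and a real key must be requested via the API page mentioned in Geertje Jacobs's comment above.

```python
import urllib.parse

def collection_search_url(api_key: str, query: str) -> str:
    """Build a hypothetical collection-search URL; path and parameters are assumed,
    not taken from the official API documentation."""
    params = urllib.parse.urlencode({"key": api_key, "q": query, "format": "json"})
    return "https://www.rijksmuseum.nl/api/en/collection?" + params

url = collection_search_url("YOUR-KEY", "Rembrandt")
print(url)
```

The JSON response could then be merged into a topic map, with each object's identifier serving as a subject identifier for the corresponding topic.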

The Getty Search Gateway

Saturday, October 1st, 2011

The Getty Search Gateway at all things cataloged

Interesting review of the new search capabilities at the Getty. Covers their use of Solr and some of its more interesting capabilities. Searches across collections and other information sources.

After reading the post and using the site, what would you do differently with a topic map? In particular?
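For concreteness, a cross-collection search like the one the review describes could be issued to Solr as a faceted query; the host, core name, and facet field below are hypothetical.

```python
import urllib.parse

def solr_search_url(base: str, text: str) -> str:
    """Build a Solr select URL that searches all collections and facets on a
    hypothetical 'collection' field so results can be grouped by source."""
    params = urllib.parse.urlencode({
        "q": text,
        "facet": "true",
        "facet.field": "collection",  # hypothetical field naming the source collection
        "wt": "json",
    })
    return f"{base}/select?{params}"

url = solr_search_url("http://localhost:8983/solr/getty", "irises")
print(url)
```

A topic map layer on top of such a search could merge results that name the same subject across collections, rather than only grouping them by source.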

Europeana: think culture

Sunday, December 12th, 2010

Europeana: think culture

More than 14.6 million items from over 1500 organizations.

Truly an embarrassment of riches for anyone writing a topic map about Europe, its history, literature, influence on other parts of the world, etc.

I have just begun to explore the site and its interfaces. Will report back from time to time.

You can create your own tags but creation of an account requires the following agreement:

I understand that My Europeana gives me the opportunity to create tags for any item I wish. I agree that I will not create any tags that could be considered libelous, harmful, threatening, unlawful, defamatory, infringing, abusive, inflammatory, harassing, pornographic, obscene, fraudulent, invasive of privacy or publicity rights, hateful, or racially, ethnically or otherwise objectionable. By clicking this box I agree to abide by this agreement, and understand that if I don’t my membership of My Europeana will be terminated.

Just so you know.


  1. Select ten (10) artifacts to be integrated with local resources, using a topic map. Create a topic map. (The artifacts can be occurrences but associations provide richer opportunities.)
  2. Select one of the projects on the Thought Lab page and review it.
  3. What would you suggest as an improvement to the project you selected in #2? (3-5 pages, citations)
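A minimal sketch of what exercise 1 asks for: topics for an artifact and a local resource, joined by an association. Every identifier, name, and URL here is hypothetical.

```python
# Toy topic map: one Europeana artifact linked to one local teaching resource.
topic_map = {
    "topics": {
        "artifact-001": {
            "name": "Illuminated Book of Hours",
            "subject_identifiers": ["http://example.org/europeana/item/XXXX"],  # hypothetical
        },
        "local-lecture-12": {
            "name": "Lecture 12: Devotional Manuscripts",
            "subject_identifiers": ["http://example.edu/courses/hist101/lec12"],  # hypothetical
        },
        "discussed-in": {"name": "discussed in"},  # association type topic
    },
    "associations": [
        {"type": "discussed-in",
         "roles": {"work": "artifact-001", "resource": "local-lecture-12"}},
    ],
}

def resources_for(tm: dict, artifact: str) -> list:
    """Which local resources discuss a given artifact?"""
    return [a["roles"]["resource"] for a in tm["associations"]
            if a["type"] == "discussed-in" and a["roles"]["work"] == artifact]

print(resources_for(topic_map, "artifact-001"))
```

Using associations rather than bare occurrences, as the exercise suggests, lets the same artifact topic participate in many typed relationships (discussed in, depicted in, cited by) without flattening them into a single note field.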