Get Yourself a Linked Data Piece of WorldCat to Play With

Get Yourself a Linked Data Piece of WorldCat to Play With by Richard Wallis.

From the post:

You may remember my frustration a couple of months ago, at being in the air when OCLC announced the addition of Schema.org marked up Linked Data to all resources in WorldCat.org. Those of you who attended the OCLC Linked Data Round Table at IFLA 2012 in Helsinki yesterday, will know that I got my own back on the folks who publish the press releases at OCLC, by announcing the next WorldCat step along the Linked Data road whilst they were still in bed.

The Round Table was an excellent very interactive session with Neil Wilson from the British Library, Emmanuelle Bermes from Centre Pompidou, and Martin Malmsten of the Nation Library of Sweden, which I will cover elsewhere. For now, you will find my presentation Library Linked Data Progress on my SlideShare site.

After we experimentally added RDFa embedded linked data, using Schema.org markup and some proposed Library extensions, to WorldCat pages, one the most often questions I was asked was where can I get my hands on some of this raw data?

We are taking the application of linked data to WorldCat one step at a time so that we can learn from how people use and comment on it. So at that time if you wanted to see the raw data the only way was to use a tool [such as the W3C RDFA 1.1 Distiller] to parse the data out of the pages, just as the search engines do.

So I am really pleased to announce that you can now download a significant chunk of that data as RDF triples. Especially in experimental form, providing the whole lot as a download would have bit of a challenge, even just in disk space and bandwidth terms. So which chunk to choose was a question. We could have chosen a random selection, but decided instead to pick the most popular, in terms of holdings, resources in WorldCat – an interesting selection in it’s own right.

To make the cut, a resource had to be held by more than 250 libraries. It turns out that almost 1.2 million fall in to this category, so a sizeable chunk indeed. To get your hands on this data, download the 1Gb gzipped file. It is in RDF n-triples form, so you can take a look at the raw data in the file itself. Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them.

That’s a nice sized collection of data. In any format.

But next to last sentence of the post reads:

As I say in the press release, posted after my announcement, we are really interested to see what people will do with this data.

Déjà vu?

I think I have heard that question asked with other linked data releases. You? Pointers?

I first saw this at SemanticWeb.com.

Leave a Reply

You must be logged in to post a comment.