Archive for the ‘WorldCat’ Category

Shanghai Library adds 2 million records to WorldCat…

Tuesday, September 16th, 2014

Shanghai Library adds 2 million records to WorldCat to share its collection with the world

From the post:

Shanghai Library, the largest public library in China and one of the largest libraries in the world, has contributed 2 million holdings to WorldCat, including some 770,000 unique bibliographic records, to share its collection worldwide.

These records, which represent books and journals published between 1911 and 2013, were loaded in WorldCat earlier this year. The contribution from Shanghai Library, an OCLC member since 1996, enhances the richness and depth of Chinese materials in WorldCat as well as the discoverability of these collections around the world.

“We are pleased to add Shanghai Library’s holdings to WorldCat, which is the global union catalog of library collections,” said Dr. Jianzhong Wu, Director, Shanghai Library. “Shanghai is a renowned, global city, and the library should be as well. With WorldCat, we not only raise the visibility of our collection to a global level but we also share our national heritage and identity with other libraries and their users through the OCLC WorldShare Interlibrary Loan service.”

“The leadership of Shanghai Library has a bold global vision,” says Andrew H. Wang, Vice President, OCLC Asia Pacific. “The addition of Shanghai Library’s holdings and unique records enriches coverage of the Chinese collection in WorldCat for researchers everywhere.”

I don’t have a feel for how many unique Chinese bibliographic records are online, but 770,000 sounds like a healthy addition.

You may also be interested in: Online Resources for Chinese Studies in North American Libraries. Compiled by Ming POON, Josephine SCHE, and Mi Chu WIENS (November, 2004).

Given the compilation date, 2004, I ran the W3C Link Checker on http://www.loc.gov/rr/asian/china-bib/.

You can review the results at: http://www.durusau.net/publications/W3CLinkChecker:http:_www.loc.gov_rr_asian_china-bib_.html

Summary of results:

Code | Occurrences | What to do
(N/A) | 6 | The link was not checked due to robots exclusion rules. Check the link manually, and see also the link checker documentation on robots exclusion.
(N/A) | 2 | The hostname could not be resolved. Check the link for typos.
403 | 1 | The link is forbidden! This needs fixing. Usual suspects: a missing index.html or Overview.html, or a missing ACL.
404 | 61 | The link is broken. Double-check that you have not made any typo, or mistake in copy-pasting. If the link points to a resource that no longer exists, you may want to remove or fix the link.
500 | 5 | This is a server side problem. Check the URI.

(emphasis added)

At a minimum, the broken links need to be corrected, but updating the listing to include new resources would make a nice graduate student project.
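The W3C Link Checker does this through its web interface. For anyone who would rather repeat the check from a script, here is a minimal Python sketch using only the standard library: it collects the href targets of a page and tallies the HTTP status codes they return, in the spirit of the summary table. The function names are hypothetical, it sends HEAD requests (some servers answer those differently from GET), and unlike the W3C tool it does not honour robots exclusion rules.

```python
import urllib.error
import urllib.request
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def check_status(url, timeout=10):
    """Return the HTTP status for url, or None if it could not be reached at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # redirects are followed, so this is the final status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 404, 500, ...
    except OSError:
        return None  # DNS failure, refused connection, timeout, unsupported scheme


def summarize(statuses):
    """Tally status codes; unreachable links are counted under 'N/A'."""
    return Counter("N/A" if status is None else status for status in statuses)
```

Usage would look like `summarize(check_status(u) for u in extract_links(page_html, "http://www.loc.gov/rr/asian/china-bib/"))`, where `page_html` is the page source you have fetched yourself. Be polite with the request rate if you point it at a large listing.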

I don’t have the background or language skills with Chinese resources to embark on such a project, but I would be happy to assist anyone who undertakes the task.

OCLC releases WorldCat Works as linked data

Saturday, May 3rd, 2014

OCLC releases WorldCat Works as linked data

From the press release:

OCLC has made 197 million bibliographic work descriptions—WorldCat Works—available as linked data, a format native to the Web that will improve discovery of library collections through a variety of popular sites and Web services.

Release of this data marks another step toward providing interconnected linked data views of WorldCat. By making this linked data available, library collections can be exposed to the wider Web community, integrating these collections and making them more easily discoverable through websites and services that library users visit daily, such as Google, Wikipedia and social networks.

“Bibliographic data stored in traditional record formats has reached its limits of efficiency and utility,” said Richard Wallis, OCLC Technology Evangelist. “New technologies, influenced by the Web, now enable us to move toward managing WorldCat data as entities—such as ‘Works,’ ‘People,’ ‘Places’ and more—as part of the global Web of data.”

OCLC has created authoritative work descriptions for bibliographic resources found in WorldCat, bringing together multiple manifestations of a work into one logical authoritative entity. The release of “WorldCat Works” is the first step in providing linked data views of rich WorldCat entities. Other WorldCat descriptive entities will be created and released over time.
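The press release does not say how OCLC brings manifestations together into works, so the following is only a toy illustration of the idea, not OCLC's method: a Python sketch that groups hypothetical manifestation records under a normalized (author, title) key. The record fields, the ids and the normalization rule are all assumptions; real work-clustering has to cope with translations, variant titles and much messier data.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip diacritics and punctuation, collapse whitespace (hypothetical rule)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", "", stripped.lower())
    return " ".join(no_punct.split())


def cluster_works(manifestations):
    """Group manifestation ids under a (normalized author, normalized title) work key."""
    works = defaultdict(list)
    for record in manifestations:
        key = (normalize(record["author"]), normalize(record["title"]))
        works[key].append(record["id"])
    return dict(works)
```

Two records for "Brontë, Charlotte" / "Jane Eyre" and "Bronte, Charlotte" / "Jane Eyre." end up under the same key, which is the sense in which several manifestations become "one logical entity".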

If you are looking for a smallish set of entity identifiers, this is a good start on bibliographic materials.

I say smallish because as of 2009, there were 672 million assigned phone numbers in the United States (Numbering Resource Utilization in the United States).

Each of those phone numbers has the potential to identify some subject: the assigned number, if nothing else, although other uses suggest themselves.

Trapping Users with Linked Data (WorldCat)

Friday, March 7th, 2014

WorldCat Works Linked Data – Some Answers To Early Questions by Richard Wallis.

The most interesting question Richard answers:

Q Is there a bulk download available?
No there is no bulk download available. This is a deliberate decision for several reasons.
Firstly this is Linked Data – its main benefits accrue from its canonical persistent identifiers and the relationships it maintains between other identified entities within a stable, yet changing, web of data. WorldCat.org is a live data set actively maintained and updated by the thousands of member libraries, data partners, and OCLC staff and processes. I would discourage reliance on local storage of this data, as it will rapidly evolve and become out of synchronisation with the source. The whole point and value of persistent identifiers, which you would reference locally, is that they will always dereference to the current version of the data.

I will give you one guess on who is deciding on the entities, identifiers and relationships to be maintained.

Hint: It’s not you.

That, in my view, is one of the principal weaknesses of Linked Data.

In order to participate, you have to forfeit your right to organize your world differently than it has been organized by Richard Wallis, WorldCat and others.

I am sure they all have good intentions and WorldCat will come close enough for most of my purposes, but I’m not interested in a one-world view, no matter who agrees with it. Even me.

If you are good with graphics, take the original Apple “1984” commercial and reverse it.

Show users a screen of vivid diversity, and show a Richard Wallis look-alike touching the side of the projection screen as the uniform grayness of linked data starts to spread across it. As it does, the users in the audience, who have been in traditional dress, start to look like the opening audience in Apple’s 1984 commercial.

That’s the intellectual landscape that linked data promises. Do you really want to go there?

Nothing against standards; I have helped write one or two of them. But I do oppose uniformity for the sake of empowering self-appointed guardians.

Particularly when that uniformity is a tepid grey that doesn’t reflect the rich and discordant hues of human intellectual history.

OCLC Preview 194 Million…

Tuesday, February 25th, 2014

OCLC Preview 194 Million Open Bibliographic Work Descriptions by Richard Wallis.

From the post:

I have just been sharing a platform, at the OCLC EMEA Regional Council Meeting in Cape Town South Africa, with my colleague Ted Fons. A great setting for a great couple of days of the OCLC EMEA membership and others sharing thoughts, practices, collaborative ideas and innovations.

Ted and I presented our continuing insight into The Power of Shared Data, and the evolving data strategy for the bibliographic data behind WorldCat. If you want to see a previous view of these themes you can check out some recordings we made late last year on YouTube, from Ted – The Power of Shared Data – and me – What the Web Wants.

Today, demonstrating on-going progress towards implementing the strategy, I had the pleasure to preview two upcoming significant announcements on the WorldCat data front:

  1. The release of 194 Million Linked Data Bibliographic Work descriptions
  2. The WorldCat Linked Data Explorer interface

A preview release, to be sure, but one worth following!

Particularly with 194 million bibliographic work descriptions!

See Richard’s post for the details.

Content-Negotiation for WorldCat

Monday, June 3rd, 2013

Content-Negotiation for WorldCat by Richard Wallis.

From the post:

I am pleased to share with you a small but significant step on the Linked Data journey for WorldCat and the exposure of data from OCLC.

Content-negotiation has been implemented for the publication of Linked Data for WorldCat resources.

For those immersed in the publication and consumption of Linked Data, there is little more to say. However I suspect there are a significant number of folks reading this who are wondering what the heck I am going on about. It is a little bit techie but I will try to keep it as simple as possible.

Back last year, a linked data representation of each (of the 290+ million) WorldCat resources was embedded in its web page on the WorldCat site. For full details check out that announcement, but in summary:

  • All resource pages include Linked Data
  • Human visible under a Linked Data tab at the bottom of the page
  • Embedded as RDFa within the page html
  • Described using the Schema.org vocabulary
  • Released under an ODC-BY open data license
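To make "embedded as RDFa within the page html" concrete, here is a deliberately naive Python sketch that pulls (property, value) pairs out of RDFa `property` attributes using only the standard library. It is not a conformant RDFa 1.1 processor (it ignores subjects, prefixes, nested `typeof` resources and datatypes), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class NaiveRDFaScraper(HTMLParser):
    """Collect (property, value) pairs from RDFa 'property' attributes.

    Deliberately naive: ignores subjects, prefixes, typeof nesting and datatypes.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # stack of [tag, pending property or None, text parts]

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        value = (attributes.get("content") or attributes.get("resource")
                 or attributes.get("href") or attributes.get("src"))
        if prop and value is not None:
            self.pairs.append((prop, value))  # value comes from an attribute
            prop = None  # nothing left to collect from the element's text
        self._open.append([tag, prop, []])

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self._open):
            return  # stray end tag with no matching start tag
        while self._open:  # pop until the matching start tag (tolerates void elements)
            open_tag, prop, parts = self._open.pop()
            if prop:
                self.pairs.append((prop, " ".join("".join(parts).split())))
            if open_tag == tag:
                break

    def handle_data(self, data):
        for entry in self._open:
            if entry[1]:  # only elements whose value is their text content
                entry[2].append(data)


def scrape(html):
    scraper = NaiveRDFaScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.pairs
```

Calling `scrape(page_html)` on a page's source returns a list of pairs such as `("name", "Jane Eyre")`. For anything beyond a quick look, a real RDFa distiller or library is the safer choice.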

That is all still valid – so what’s new from now?

That same data is now available in several machine readable RDF serialisations. RDF is RDF, but, dependent on your use, it is easier to consume as RDFa, or XML, or JSON, or Turtle, or as triples.

In many Linked Data presentations, including some of mine, you will hear the line “As I clicked on the link in a web browser, we are seeing an html representation. However if I was a machine I would be getting XML or another format back.” This is the mechanism in the http protocol that makes that happen.
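The client side of content negotiation is just an `Accept` header on an ordinary GET request; the server then picks the serialisation. A minimal Python sketch, assuming WorldCat record URIs of the form `http://www.worldcat.org/oclc/<OCLC number>`. The media types are the commonly registered ones for each RDF syntax; which of them WorldCat actually honoured is an assumption, and the helper names are hypothetical.

```python
import urllib.request

# Commonly registered RDF media types; WorldCat's exact support is an assumption.
RDF_MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "ntriples": "application/n-triples",
    "jsonld": "application/ld+json",
}


def negotiated_request(uri, fmt):
    """Build a GET request that asks the server for `fmt` via the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


def fetch(uri, fmt, timeout=10):
    """Return the Content-Type the server chose, plus the response body."""
    with urllib.request.urlopen(negotiated_request(uri, fmt), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()
```

A browser sends something like `Accept: text/html`, which is why a person clicking the same link sees a web page. Checking the returned `Content-Type` is worthwhile, since a server may fall back to HTML when it cannot satisfy the request.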

I use WorldCat often. It enables readers to search for a book at their local library or to order online.

The Correct End Of Your Telescope – Viewing Schema.org Adoption

Sunday, November 4th, 2012

The Correct End Of Your Telescope – Viewing Schema.org Adoption by Richard Wallis.


I have been banging on about Schema.org for a while.  For those that have been lurking under a structured data rock for the last year, it is an initiative of cooperation between Google, Bing, Yahoo!, and Yandex to establish a vocabulary for embedding structured data in web pages to describe ‘things’ on the web.  Apart from the simple significance of having those four names in the same sentence as the word cooperation, this initiative is starting to have some impact.  As I reported back in June, the search engines are already seeing some 7%-10% of pages they crawl containing Schema.org markup.  Like it or not, it is clear that Schema.org is rapidly becoming a de facto way of marking up your data if you want it to be shared on the web and have it recognised by the major search engines.

It is no coincidence, then, that at OCLC we chose Schema.org as the way to expose linked data in WorldCat. If you haven’t seen it, just search for any item at worldcat.org, scroll to the bottom of the page and open up the Linked Data tab and there you will see the [not very pretty, but hey, it’s really designed for systems, not humans] Schema.org marked up linked data for the item, with links out to other data sources such as VIAF, LCSH, FAST, and Dewey.

Schema.org has much to recommend it, but I suspect that HTML remains the “…de facto way of marking up your data if you want it to be shared on the web and have it recognised by the major search engines.”

Ten percent is no mean feat but it is still ten percent.

Putting WorldCat Data Into A Triple Store

Tuesday, August 21st, 2012

Putting WorldCat Data Into A Triple Store by Richard Wallis.

From the post:

I can not really get away with making a statement like “Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them” and then not following it up.

I made it in my previous post Get Yourself a Linked Data Piece of WorldCat to Play With in which I was highlighting the release of a download file containing RDF descriptions of the 1.2 million most highly held resources in WorldCat.org – to make the cut, a resource had to be held by more than 250 libraries.

So here for those that are interested is a step by step description of what I did to follow my own encouragement to load up the triples and start playing.
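Richard’s walkthrough uses a real triplestore. For readers who just want a feel for what "loading triples" involves, here is a toy, in-memory Python sketch: it parses simple N-Triples lines with a regular expression and answers single triple-pattern queries, roughly the building block of a SPARQL basic graph pattern. The class and method names are hypothetical, literals are kept in their escaped form, and it is no substitute for 4Store or similar at 80 million triples.

```python
import re

# Simplified N-Triples term patterns (assumption: well-formed, one triple per line).
IRI = r"<[^>]*>"
BNODE = r"_:\S+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


class ToyTripleStore:
    """In-memory set of (subject, predicate, object) strings, for illustration only."""

    def __init__(self):
        self.triples = set()

    def load_ntriples(self, lines):
        """Parse N-Triples lines, skipping blanks and comments; return the number parsed."""
        parsed = 0
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            match = TRIPLE.match(line)
            if match is None:
                raise ValueError(f"unsupported N-Triples line: {line!r}")
            self.triples.add(match.groups())
            parsed += 1
        return parsed

    def match(self, s=None, p=None, o=None):
        """Answer one triple pattern; None plays the role of a SPARQL variable."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )
```

Joining the results of several `match` calls on shared values is, in miniature, what a SPARQL engine does with a multi-pattern query; a real store adds indexes so it does not have to scan every triple.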

Have you loaded the WorldCat linked data into a triple store?

Some other storage mechanism?

Get Yourself a Linked Data Piece of WorldCat to Play With

Thursday, August 16th, 2012

Get Yourself a Linked Data Piece of WorldCat to Play With by Richard Wallis.

From the post:

You may remember my frustration a couple of months ago, at being in the air when OCLC announced the addition of Schema.org marked up Linked Data to all resources in WorldCat.org. Those of you who attended the OCLC Linked Data Round Table at IFLA 2012 in Helsinki yesterday, will know that I got my own back on the folks who publish the press releases at OCLC, by announcing the next WorldCat step along the Linked Data road whilst they were still in bed.

The Round Table was an excellent, very interactive session with Neil Wilson from the British Library, Emmanuelle Bermes from Centre Pompidou, and Martin Malmsten of the National Library of Sweden, which I will cover elsewhere. For now, you will find my presentation Library Linked Data Progress on my SlideShare site.

After we experimentally added RDFa embedded linked data, using Schema.org markup and some proposed Library extensions, to WorldCat pages, one of the questions I was most often asked was: where can I get my hands on some of this raw data?

We are taking the application of linked data to WorldCat one step at a time so that we can learn from how people use and comment on it. So at that time, if you wanted to see the raw data, the only way was to use a tool [such as the W3C RDFa 1.1 Distiller] to parse the data out of the pages, just as the search engines do.

So I am really pleased to announce that you can now download a significant chunk of that data as RDF triples. Especially in experimental form, providing the whole lot as a download would have been a bit of a challenge, even just in disk space and bandwidth terms. So which chunk to choose was a question. We could have chosen a random selection, but decided instead to pick the most popular resources in WorldCat, in terms of holdings – an interesting selection in its own right.

To make the cut, a resource had to be held by more than 250 libraries. It turns out that almost 1.2 million fall into this category, so a sizeable chunk indeed. To get your hands on this data, download the 1Gb gzipped file. It is in RDF n-triples form, so you can take a look at the raw data in the file itself. Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them.
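Because N-Triples puts exactly one triple on each line, you can take that look at the raw data without unpacking the whole gzipped file first. A minimal Python sketch that streams the file to peek at the first few triples or count them; the function names are hypothetical, and you would pass in whatever path you saved the download under.

```python
import gzip
from itertools import islice


def _triple_lines(handle):
    """Yield non-blank, non-comment lines from an open text handle."""
    for line in handle:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            yield stripped


def peek_ntriples(path, n=5):
    """Return the first n triple lines of a gzipped N-Triples file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(islice(_triple_lines(handle), n))


def count_triples(path):
    """Count triple lines by streaming, without decompressing the file to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for _ in _triple_lines(handle))
```

Counting a file of this size is a single sequential pass, so it takes a while but needs very little memory.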

That’s a nice sized collection of data. In any format.

But the next-to-last sentence of the post reads:

As I say in the press release, posted after my announcement, we are really interested to see what people will do with this data.

Déjà vu?

I think I have heard that question asked with other linked data releases. You? Pointers?

I first saw this at SemanticWeb.com.