A million first steps by Ben O’Steen.
From the post:
We have released over a million images onto Flickr Commons for anyone to use, remix and repurpose. These images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft who then generously gifted the scanned images into the Public Domain. The images themselves cover a startling mix of subjects: There are maps, geological diagrams, beautiful illustrations, comical satire, illuminated and decorative letters, colourful illustrations, landscapes, wall-paintings and so much more that even we are not aware of.
Which brings me to the point of this release. We are looking for new, inventive ways to navigate, find and display these ‘unseen illustrations’. The images were plucked from the pages as part of the ‘Mechanical Curator’, a creation of the British Library Labs project. Each image is individually addressible, online, and Flickr provies an API to access it and the image’s associated description.
We may know which book, volume and page an image was drawn from, but we know nothing about a given image. Consider the image below. The title of the work may suggest the thematic subject matter of any illustrations in the book, but it doesn’t suggest how colourful and arresting these images are.
(Aside from any educated guesses we might make based on the subject matter of the book of course.)
See more from this book: “Historia de las Indias de Nueva-España y islas de Tierra Firme…” (1867)
Next steps
We plan to launch a crowdsourcing application at the beginning of next year, to help describe what the images portray. Our intention is to use this data to train automated classifiers that will run against the whole of the content. The data from this will be as openly licensed as is sensible (given the nature of crowdsourcing) and the code, as always, will be under an open licence.
The manifests of images, with descriptions of the works that they were taken from, are available on github and are also released under a public-domain ‘licence’. This set of metadata being on github should indicate that we fully intend people to work with it, to adapt it, and to push back improvements that should help others work with this release.
There are very few datasets of this nature free for any use and by putting it online we hope to stimulate and support research concerning printed illustrations, maps and other material not currently studied. Given that the images are derived from just 65,000 volumes and that the library holds many millions of items.
If you need help or would like to collaborate with us, please contact us on email, or twitter (or me personally, on any technical aspects)
Think about the numbers. One million images from 65,000 volumes. The British Library holds millions of items.
Encourage more releases like this one with good use of and suggestions for this release!