Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 21, 2014

Wellcome Images

Filed under: Data,Data Integration,Library,Museums — Patrick Durusau @ 5:47 pm

Thousands of years of visual culture made free through Wellcome Images

From the post:

We are delighted to announce that over 100,000 high resolution images including manuscripts, paintings, etchings, early photography and advertisements are now freely available through Wellcome Images.

Drawn from our vast historical holdings, the images are being released under the Creative Commons Attribution (CC-BY) licence.

This means that they can be used for commercial or personal purposes, with an acknowledgement of the original source (Wellcome Library, London). All of the images from our historical collections can be used free of charge.

The images can be downloaded in high-resolution directly from the Wellcome Images website for users to freely copy, distribute, edit, manipulate, and build upon as you wish, for personal or commercial use. The images range from ancient medical manuscripts to etchings by artists such as Vincent Van Gogh and Francisco Goya.

The earliest item is an Egyptian prescription on papyrus, and treasures include exquisite medieval illuminated manuscripts and anatomical drawings, from delicate 16th century fugitive sheets, whose hinged paper flaps reveal hidden viscera to Paolo Mascagni’s vibrantly coloured etching of an ‘exploded’ torso.

Other treasures include a beautiful Persian horoscope for the 15th-century prince Iskandar, sharply sketched satires by Rowlandson, Gillray and Cruikshank, as well as photography from Eadweard Muybridge’s studies of motion. John Thomson’s remarkable nineteenth century portraits from his travels in China can be downloaded, as well as a newly added series of photographs of hysteric and epileptic patients at the famous Salpêtrière Hospital.

Semantics, or should I say semantic confusion, is never far away. While viewing an image of Gladstone as Scrooge:

Gladstone

When “search by keyword” offered “colonies,” I assumed it meant the colonies of the UK at the time.

Imagine my surprise when among other images, Wellcome Images offered:

petri dish

The search by keywords had found fourteen petri dish images, three images of Batavia, seven maps of India (salt, leprosy), one half naked woman being held down, and the Gladstone image from earlier.

About what one expects from search these days, but we could do better. Much better.
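As a rough illustration of "better," here is a minimal sketch that groups keyword hits by the subject the keyword refers to, so colonial territories and bacterial colonies arrive in separate piles. The records, the subject labels and the function name are all hypothetical; nothing here reflects how Wellcome Images actually stores or searches its keywords.

```python
from collections import defaultdict

# Hypothetical records: (title, keyword, subject the keyword refers to).
# The subject labels are invented for illustration only.
records = [
    ("Gladstone as Scrooge", "colonies", "colonial territories"),
    ("Map of Batavia", "colonies", "colonial territories"),
    ("Petri dish culture", "colonies", "bacterial colonies"),
]


def search_by_subject(records, keyword):
    """Return titles matching the keyword, grouped by the subject it denotes."""
    groups = defaultdict(list)
    for title, kw, subject in records:
        if kw == keyword:
            groups[subject].append(title)
    return dict(groups)


if __name__ == "__main__":
    for subject, titles in search_by_subject(records, "colonies").items():
        print(subject, "->", titles)
```

The hard part, of course, is not the grouping but getting someone to record which subject each keyword was meant to identify.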

I first saw this in a tweet by Neil Saunders.

January 16, 2014

Library of Congress RSS Feeds

Filed under: Library — Patrick Durusau @ 5:34 pm

Library of Congress RSS Feeds

Quite by accident I stumbled upon a list of Library of Congress RSS feeds and email subscriptions in the following categories:

  • Collections Preservation
  • Copyright
  • Digital Preservation
  • Events
  • Folklife
  • For Librarians
  • For Teachers
  • General News
  • Hispanic Division
  • Legal
  • Music Division
  • Journalism
  • Poetry & Literature
  • Science
  • Site Updates
  • Veterans History
  • Visual Resources

If you think about it, libraries are aggregations of diverse semantics from across many domains.

Quite at odds with any particular cultural monotone of the day.

Subversive places. That must be why I like them so much!

…Digital Asset Sustainability…

Filed under: Archives,Digital Library,Library,Preservation — Patrick Durusau @ 5:14 pm

A National Agenda Bibliography for Digital Asset Sustainability and Preservation Cost Modeling by Butch Lazorchak.

From the post:

The 2014 National Digital Stewardship Agenda, released in July 2013, is still a must-read (have you read it yet?). It integrates the perspective of dozens of experts to provide funders and decision-makers with insight into emerging technological trends, gaps in digital stewardship capacity and key areas for development.

The Agenda suggests a number of important research areas for the digital stewardship community to consider, but the need for more coordinated applied research in cost modeling and sustainability is high on the list of areas prime for research and scholarship.

The section in the Agenda on “Applied Research for Cost Modeling and Audit Modeling” suggests some areas for exploration:

“Currently there are limited models for cost estimation for ongoing storage of digital content; cost estimation models need to be robust and flexible. Furthermore, as discussed below…there are virtually no models available to systematically and reliably predict the future value of preserved content. Different approaches to cost estimation should be explored and compared to existing models with emphasis on reproducibility of results. The development of a cost calculator would benefit organizations in making estimates of the long‐term storage costs for their digital content.”

In June of 2012 I put together a bibliography of resources touching on the economic sustainability of digital resources. I’m pleasantly surprised at all the new work that’s been done in the meantime, but as the Agenda suggests, there’s more room for directed research in this area. Or perhaps, as Paul Wheatley suggests in this blog post, what’s really needed are coordinated responses to sustainability challenges that build directly on this rich body of work, and that effectively communicate the results out to a wide audience.

I’ve updated the bibliography, hoping that researchers and funders will explore the existing body of projects, approaches and research, note the gaps in coverage suggested by the Agenda and make efforts to address the gaps in the near future through new research or funding.

I count some seventy-one (71) items in this bibliography.

Digital preservation is an area where topic maps can help maintain access over changing customs and vocabularies, but just like migrating from one form of media to another, it doesn’t happen by itself.

Nor is there any “free lunch” just because the data is culturally important, rare, etc. Someone has to pay the bill for it being preserved.

Having the cost of semantic access included in digital preservation would not hurt the cause of topic maps.

Yes?

January 13, 2014

December 14, 2013

JITA Classification System of Library and Information Science

Filed under: Classification,Library,Linked Data — Patrick Durusau @ 5:00 pm

JITA Classification System of Library and Information Science

From the post:

JITA is a classification schema of Library and Information Science (LIS). It is used by E-LIS, an international open repository for scientific papers in Library and Information Science, for indexing and searching. Currently JITA is available in English and has been translated into 14 languages (tr, el, nl, cs, fr, it, ro, ca, pt, pl, es, ar, sv, ru). JITA is also accessible as Linked Open Data, containing 3500 triples.

You had better enjoy triples before link rot overtakes them.

Today CSV, tomorrow JSON?

How long do you think the longest lived triple will last?

December 13, 2013

Ancient texts published online…

Filed under: Bible,Data,Library — Patrick Durusau @ 5:58 pm

Ancient texts published online by the Bodleian and the Vatican Libraries

From the post:

The Bodleian Libraries of the University of Oxford and the Biblioteca Apostolica Vaticana (BAV) have digitized and made available online some of the world’s most unique and important Bibles and biblical texts from their collections, as the start of a major digitization initiative undertaken by the two institutions.

The digitized texts can be accessed on a dedicated website which has been launched today (http://bav.bodleian.ox.ac.uk). This is the first launch of digitized content in a major four-year collaborative project.

Portions of the Bodleian and Vatican Libraries’ collections of Hebrew manuscripts, Greek manuscripts, and early printed books have been selected for digitization by a team of scholars and curators from around the world. The selection process has been informed by a balance of scholarly and practical concerns; conservation staff at the Bodleian and Vatican Libraries have worked with curators to assess not only the significance of the content, but the physical condition of the items. While the Vatican and the Bodleian have each been creating digital images from their collections for a number of years, this project has provided an opportunity for both libraries to increase the scale and pace with which they can digitize their most significant collections, whilst taking great care not to expose books to any damage, as they are often fragile due to their age and condition.

The newly-launched website features zoomable images which enable detailed scholarly analysis and study. The website also includes essays and a number of video presentations made by scholars and supporters of the digitization project including the Archbishop of Canterbury and Archbishop Jean-Louis Bruguès, o.p. The website blog will also feature articles on the conservation and digitization techniques and methods used during the project. The website is available both in English and Italian.

Originally announced in April 2012, the four-year collaboration aims to open up the two libraries’ collections of ancient texts and to make a selection of remarkable treasures freely available online to researchers and the general public worldwide. Through the generous support of the Polonsky Foundation, this project will make 1.5 million digitized pages freely available over the next three years.

Only twenty-one (21) works up now but 1.5 million pages by the end of the project. This is going to be a treasure trove without end!

Associating these items with their cultural contexts of production, influence on other works, textual history, comments by subsequent works, across multiple languages, is a perfect fit for topic maps.

Kudos to both the Bodleian and the Vatican Libraries!

A million first steps [British Library Image Release]

Filed under: Data,Image Understanding,Library — Patrick Durusau @ 4:48 pm

A million first steps by Ben O’Steen.

From the post:

We have released over a million images onto Flickr Commons for anyone to use, remix and repurpose. These images were taken from the pages of 17th, 18th and 19th century books digitised by Microsoft who then generously gifted the scanned images into the Public Domain. The images themselves cover a startling mix of subjects: There are maps, geological diagrams, beautiful illustrations, comical satire, illuminated and decorative letters, colourful illustrations, landscapes, wall-paintings and so much more that even we are not aware of.

Which brings me to the point of this release. We are looking for new, inventive ways to navigate, find and display these ‘unseen illustrations’. The images were plucked from the pages as part of the ‘Mechanical Curator’, a creation of the British Library Labs project. Each image is individually addressable, online, and Flickr provides an API to access it and the image’s associated description.

We may know which book, volume and page an image was drawn from, but we know nothing about a given image. Consider the image below. The title of the work may suggest the thematic subject matter of any illustrations in the book, but it doesn’t suggest how colourful and arresting these images are.

(Aside from any educated guesses we might make based on the subject matter of the book of course.)

BL-image

See more from this book: “Historia de las Indias de Nueva-España y islas de Tierra Firme…” (1867)

Next steps

We plan to launch a crowdsourcing application at the beginning of next year, to help describe what the images portray. Our intention is to use this data to train automated classifiers that will run against the whole of the content. The data from this will be as openly licensed as is sensible (given the nature of crowdsourcing) and the code, as always, will be under an open licence.

The manifests of images, with descriptions of the works that they were taken from, are available on github and are also released under a public-domain ‘licence’. This set of metadata being on github should indicate that we fully intend people to work with it, to adapt it, and to push back improvements that should help others work with this release.

There are very few datasets of this nature free for any use and by putting it online we hope to stimulate and support research concerning printed illustrations, maps and other material not currently studied. Given that the images are derived from just 65,000 volumes and that the library holds many millions of items.

If you need help or would like to collaborate with us, please contact us on email, or twitter (or me personally, on any technical aspects)

Think about the numbers. One million images from 65,000 volumes, or roughly fifteen images per volume. The British Library holds millions of items.

Encourage more releases like this one by making good use of this release and offering suggestions for it!
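If you want to start poking at individual images, the post's remark about the Flickr API can be sketched as follows. This is a minimal sketch, assuming Flickr's public REST endpoint and its `flickr.photos.getInfo` method as I understand them; the photo id and API key shown are placeholders, and the helper name `photo_info_url` is my own. It only builds the request URL, so nothing is fetched until you supply a real key.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint; check the current Flickr API documentation before relying on it.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def photo_info_url(photo_id: str, api_key: str) -> str:
    """Build a request URL asking Flickr for one photo's metadata as plain JSON."""
    params = {
        "method": "flickr.photos.getInfo",
        "api_key": api_key,        # placeholder: you need your own key
        "photo_id": photo_id,      # placeholder: the id of one released image
        "format": "json",
        "nojsoncallback": "1",     # ask for raw JSON rather than a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(photo_info_url("1234567890", "YOUR_API_KEY"))
```

The response, once fetched, should carry the description the post mentions, which is where book, volume and page information would have to be dug out.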

December 5, 2013

German Digital Library releases API

Filed under: Digital Library,Library — Patrick Durusau @ 8:11 am

German Digital Library releases API by Lieke Ploeger.

From the post:

Last month the German Digital Library (Deutsche Digitale Bibliothek – DDB) made a promising step forward toward further opening up their data by releasing its API (Application Programming Interface) to the public. This API provides access to all the metadata of the DDB released under a CC0 license, which is the predominant share. The release of this API opens up a wide range of possibilities for users to build applications, create combinations with other data or include the German digitised cultural heritage on other platforms. In the future, the DDB also plans to organize a programming competition for API applications as well as a series of workshops for developers.

The official press release.

Technical documentation on the API (German).

A good excuse for you to brush up on your German. Besides, not all of it is in German.

October 17, 2013

Open Discovery Initiative Recommended Practice [Comments due 11-18-2013]

Filed under: Discovery Informatics,Library,NISO,Standards — Patrick Durusau @ 4:20 pm

ODI Recommended Practice (NISO RP-19-201x)

From the Open Discovery Initiative (NISO) webpage:

The Open Discovery Initiative (ODI) aims at defining standards and/or best practices for the new generation of library discovery services that are based on indexed search. These discovery services are primarily based upon indexes derived from journals, ebooks and other electronic information of a scholarly nature. The content comes from a range of information providers and products–commercial, open access, institutional, etc. Given the growing interest and activity in the interactions between information providers and discovery services, this group is interested in establishing a more standard set of practices for the ways that content is represented in discovery services and for the interactions between the creators of these services and the information providers whose resources they represent.

If you are interested in the discovery of information, as a publisher, consumer of information, library or otherwise, please take the time to read and comment on this recommended practice.

Spend some time with the In Scope and Out of Scope sections, so that your comments reflect what the recommendation is intended to cover and not what you would prefer that it covered. (That’s advice I need to heed as well.)

September 26, 2013

JSTOR & JPASS

Filed under: Library — Patrick Durusau @ 6:13 pm

JSTOR & JPASS

From the webpage:

JPASS gives you personal access to a library of more than 1,500 academic journals on JSTOR. If you don’t have access to JSTOR through a school or public library, JPASS may be a perfect fit.

With JPASS, a substantial portion of the most influential research and ideas published over centuries is available to you anywhere, anytime. Access includes a vast collection of archival journals in the humanities, social sciences, and sciences. Coverage begins for each journal at the first volume and issue ever published, and extends up to a publication date usually set in the past three to five years. Current issues are not part of the JPASS Collection.

Current rates: $19.95/month or $199/year, with download permission for ten articles a month or one hundred and twenty for a year subscription.

It’s not much but if you don’t have access to a major academic library, it is better than nothing.

The amazing part of this story is that until quite recently JSTOR had no individual subscriptions.

Can’t imagine someone outside of a traditional academic setting wanting to read substantive academic research.

That may sound like I am not a fan of JSTOR. Truth is I’m not. But like I said, if you have no meaningful access at all, this will be better than nothing.

For CS and related topics, I would spend the money on the ACM Digital Library and/or the IEEE Xplore Digital Library.

August 21, 2013

Child of the Library

Filed under: Library — Patrick Durusau @ 6:29 pm

Public libraries, for me, are in a category all their own.

I was fortunate to grow up in a community that supported public libraries. And I have spent most of my life across several careers using libraries of one sort or another, including public ones.

Public libraries offer at no cost to patrons opportunities to be informed about current events, to learn more than is taught in any university, to be entertained by stories from near and far and even long ago.

Public libraries are also community centers where anyone can meet, where economic limitations don’t prevent access to the latest technologies or information streams.

Visit and support your local public library.

Every public library is a visible symbol that government thought control may be closing in, but it hasn’t won, yet.

July 25, 2013

The LibraryThing – Update

Filed under: Books,Library — Patrick Durusau @ 12:47 pm

The New Home Page

LibraryThing has a new homepage!

I should have asked if you like to read books first. 😉

What is LibraryThing?

LibraryThing is a cataloging and social networking site for book lovers.

LibraryThing helps you create a library-quality catalog of books: books you own, books you’ve read, books you’d like to read, books you’ve lent out … whatever grouping you’d like.

Since everyone catalogs online, they also catalog together. You can contribute tags, ratings and reviews for a book, and Common Knowledge (facts about a book or author, like character names and awards), as well as participate in member forums or join the Early Reviewers program. Everyone gets the benefit of everyone else’s work. LibraryThing connects people based on the books they share.

New modules, new features and of course, books!

Social networking opportunity for book lovers.

You may find other people who own a copy of Sowa’s “Knowledge Representation,” Eco’s “A Theory of Semiotics,” and the “Anarchist Cookbook.” 😉

June 30, 2013

Preservation Vocabularies [3 types of magnetic storage medium?]

Filed under: Archives,Library,Linked Data,Vocabularies — Patrick Durusau @ 12:30 pm

Preservation Datasets

From the webpage:

The Linked Data Service is to provide access to commonly found standards and vocabularies promulgated by the Library of Congress. This includes data values and the controlled vocabularies that house them. Below are descriptions of each preservation vocabulary derived from the PREMIS standard. Inside each, a search box allows you to search the vocabularies individually.

New preservation vocabularies from the Library of Congress.

Your mileage will vary with these vocabularies.

Take storage for example.

As we all learned in school, there are only three kinds of magnetic “storage medium:”

  • hard disk
  • magnetic tape
  • TSM

😉

In case you don’t recognize TSM, it stands for IBM Tivoli Storage Manager.

Hmmmm, what about the twenty (20) types of optical disks?

Or other forms of magnetic media? Such as thumb drives, floppy disks, etc.

I picked “storage medium” at random.

Take a look at some of the other vocabularies and let me know what you think.

Please include links to more information in case the LOC decides to add more entries to its vocabularies.

I first saw this at: 21 New Preservation Vocabularies available at id.loc.gov.

June 24, 2013

OpenGLAM

Filed under: Archives,Library,Museums,Open Data — Patrick Durusau @ 9:14 am

OpenGLAM

From the FAQ:

What is OpenGLAM?

OpenGLAM (Galleries, Libraries, Archives and Museums) is an initiative coordinated by the Open Knowledge Foundation that is committed to building a global cultural commons for everyone to use, access and enjoy.

OpenGLAM helps cultural institutions to open up their content and data through hands-on workshops, documentation and guidance and it supports a network of open culture evangelists through its Working Group.

What do we mean by “open”?

“Open” is a term you hear a lot these days. We’ve tried to get some clarity around this important issue by developing a clear and succinct definition of openness – see Open Definition.

The Open Definition says that a piece of content or data is open if “anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”

There are a number of Open Definition compliant licenses that GLAMs are increasingly using to license digital content and data that they hold. Popular ones for data include CC-0 and for content CC-BY or CC-BY-SA are often used.

Open access to cultural heritage materials will grow the need for better indexing/organization. As if you needed another reason to support it. 😉

June 5, 2013

Texas Conference on Digital Libraries 2013

Filed under: Digital Library,Librarian/Expert Searchers,Library — Patrick Durusau @ 9:33 am

Texas Conference on Digital Libraries 2013

Abstracts and in many cases presentations from the Texas Conference on Digital Libraries 2013.

A real treasure trove on digital libraries projects and issues.

Library: A place where IR isn’t limited by software.

June 3, 2013

“Why don’t libraries get better the more they are used?” [Librarians Do]

Filed under: Librarian/Expert Searchers,Library,Topic Maps — Patrick Durusau @ 12:59 pm

“Why don’t libraries get better the more they are used?”

From the post:

On June 19-20, 2013, the 8th Handheld Librarian Online Conference will take place, an online conference about encouraging innovation inside libraries.

Register now, as an individual, group or site, and receive access to all interactive, live online events and recordings of the sessions!

(…)

The keynote presentation is delivered by Michael Edson, Smithsonian Institution’s Director of Web and New Media Strategy, and is entitled “Faking the Internet”. His central question:

“Why don’t libraries get better the more they are used? Not just a little better—exponentially better, like the Internet. They could, and, in a society facing colossal challenges, they must, but we won’t get there without confronting a few taboos about what a library is, who it’s for, and who’s in charge.”

I will register for this conference.

Mostly to hear Michael Edson’s claim that the Internet has gotten “exponentially better.”

In my experience (yours?), the Internet has gotten exponentially noisier.

If you don’t believe me, write down a question (not the query) and give it to ten (10) random people outside your IT department or library.

Have them print out the first page of search results.

Enough proof?

Edson’s point that information resources should improve with use, on the other hand, is a good one.

For example, contrast your local librarian with a digital resource.

The more questions your librarian fields, the better they become with related information and resources on any subject.

A digital resource, no matter how many times it is queried, always returns the same result.

A librarian is a dynamic accumulator of information and relationships between information. A digital resource is a static reporter of information.

Unlike librarians, digital resources aren’t designed to accumulate new information or relationships between information from users at the point of interest. (A blog response several screen scrolls away is unseen and unhelpful.)

What we need are UIs for digital resources that enable users to map into those digital resources their insights, relationships and links to other resources.

In their own words.

That type of digital resource could become “exponentially better.”
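To make the contrast concrete, here is a minimal sketch of a resource that accumulates what users tell it, at the record they are looking at, and lets later queries benefit. The class name, its methods and the sample record are all hypothetical; a real system would need identity, provenance and merging rules that this toy ignores.

```python
from collections import defaultdict


class AnnotatedResource:
    """A toy digital resource whose answers can improve as users annotate it."""

    def __init__(self, records):
        self.records = dict(records)          # record id -> descriptive text
        self.annotations = defaultdict(list)  # record id -> user contributions

    def annotate(self, record_id, user, note, related=None):
        """Attach a user's note (and optional link to another resource) to a record."""
        if record_id not in self.records:
            raise KeyError(record_id)
        self.annotations[record_id].append(
            {"user": user, "note": note, "related": related}
        )

    def query(self, term):
        """Match the term against the original text and every accumulated note."""
        term = term.lower()
        hits = []
        for record_id, text in self.records.items():
            notes = self.annotations.get(record_id, [])
            searchable = [text] + [n["note"] for n in notes]
            if any(term in s.lower() for s in searchable):
                hits.append((record_id, text, list(notes)))
        return hits


if __name__ == "__main__":
    library = AnnotatedResource({"r1": "Cotton MS Vitellius A XV"})
    print(library.query("beowulf"))   # nothing yet: the record never says "Beowulf"
    library.annotate("r1", "a reader", "Sole surviving manuscript of Beowulf")
    print(library.query("beowulf"))   # now found, in the reader's own words
```

The same query gives a different, better answer after one reader has passed through, which is the behavior a librarian exhibits and a static resource does not.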

May 14, 2013

Information organization and the philosophy of history

Filed under: History,Library,Philosophy,Subject Identity — Patrick Durusau @ 3:54 pm

Information organization and the philosophy of history by Ryan Shaw. (Shaw, R. (2013), Information organization and the philosophy of history. J. Am. Soc. Inf. Sci., 64: 1092–1103. doi: 10.1002/asi.22843)

Abstract:

The philosophy of history can help articulate problems relevant to information organization. One such problem is “aboutness”: How do texts relate to the world? In response to this problem, philosophers of history have developed theories of colligation describing how authors bind together phenomena under organizing concepts. Drawing on these ideas, I present a theory of subject analysis that avoids the problematic illusion of an independent “landscape” of subjects. This theory points to a broad vision of the future of information organization and some specific challenges to be met.

You are unlikely to find this article directly actionable in your next topic map project.

On the other hand, if you enjoy the challenge of thinking about how we think, you will find it a real treat.

Shaw writes:

Different interpretive judgments result in overlapping and potentially contradictory organizing principles. Organizing systems ought to make these overlappings evident and show the contours of differences in perspective that distinguish individual judgments. Far from providing a more “complete” view of a static landscape, organizing systems should multiply and juxtapose views. As Geoffrey Bowker (2005) has argued,

the goal of metadata standards should not be to produce a convergent unity. We need to open a discourse—where there is no effective discourse now—about the varying temporalities, spatialities and materialities that we might represent in our databases, with a view to designing for maximum flexibility and allowing as much as possible for an emergent polyphony and polychrony. (pp. 183–184)

The demand for polyphony and polychrony leads to a second challenge, which is to find ways to open the construction of organizing systems to wider participation. How might academics, librarians, teachers, public historians, curators, archivists, documentary editors, genealogists, and independent scholars all contribute to a shared infrastructure for linking and organizing historical discourse through conceptual models? If this challenge can be addressed, the next generation of organizing systems could provide the infrastructure for new kinds of collaborative scholarship and organizing practice.

Once upon a time, you could argue that physical limitations of cataloging systems meant that a single classification system (convergent unity) was necessary for systems to work at all.

But that was an artifact of the physical medium of the catalog.

The deepest irony of the digital age is the continuation of the single classification system requirement, a requirement past its discard date.

May 6, 2013

You Say Beowulf, I Say Biowulf [Does Print Shape Digital?]

Filed under: Indexing,Library,Manuscripts — Patrick Durusau @ 5:58 pm

You Say Beowulf, I Say Biowulf by Julian Harrison.

From the post:

Students of medieval manuscripts will know that it’s always instructive to consult the originals, rather than to rely on printed editions. There are many aspects of manuscript culture that do not translate easily onto the printed page — annotations, corrections, changes of scribe, the general layout, the decoration, ownership inscriptions.

Beowulf is a case in point. Only one manuscript of this famous Old English epic poem has survived, which is held at the British Library (Cotton MS Vitellius A XV). The writing of this manuscript was divided between two scribes, the first of whom terminated their stint with the first three lines of f. 175v, ending with the words “sceaden mæl scyran”; their counterpart took over at this point, implying that an earlier exemplar lay behind their text, from which both scribes copied.

(…)

Another distinction between those two scribes, perhaps less familiar to modern students of the text, is the varying way in which they spell the name of the eponymous hero Beowulf. On 40 occasions, Beowulf’s name is spelt in the conventional manner (the first is found in line 18 of the standard editions, the last in line 2510). However, in 7 separate instances, the name is instead spelt “Biowulf” (“let’s call the whole thing off”), the first case coming in line 1987 of the poem.

I think you will enjoy the post, to say nothing of the images of the manuscript.

My topic map concern is with:

There are many aspects of manuscript culture that do not translate easily onto the printed page — annotations, corrections, changes of scribe, the general layout, the decoration, ownership inscriptions.

I take it that non-facsimile publication in print loses some of the richness of the manuscript.

My question is: To what extent have we duplicated the limitations of print media in digital publications?

For example, a book may have more than one index, but not more than one index of the same kind.

That is, you can’t find a book that has multiple distinct subject indexes. Not surprising considering the printing cost of duplicate subject indexes, but we don’t have that limitation with electronic indexes.

Or do we?

In my experience anyway, electronic indexes mimic their print forefathers. Each electronic index stands on its own, even if each index is of the same work.

Assume we have a Spanish and English index, for the casual reader, to the plays of Shakespeare. Even in electronic form, I assume they would be created and stored as separate indexes.

But isn’t that simply replicating what we would experience with a print edition?

Can you think of other cases where our experience with print media has shaped our choices with digital artifacts?
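For the Shakespeare example, here is a minimal sketch of what not replicating print might look like: two separately authored indexes merged on a shared subject identifier, so a heading in either language leads to the locators from both. The entries, the subject identifiers and the function names are hypothetical, and deciding that two headings really name the same subject is assumed to have been done already.

```python
from collections import defaultdict

# Hypothetical index entries: (heading, language, subject identifier, locator).
english_index = [
    ("Denmark", "en", "place:denmark", "Hamlet 1.4"),
    ("ghosts", "en", "topic:ghost", "Hamlet 1.1"),
]
spanish_index = [
    ("Dinamarca", "es", "place:denmark", "Hamlet 1.2"),
    ("fantasmas", "es", "topic:ghost", "Macbeth 3.4"),
]


def merge_indexes(*indexes):
    """Merge entries that share a subject identifier, keeping every name and locator."""
    merged = defaultdict(lambda: {"names": {}, "locators": set()})
    for index in indexes:
        for heading, lang, subject, locator in index:
            entry = merged[subject]
            entry["names"][lang] = heading
            entry["locators"].add(locator)
    return dict(merged)


def lookup(merged, heading):
    """Find locators for a heading in any language."""
    for entry in merged.values():
        if heading in entry["names"].values():
            return sorted(entry["locators"])
    return []


if __name__ == "__main__":
    merged = merge_indexes(english_index, spanish_index)
    print(lookup(merged, "Dinamarca"))
```

A reader starting from the Spanish heading gets the work of the English indexer as well, something two separately stored indexes, print or electronic, cannot offer.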

May 5, 2013

British Library Labs – Competition 2013

Filed under: Contest,Library,Library software,Topic Maps — Patrick Durusau @ 5:29 am

British Library Labs – Competition 2013

Deadline for entry: Wednesday 26 June, 2013 (midnight GMT)

From the webpage:

We want you to propose an innovative and transformative project using the British Library’s digital collections and if your idea is chosen, the Labs team will work with you to make it happen and you could win a prize of up to £3,000.

From the digitisation of thousands of books, newspapers and manuscripts, the curation of UK websites, bird sounds or location data for our maps, over the last two decades we’ve been faithfully amassing a vast and wide-ranging number of digital collections for the nation. What remains elusive, however, is understanding what researchers need in place in order to unlock the potential for new discoveries within these fascinating and diverse sets of digital content.

The Labs competition is designed to attract scholars, explorers, trailblazers and software developers who see the potential for new and innovative research and development opportunities lurking within these immense digital collections. Through soliciting imaginative and transformative projects utilising this content you will be giving us a steer as to the types of new processes, platforms, arrangements, services and tools needed to make it more accessible. We’ll even throw the Library’s resources behind you to make your idea a reality.

Numerous ways to get support for developing your idea before submission.

In terms of PR for your solution (hopefully topic maps based) do note:

Prizes

Winners will get direct curatorial and financial support for completing their project from the Labs team, which may involve an expenses paid residency at the British Library for a mutually agreed period of time (dependent on the winners’ circumstances, the winning ideas, access to resources and budget allowing).

  • Winners will receive £3000 for completing their project
  • Runners-up will receive £1000 for completing their project

The work will take place between Saturday July 6 and Monday 4 November, 2013, with the completed projects being showcased during November 2013 when prizes will be awarded.

What happens to your ideas?

All ideas will be posted on the Labs website after they have been judged. All project ideas submitted for the competition can continue to be worked on and where possible the Labs team will provide support (time and resources permitting). Well developed projects will be showcased together with the competition winners during November 2013.

This is also a good excuse to spend more time at the British Library website. I don’t spend nearly enough time there myself.

March 28, 2013

Cheminformatics Supplements

Filed under: Cheminformatics,Library — Patrick Durusau @ 6:32 pm

Cheminformatics Supplements

I ran across a pointer today to abstracts for the 8th German Conference on Chemoinformatics: 26 CIC-Workshop from Chemistry Central.

I will pull several of the abstracts for fuller treatment but whatever I choose, I will miss the very abstract of interest to you.

Moreover, the link at the top of this post takes you to all the “supplements” from Chemistry Central.

I am sure you will find a wealth of information.

Biodiversity Heritage Library (BHL)

Filed under: Biodiversity,Biology,Environment,Library — Patrick Durusau @ 4:38 pm

Biodiversity Heritage Library (BHL)

Best described by their own “about” page:

The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” The BHL consortium works with the international taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is made available to a global audience through open access principles. In partnership with the Internet Archive and through local digitization efforts, the BHL has digitized millions of pages of taxonomic literature, representing tens of thousands of titles and over 100,000 volumes.

The published literature on biological diversity has limited global distribution; much of it is available in only a few select libraries in the developed world. These collections are of exceptional value because the domain of systematic biology depends, more than any other science, upon historic literature. Yet, this wealth of knowledge is available only to those few who can gain direct access to significant library collections. Literature about the biota existing in developing countries is often not available within their own borders. Biologists have long considered that access to the published literature is one of the chief impediments to the efficiency of research in the field. Free global access to digital literature repatriates information about the earth’s species to all parts of the world.

The BHL consortium members digitize the public domain books and journals held within their collections. To acquire additional content and promote free access to information, the BHL has obtained permission from publishers to digitize and make available significant biodiversity materials that are still under copyright.

Because of BHL’s success in digitizing a significant mass of biodiversity literature, the study of living organisms has become more efficient. The BHL Portal allows users to search the corpus by multiple access points, read the texts online, or download select pages or entire volumes as PDF files.

The BHL serves texts with information on over a million species names. Using UBio’s taxonomic name finding tools, researchers can bring together publications about species and find links to related content in the Encyclopedia of Life. Because of its commitment to open access, BHL provides a range of services and APIs which allow users to harvest source data files and reuse content for research purposes.

Since 2009, the BHL has expanded globally. The European Commission’s eContentPlus program has funded the BHL-Europe project, with 28 institutions, to assemble the European language literature. Additionally, the Chinese Academy of Sciences (BHL-China), the Atlas of Living Australia (BHL-Australia), Brazil (through BHL-SciELO) and the Bibliotheca Alexandrina have created national or regional BHL nodes. Global nodes are organizational structures that may or may not develop their own BHL portals. It is the goal of BHL to share and serve content through the BHL Portal developed and maintained at the Missouri Botanical Garden. These projects will work together to share content, protocols, services, and digital preservation practices.

A truly remarkable effort!

Would you believe they have a copy of “Aristotle’s History of animals.” In ten books. Tr. by Richard Cresswell? For download as a PDF?

Tell me, how would you reconcile the terminology of Aristotle or of Cresswell for that matter in translation, with modern terminology both for species and their features?

In order to enable navigation from this work to other works in the collection?

Moreover, how would you preserve that navigation for others to use?

Document level granularity is better than not finding a document at all but it is a far cry from being efficient.

BHL-Europe web portal opens up…

Filed under: Biodiversity,Biology,Environment,Library — Patrick Durusau @ 4:18 pm

BHL-Europe web portal opens up the world’s knowledge on biological diversity

From the post:

The goal of the Biodiversity Heritage Library for Europe (BHL-Europe) project is to make published biodiversity literature accessible to anyone who’s interested. The project will provide a multilingual access point (12 languages) for biodiversity content through the BHL-Europe web portal with specific biological functionalities for search and retrieval and through the EUROPEANA portal. Currently BHL-Europe involves 28 major natural history museums, botanical gardens and other cooperating institutions.

BHL-Europe is a 3 year project, funded by the European Commission under the eContentplus programme, as part of the i2010 policy.

Unlimited access to biological diversity information

The libraries of the European natural history museums and botanical gardens collectively hold the majority of the world’s published knowledge on the discovery and subsequent description of biological diversity. However, digital access to this knowledge is difficult.

The BHL project, launched 2007 in the USA, is systematically attempting to address this problem. In May 2009 the ambitious and innovative EU project ‘Biodiversity Heritage Library for Europe’ (BHL-Europe) was launched. BHL-Europe is coordinated by the Museum für Naturkunde Berlin, Germany, and combines the efforts of 26 European and 2 American institutions. For the first time, the wider public, citizen scientists and decision makers will have unlimited access to this important source of information.

A project with enormous potential, although three (3) years seems a bit short.

Mentioned but without a link, the BHL project has digitized over 100,000 volumes, with information on more than one million species names.

March 16, 2013

From Records to a Web of Library Data – Pt2 Hubs of Authority

Filed under: Library,Linked Data,LOD,RDF — Patrick Durusau @ 4:00 pm

From Records to a Web of Library Data – Pt2 Hubs of Authority by Richard Wallis.

From the post:

Hubs of Authority

Libraries, probably because of their natural inclination towards cooperation, were ahead of the game in data sharing for many years. The moment computing technology became practical, in the late sixties, cooperative cataloguing initiatives started all over the world either in national libraries or cooperative organisations. Two from personal experience come to mind, BLCMP started in Birmingham, UK in 1969 eventually evolved in to the leading Semantic Web organisation Talis, and in 1967 Dublin, Ohio saw the creation of OCLC. Both in their own way having had significant impact on the worlds of libraries, metadata, and the web (and me!).

One of the obvious impacts of inter-library cooperation over the years has been the authorities, those sources of authoritative names for key elements of bibliographic records. A large number of national libraries have such lists of agreed formats for author and organisational names. The Library of Congress has in addition to its name authorities, subjects, classifications, languages, countries etc. Another obvious success in this area is VIAF, the Virtual International Authority File, which currently aggregates over thirty authority files from all over the world – well used and recognised in library land, and increasingly across the web in general as a source of identifiers for people & organisations.

These, Linked Data enabled, sources of information are developing importance in their own right, as a natural place to link to, when asserting the thing, person, or concept you are identifying in your data. As Sir Tim Berners-Lee’s fourth principle of Linked Data tells us to “Include links to other URIs. so that they can discover more things”. VIAF in particular is becoming such a trusted, authoritative, source of URIs that there is now a VIAFbot responsible for interconnecting Wikipedia and VIAF to surface hundreds of thousands of relevant links to each other. A great hat-tip to Max Klein, OCLC Wikipedian in Residence, for his work in this area.

I don’t deny that VIAF is a very useful tool, but if you search for the personal name “Marilyn Monroe,” it returns:

1. Miller, Arthur, 1915-2005
National Library of Australia National Library of the Czech Republic National Diet Library (Japan) Deutsche Nationalbibliothek RERO (Switzerland) SUDOC (France) Library and Archives Canada National Library of Israel (Latin) National Library of Sweden NUKAT Center (Poland) Bibliothèque nationale de France Biblioteca Nacional de España Library of Congress/NACO

Miller, Arthur (Arthur Asher), 1915-2005
National Library of the Netherlands-test

Miller, Arthur, 1915-
Vatican Library Biblioteca Nacional de Portugal

ميلر، ارثر، 1915-2005 م.
Bibliotheca Alexandrina (Egypt)

Miller, Arthur
Wikipedia (en)-test

מילר, ארתור, 1915-2005
National Library of Israel (Hebrew)

2. Monroe, Marilyn, 1926-1962
National Library of Israel (Latin) National Library of the Czech Republic National Diet Library (Japan) Deutsche Nationalbibliothek SUDOC (France) Library and Archives Canada National Library of Australia National Library of Sweden NUKAT Center (Poland) Bibliothèque nationale de France Biblioteca Nacional de España Library of Congress/NACO

Monroe, Marilyn
National Library of the Netherlands-test Wikipedia (en)-test RERO (Switzerland)

Monroe, Marilyn American actress, model, and singer, 1926-1962
Getty Union List of Artist Names

Monroe, Marilyn, pseud.
Biblioteca Nacional de Portugal

3. DiMaggio, Joe, 1914-1999
Library of Congress/NACO Bibliothèque nationale de France

Di Maggio, Joe 1914-1999
Deutsche Nationalbibliothek

Di Maggio, Joseph Paul, 1914-1999
National Diet Library (Japan)

DiMaggio, Joe, 1914-
National Library of Australia

Dimaggio, Joseph Paul, 1914-1999
SUDOC (France)

DiMaggio, Joe (Joseph Paul), 1914-1999
National Library of the Netherlands-test

Dimaggio, Joe
Wikipedia (en)-test

4. Monroe, Marilyn
Deutsche Nationalbibliothek

5. Hurst-Monroe, Marlene
Library of Congress/NACO

6. Wolf, Marilyn Monroe
Deutsche Nationalbibliothek

Maybe Sir Tim is right, users “…can discover more things.”

Some of them are related, some of them are not.

From Records to a Web of Library Data – Pt1 Entification

Filed under: Entities,Library,Linked Data — Patrick Durusau @ 3:10 pm

From Records to a Web of Library Data – Pt1 Entification by Richard Wallis.

From the post:

Entification

Entification – a bit of an ugly word, but in my day to day existence one I am hearing more and more. What an exciting life I lead…

What is it, and why should I care, you may be asking.

I spend much of my time convincing people of the benefits of Linked Data to the library domain, both as a way to publish and share our rich resources with the wider world, and also as a potential stimulator of significant efficiencies in the creation and management of information about those resources. Taking those benefits as being accepted, for the purposes of this post, brings me into discussion with those concerned with the process of getting library data into a linked data form.

As you know, I am far from convinced about the “benefits” of Linked Data, at least with its current definition.

Who knows what definition “Linked Data” may have in some future vision of the W3C? (URL Homonym Problem: A Topic Map Solution, a tale of how the W3C decided to redefine URL.)

But Richard’s point about the ugliness and utility of “entification” is well taken.

So long as you remember that every term can be described “in terms of other things.”

There are no primitive terms, not one.

March 11, 2013

NewGenLib FOSS Library Management System [March 15th, 2013 Webinar]

Filed under: Library,Library software — Patrick Durusau @ 1:28 pm

NewGenLib FOSS Library Management System

From the post:

EIFL-FOSS is organising a free webinar on NewGenLib (NGL), an open-source Library Management System (ILS). The event will take place this coming Friday, March 15th, 2013 at 09.00-10.00 GMT / UK time (10.00-11.00 CET / Rome, Italy). The session is open to anyone to attend but places are limited, so registration is recommended.

NGL, an outcome of collaboration between Verus and Kesavan Institute of Information and Knowledge management, has been implemented in over 30 countries in at least 4 different languages supporting fully international library metadata standards. The software runs on Windows or Linux and is designed to work equally well in one single library as it does across a dispersed network of libraries.

URL for more info: http://www.eifl.net/events/newgenlib-ils-windows-and-linux-free-webina

As you already know, there is no shortage of vendor-based and open source library information systems.

That diversity is an opportunity to show how topic maps can make distinct systems appear as one, while retaining their separate character.

March 9, 2013

Research Data Symposium – Columbia

Research Data Symposium – Columbia.

Posters from the Research Data Symposium, held at Columbia University, February 27, 2013.

Subject to the limitations of the poster genre but useful as a quick overview of current projects and directions.

March 6, 2013

VIAF: The Virtual International Authority File

Filed under: Authority Record,Library,Library Associations,Merging,Subject Identity — Patrick Durusau @ 11:19 am

VIAF: The Virtual International Authority File

From the webpage:

VIAF, implemented and hosted by OCLC, is a joint project of several national libraries plus selected regional and trans-national library agencies. The project’s goal is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.

The “about” link at the bottom of the page is broken (in the English version). A working “about” link for VIAF reports:

At a glance

  • A collaborative effort between national libraries and organizations contributing name authority files, furthering access to information
  • All authority data for a given entity is linked together into a “super” authority record
  • A convenient way for the library community and other agencies to repurpose bibliographic data produced by libraries serving different language communities

The Virtual International Authority File (VIAF) is an international service designed to provide convenient access to the world’s major name authority files. Its creators envision the VIAF as a building block for the Semantic Web to enable switching of the displayed form of names for persons to the preferred language and script of the Web user. VIAF began as a joint project with the Library of Congress (LC), the Deutsche Nationalbibliothek (DNB), the Bibliothèque nationale de France (BNF) and OCLC. It has, over the past decade, become a cooperative effort involving an expanding number of other national libraries and other agencies. At the beginning of 2012, contributors include 20 agencies from 16 countries.

Most large libraries maintain lists of names for people, corporations, conferences, and geographic places, as well as lists to control works and other entities. These lists, or authority files, have been developed and maintained in distinctive ways by individual library communities around the world. The differences in how to approach this work become evident as library data from many communities is combined in shared catalogs such as OCLC’s WorldCat.

VIAF helps to make library authority files less expensive to maintain and more generally useful to the library domain and beyond. To achieve this, VIAF matches and links the authority files of national libraries and groups all authority records for a given entity into a merged “super” authority record that brings together the different names for that entity. By linking disparate names for the same person or organization, VIAF provides a convenient means for a wider community of libraries and other agencies to repurpose bibliographic data produced by libraries serving different language communities.

If you were to substitute the term topic for “‘super’ authority record,” you would be part of the way towards a topic map.

Topics gather information about a given entity into a single location.

Topics differ from the authority records you find at VIAF in two very important ways:

  1. First, topics, unlike authority records, have the ability to merge with other topics, creating new topics that have more information than any of the original topics.
  2. Second, authority records are created by, well, authorities. Do you see your name or the name of your organization on the list at VIAF? Topics can be created by anyone and merged with other topics on terms chosen by the possessor of the topic map. You don’t have to wait for an authority to create the topic or approve your merging of it.
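The first difference, merging, can be illustrated with a minimal sketch. It is not an implementation of any topic map standard or engine: the `Topic` class, the `merge_all` function, the `example:` identifiers and the "merge when identifiers overlap" rule are all assumptions made for illustration. The names and sources are taken from the VIAF search results quoted in an earlier post.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical minimal topic: identifiers, names, and who supplied them."""
    identifiers: set[str]
    names: set[str] = field(default_factory=set)
    sources: set[str] = field(default_factory=set)


def merge(a: Topic, b: Topic) -> Topic:
    """Create a new topic holding everything known to either input topic."""
    return Topic(
        a.identifiers | b.identifiers,
        a.names | b.names,
        a.sources | b.sources,
    )


def merge_all(topics: list[Topic]) -> list[Topic]:
    """Merge every pair of topics that shares at least one identifier.

    The rule is an assumption: the owner of the map could choose another.
    """
    merged: list[Topic] = []
    for topic in topics:
        current = topic
        untouched = []
        for existing in merged:
            if existing.identifiers & current.identifiers:
                current = merge(existing, current)
            else:
                untouched.append(existing)
        untouched.append(current)
        merged = untouched
    return merged


# Placeholder identifiers; real maps would use whatever the author chooses.
topics = [
    Topic({"example:monroe"}, {"Monroe, Marilyn, 1926-1962"},
          {"Library of Congress/NACO"}),
    Topic({"example:miller"}, {"Miller, Arthur, 1915-2005"},
          {"Library of Congress/NACO"}),
    Topic({"example:monroe", "example:monroe-pt"}, {"Monroe, Marilyn, pseud."},
          {"Biblioteca Nacional de Portugal"}),
]
result = merge_all(topics)
```

The merged Monroe topic carries both names and both sources, which is more than either original topic held, and no authority had to approve the merge.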

There are definite advantages to having authorities and authority records, but there are also advantages to having the freedom to describe your world, in your terms.

March 1, 2013

Shedding Light on the Dark Data in the Long Tail of Science

Filed under: Curation,Librarian/Expert Searchers,Library — Patrick Durusau @ 5:30 pm

Shedding Light on the Dark Data in the Long Tail of Science by P. Bryan Heidorn. (P. Bryan Heidorn. “Shedding Light on the Dark Data in the Long Tail of Science.” Library Trends 57.2 (2008): 280-299. Project MUSE. Web. 28 Feb. 2013.)

Abstract:

One of the primary outputs of the scientific enterprise is data, but many institutions such as libraries that are charged with preserving and disseminating scholarly output have largely ignored this form of documentation of scholarly activity. This paper focuses on a particularly troublesome class of data, termed dark data. “Dark data” is not carefully indexed and stored so it becomes nearly invisible to scientists and other potential users and therefore is more likely to remain underutilized and eventually lost. The article discusses how the concepts from long-tail economics can be used to understand potential solutions for better curation of this data. The paper describes why this data is critical to scientific progress, some of the properties of this data, as well as some social and technical barriers to proper management of this class of data. Many potentially useful institutional, social, and technical solutions are under development and are introduced in the last sections of the paper, but these solutions are largely unproven and require additional research and development.

From the article:

In this paper we will use the term dark data to refer to any data that is not easily found by potential users. Dark data may be positive or negative research findings or from either “large” or “small” science. Like dark matter, this dark data on the basis of volume may be more important than that which can be easily seen. The challenge for science policy is to develop institutions and practices such as institutional repositories, which make this data useful for society.

Dark Data = Any data that is not easily found by potential users.

A number of causes are discussed, not the least of which is our old friend, the Tower of Babel.

A final barrier that cannot be overlooked is the Digital Tower of Babel that we have created with seemingly countless proprietary as well as open data formats. This can include versions of the same software products that are incompatible. Some of these formats are very efficient for the individual applications for which they were designed including word processing, databases, spreadsheets, and others, but they are ineffective to support interoperability and preservation.

As you know already, I don’t think the answer to data curation, long term, lies in uniform formats.

Uniform formats are very useful but are domain, project and time bound.

The questions always are:

“What do we do when we change data formats?”

“Do we dump data in old formats that we spent $$$ developing?”

“Do we migrate data in old formats, assuming anyone remembers the old format?”

“Do we document and map across old and new formats, preparing for the next ‘new’ format?”

None of the answers are automatic or free.

But it is better to make an informed choice than a default one of letting potentially valuable data rot.
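As a rough sketch of the last question, documenting and mapping across old and new formats, the mapping itself can be kept as data, with a note on each field. Every field name here (`auth`, `yr`, `loc` and their replacements) is invented for illustration and does not come from any real format.

```python
# Hypothetical mapping from an "old" record format to a "new" one.
# Each entry documents what changed, so the next migration has a record
# of what the old fields meant.
FIELD_MAP = {
    "auth": {"new": "creator", "note": "renamed; same meaning"},
    "yr": {"new": "year", "note": "was a string, now an integer", "convert": int},
    "loc": {"new": "location", "note": "free text kept as-is"},
}


def migrate(old_record: dict) -> tuple[dict, dict]:
    """Return (new_record, unmapped) so that no field is silently dropped."""
    new_record, unmapped = {}, {}
    for key, value in old_record.items():
        rule = FIELD_MAP.get(key)
        if rule is None:
            # No documented mapping: keep the value aside for a human to review.
            unmapped[key] = value
            continue
        convert = rule.get("convert", lambda v: v)
        new_record[rule["new"]] = convert(value)
    return new_record, unmapped


new_record, unmapped = migrate({"auth": "Heidorn", "yr": "2008", "misc": "x"})
```

Returning the unmapped fields separately is a design choice in this sketch: it makes the cost of an undocumented field visible instead of letting the data quietly disappear.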

Looking out for the little guy: Small data curation

Filed under: Curation,Librarian/Expert Searchers,Library — Patrick Durusau @ 5:30 pm

Looking out for the little guy: Small data curation by Katherine Goold Akers. (Akers, K. G. (2013), Looking out for the little guy: Small data curation. Bul. Am. Soc. Info. Sci. Tech., 39: 58–59. doi: 10.1002/bult.2013.1720390317)

Abstract:

While big data and its management are in the spotlight, a vast number of important research projects generate relatively small amounts of data that are nonetheless valuable yet rarely preserved. Such studies are often focused precursors to follow-up work and generate less noisy data than grand scale projects. Yet smaller quantity does not equate to simpler management. Data from smaller studies may be captured in a variety of file formats with no standard approach to documentation, metadata or preparation for archiving or reuse, making its curation even more challenging than for big data. As the information managers most likely to encounter small datasets, academic librarians should cooperate to develop workable strategies to document, organize, preserve and disseminate local small datasets so that valuable scholarly information can be discovered and shared.

A reminder that for every “big data” project in need of curation, there are many more smaller, less well known projects that need the same services.

Since topic maps don’t require global or even regional agreement on ontology or methodological issues, it should be easier for academic librarians to create topic maps to curate small datasets.

When it is necessary or desired to merge small datasets that were curated with different topic map assumptions, new topics can be created that merge the data that existed in separate topic maps.

But only when necessary and at the point of merging.

To say it another way, topic maps need not anticipate or fear the future. Tomorrow will take care of itself.

Unlike “now I am awake” approaches, which must fear that the next moment of consciousness will bring change.

February 25, 2013

Linked Data for Holdings and Cataloging

Filed under: Cataloging,Library,Linked Data — Patrick Durusau @ 5:45 am

From the ALA Midwinter Meeting:

Linked Data for Holdings and Cataloging: The First Step Is Always the Hardest! by Eric Miller (Zepheira) and Richard Wallis (OCLC). (Video + Slides)

Linked Data for Holdings and Cataloging: Interactive Session. (Audio)

Since linked data wasn’t designed for human users, the advantage for library catalogs isn’t clear.

Most users can’t use LCSH so perhaps the lack of utility will go unnoticed. (Subject Headings and the Semantic Web)

I first saw this at: Linked Data for Holdings and Cataloging – recordings now available!
