Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 18, 2013

International Conference on Theory and Practice of Digital Libraries (TPDL)

Filed under: Conferences,Digital Library,Librarian/Expert Searchers,Library — Patrick Durusau @ 5:26 am

International Conference on Theory and Practice of Digital Libraries (TPDL)

Valletta, Malta, September 22-26, 2013. I thought that would get your attention. Details follow.

Dates:

Full and Short papers, Posters, Panels, and Demonstrations deadline: March 23, 2013

Workshops and Tutorials proposals deadline: March 4, 2013

Doctoral Consortium papers submission deadline: June 2, 2013

Notification of acceptance for Papers, Posters, and Demonstrations: May 20, 2013

Notification of acceptance for Panels, Workshops and Tutorials: April 22, 2013

Doctoral Consortium acceptance notification: June 24, 2013

Camera-ready versions: June 9, 2013

End of early registration: July 31, 2013

Conference dates: September 22-26, 2013

The general theme of the conference is “Sharing meaningful information,” a theme reflected in the topics for conference submissions:

General areas of interest include, but are not limited to, the following topics, organized in four categories, according to a conceptualization that coincides with the four arms of the Maltese Cross:

Foundations

  • Information models
  • Digital library conceptual models and formal issues
  • Digital library 2.0
  • Digital library education curricula
  • Economic and legal aspects (e.g. rights management) landscape for digital libraries
  • Theoretical models of information interaction and organization
  • Information policies
  • Studies of human factors in networked information
  • Scholarly primitives
  • Novel research tools and methods with emphasis on digital humanities
  • User behavior analysis and modeling
  • Socio-technical perspectives of digital information

Infrastructures

  • Digital library architectures
  • Cloud and grid deployments
  • Federation of repositories
  • Collaborative and participatory information environments
  • Data storage and indexing
  • Big data management
  • e-science, e-government, e-learning, cultural heritage infrastructures
  • Semi-structured data
  • Semantic web issues in digital libraries
  • Ontologies and knowledge organization systems
  • Linked Data and its applications

Content

  • Metadata schemas with emphasis on metadata for composite content (multimedia, geographical, statistical data, and other special content formats)
  • Interoperability and Information integration
  • Digital Curation and related workflows
  • Preservation, authenticity and provenance
  • Web archiving
  • Social media and dynamically generated content for particular uses/communities (education, science, public, etc.)
  • Crowdsourcing
  • 3D models indexing and retrieval
  • Authority management issues

Services

  • Information Retrieval and browsing
  • Multilingual and Multimedia Information Retrieval
  • Personalization in digital libraries
  • Context awareness in information access
  • Semantic aware services
  • Technologies for delivering/accessing digital libraries, e.g. mobile devices
  • Visualization of large-scale information environments
  • Evaluation of online information environments
  • Quality metrics
  • Interfaces to digital libraries
  • Data mining/extraction of structure from networked information
  • Social networks analysis and virtual organizations
  • Traditional and alternative metrics of scholarly communication
  • Mashups of resources

Do you know if there are plans for recording presentations?

Given the location and diminishing travel funding, recordings would be an efficient way to increase the impact of the presentations.

January 27, 2013

Paper Machines: About Cards & Catalogs, 1548-1929

Filed under: Cataloging,Library — Patrick Durusau @ 5:43 pm

Paper Machines: About Cards & Catalogs, 1548-1929 by Markus Krajewski, translated by Peter Krapp.

From the webpage:

Today on almost every desk in every office sits a computer. Eighty years ago, desktops were equipped with a nonelectronic data processing machine: a card file. In Paper Machines, Markus Krajewski traces the evolution of this proto-computer of rearrangeable parts (file cards) that became ubiquitous in offices between the world wars.

The story begins with Konrad Gessner, a sixteenth-century Swiss polymath who described a new method of processing data: to cut up a sheet of handwritten notes into slips of paper, with one fact or topic per slip, and arrange as desired. In the late eighteenth century, the card catalog became the librarian’s answer to the threat of information overload. Then, at the turn of the twentieth century, business adopted the technology of the card catalog as a bookkeeping tool. Krajewski explores this conceptual development and casts the card file as a “universal paper machine” that accomplishes the basic operations of Turing’s universal discrete machine: storing, processing, and transferring data. In telling his story, Krajewski takes the reader on a number of illuminating detours, telling us, for example, that the card catalog and the numbered street address emerged at the same time in the same city (Vienna), that Harvard University’s home-grown cataloging system grew out of a librarian’s laziness, and that Melvil Dewey (originator of the Dewey Decimal System) helped bring about the technology transfer of card files to business.

Before ordering a copy, you may want to read Alistair Black’s review. Despite an overall positive impression, Alistair records:

Be warned, Paper Machines is not an easy read. It is not just that in some sections the narrative jumps around, points already firmly made are needlessly repeated, the characters in the plot are not always introduced carefully enough, and a great deal seems to have been lost in translation. More serious than these difficulties, the book is written entirely in the present tense. This is both disconcerting and distracting. I’m surprised the editorial team (the book is part of a monograph series titled “History and Foundations of Information Science”) and a publisher as reputable as the MIT Press allowed this to happen; unless, that is, the original German version was itself written in the present tense, which for a historical discourse I would find baffling.

Alistair does conclude:

My final advice with respect to this book: it is a good addition to the emerging field of information history and the reader should persevere with it, despite its deficiencies in narrative style. The excellent illustrations will help in this regard.

Taking Alistair’s comments at face value, I would have to agree that correcting those problems would make the book an easier read.

On the other hand, working through Paper Machines, and perhaps developing references in addition to those given, will provide many hours of delight.

January 20, 2013

…NCSU Library URLs in the Common Crawl Index

Filed under: Common Crawl,Library — Patrick Durusau @ 8:03 pm

Analysis of the NCSU Library URLs in the Common Crawl Index by Lisa Green.

From the post:

Last week we announced the Common Crawl URL Index. The index has already proved useful to many people and we would like to share an interesting use of the index that was very well described in a great blog post by Jason Ronallo.

Jason is the Associate Head of Digital Library Initiatives at North Carolina State University Libraries. He used the Common Crawl Index to look at NCSU Library URLs in the Common Crawl Index. You can see his description of his work and results below and on his blog. Be sure to follow Jason on Twitter and on his blog to keep up to date with other interesting work he does!

A great starting point for using the Common Crawl Index!

January 16, 2013

Research Data Curation Bibliography

Filed under: Archives,Curation,Data Preservation,Librarian/Expert Searchers,Library — Patrick Durusau @ 7:56 pm

Research Data Curation Bibliography (version 2) by Charles W. Bailey.

From the introduction:

The Research Data Curation Bibliography includes selected English-language articles, books, and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions. For broader coverage of the digital curation literature, see the author's Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which presents over 650 English-language articles, books, and technical reports.

The "digital curation" concept is still evolving. In "Digital Curation and Trusted Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo define digital curation as follows:

Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts.

This bibliography does not cover conference papers, digital media works (such as MP3 files), editorials, e-mail messages, interviews, letters to the editor, presentation slides or transcripts, unpublished e-prints, or weblog postings. Coverage of technical reports is very selective.

Most sources have been published from 2000 through 2012; however, a limited number of earlier key sources are also included. The bibliography includes links to freely available versions of included works. If such versions are unavailable, italicized links to the publishers' descriptions are provided.

Such links, even to publisher versions and versions in disciplinary archives and institutional repositories, are subject to change. URLs may alter without warning (or automatic forwarding) or they may disappear altogether. Inclusion of links to works on authors' personal websites is highly selective. Note that e-prints and published articles may not be identical.

An archive of prior versions of the bibliography is available.

If you are a beginning library student, take the time to get to know the work of Charles Bailey. He has consistently made positive contributions for researchers from very early in the so-called digital revolution.

To the extent that you want to design topic maps for data curation, long or short term, the 200+ items in this bibliography will introduce you to some of the issues you will be facing.

January 9, 2013

NewGenLib Open Source…Update! [Library software]

Filed under: Library,Library software,OPACS,Software — Patrick Durusau @ 12:00 pm

NewGenLib Open Source releases version 3.0.4 R1 Update 1

From the blog:

The NewGenLib Open Source team has announced the release of a new version, 3.0.4 R1 Update 1. NewGenLib is an integrated library management system developed by Verus Solutions in conjunction with the Kesaran Institute of Information and Knowledge Management in India. The software includes modules for acquisitions, technical processing, serials management, circulation, administration, MIS reports, and an OPAC.

What’s new in the Update?

This new update comes with a basket of additional features and enhancements, including:

  • Full-text indexing and searching of digital attachments: NewGenLib now uses Apache Tika. With this new tool, not only catalogue records but also their digital attachments and URLs are indexed, so you can search on the content of your digital attachments (see the sketch after this list)
  • Web statistics: The software facilitates the generation of statistics on OPAC usage by allowing Google Analytics code to be embedded
  • User ratings of catalogue records: An enhancement for user reviews is provided in the OPAC. Users can now rate a catalogue record on a five-point scale (most useful to not useful), and one level of approval has been added for user reviews and ratings
  • Circulation history download: Users can now download their circulation history as a PDF file from the OPAC
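
NewGenLib itself is Java-based, but as a quick illustration of what Tika extraction gives an indexer, here is a hedged sketch using the tika Python bindings (pip install tika; they drive a local Tika server and need a Java runtime). The file name is hypothetical:

```python
# A sketch of full-text extraction with Apache Tika via the tika Python
# bindings; not NewGenLib's own code, just the underlying idea.
from tika import parser

# Extract metadata and text from a digital attachment (hypothetical file)
parsed = parser.from_file("attachment.pdf")

print(parsed["metadata"].get("Content-Type"))
if parsed["content"]:
    # This extracted text is what a search engine indexes, making the
    # attachment's content searchable alongside the catalogue record.
    print(parsed["content"][:500])
```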

NewGenLib supports MARC 21 bibliographic data, MARC authority files, and a Z39.50 client for federated searching. Bibliographic records can be exported in MODS 3.0 and AGRIS AP. The software is OAI-PMH compliant. NewGenLib has a user community with an online discussion forum.

If you are looking for potential topic map markets, the country population rank graphic from Wikipedia may help:
[World population graph]

Population isn’t everything but it should not be ignored either.

December 21, 2012

SPARQL end-point of data.europeana.eu

Filed under: Library,Museums,RDF,SPARQL — Patrick Durusau @ 3:22 pm

SPARQL end-point of data.europeana.eu

From the webpage:

Welcome on the SPARQL end-point of data.europeana.eu!

data.europeana.eu currently contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana. Data follows the terms of the Creative Commons CC0 public domain dedication. Data is described in the Resource Description Framework (RDF) format, and structured using the Europeana Data Model (EDM). We give more detail on the EDM data we publish on the technical details page.

Please take the time to check out the list of collections currently included in the pilot.

The terms of use and external data sources appearing at data.europeana.eu are provided on the Europeana Data sources page.

Sample queries are available on the sparql page.
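
To get a feel for the end-point, here is a hedged sketch using Python's SPARQLWrapper; the endpoint URL is an assumption on my part, so verify it against the sparql page:

```python
# A minimal sketch of querying the data.europeana.eu SPARQL end-point;
# the endpoint URL below is assumed, not confirmed.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://data.europeana.eu/sparql")
sparql.setQuery("""
    SELECT ?type (COUNT(?s) AS ?n)
    WHERE { ?s a ?type }
    GROUP BY ?type
    ORDER BY DESC(?n)
    LIMIT 20
""")
sparql.setReturnFormat(JSON)

# Print the most common classes and how many resources use each
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["type"]["value"], row["n"]["value"])
```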

At first I wondered why this was news, because Europeana opens up data on 20 million cultural items appeared in the Guardian on 12 September 2012.

I assume the data has been in use since its release last September.

If you have been using it, can you comment on how your use will change now that the data is available as a SPARQL end-point?

November 26, 2012

UILLD 2013 — User interaction built on library linked data

Filed under: Interface Research/Design,Library,Linked Data,Usability,Users — Patrick Durusau @ 4:48 pm

UILLD 2013: Workshop on User interaction built on library linked data (UILLD), a pre-conference to the 79th IFLA World Library and Information Congress, Jurong Regional Library, Singapore.

Important Dates:

Paper submission deadline: February 28, 2013
Acceptance notification: May 15, 2013
Camera-ready versions of accepted papers: June 30, 2013
Workshop date: August 16, 2013

From the webpage:

The quantity of Linked Data published by libraries is increasing dramatically: Following the lead of the National Library of Sweden (2008), several libraries and library networks have begun to publish authority files and bibliographic information as linked (open) data. However, applications that consume this data are not yet widespread. Particularly, there is a lack of methods for integration of Linked Data from multiple sources and its presentation in appropriate end user interfaces. Existing services tend to build on one or two well integrated datasets – often from the same data supplier – and do not actively use the links provided to other datasets within or outside of the library or cultural heritage sector to provide a better user experience.

CALL FOR PAPERS

The main objective of this workshop/pre-conference is to provide a platform for discussion of deployed services, concepts, and approaches for consuming Linked Data from libraries and other cultural heritage institutions. Special attention will be given to papers presenting working end user interfaces using Linked Data from both cultural heritage institutions (including libraries) and other datasets.

For further information about the workshop, please contact the workshops chairs at uilld2013@gmail.com

In connection with this workshop, see also: IFLA World Library and Information Congress 79th IFLA General Conference and Assembly.

I first saw this in a tweet by Ivan Herman.

Bibliographic Framework as a Web of Data:…

Filed under: BIBFRAME,Library,Linked Data — Patrick Durusau @ 9:53 am

Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services (PDF)

From the introduction:

The new, proposed model is simply called BIBFRAME, short for Bibliographic Framework. The new model is more than a mere replacement for the library community’s current model/format, MARC. It is the foundation for the future of bibliographic description that happens on, in, and as part of the web and the networked world we live in. It is designed to integrate with and engage in the wider information community while also serving the very specific needs of its maintenance community – libraries and similar memory organizations. It will realize these objectives in several ways:

  1. Differentiate clearly between conceptual content and its physical manifestation(s) (e.g., works and instances)
  2. Focus on unambiguously identifying information entities (e.g., authorities)
  3. Leverage and expose relationships between and among entities

In a web-scale world, it is imperative to be able to cite library data in a way that not only differentiates the conceptual work (a title and author) from the physical details about that work’s manifestation (page numbers, whether it has illustrations) but also clearly identifies entities involved in the creation of a resource (authors, publishers) and the concepts (subjects) associated with a resource. Standard library description practices, at least until now, have focused on creating catalog records that are independently understandable, by aggregating information about the conceptual work and its physical carrier and by relying heavily on the use of lexical strings for identifiers, such as the name of an author. The proposed BIBFRAME model encourages the creation of clearly identified entities and the use of machine-friendly identifiers which lend themselves to machine interpretation for those entities.
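
To make the work/instance split concrete, here is a hedged sketch in Python with rdflib. The namespace and property names are placeholders of my own; the BIBFRAME vocabulary itself was still a draft at this point:

```python
# A sketch of differentiating a conceptual work from its physical
# instance with machine-friendly identifiers; the vocabulary namespace
# and property names below are hypothetical, not the draft's own.
from rdflib import Graph, Literal, Namespace, URIRef

BF = Namespace("http://example.org/bibframe/")  # placeholder namespace
g = Graph()

work = URIRef("http://example.org/works/moby-dick")
instance = URIRef("http://example.org/instances/moby-dick-1851")

# The conceptual work: a title and an identified (not lexical) creator
g.add((work, BF.title, Literal("Moby-Dick")))
g.add((work, BF.creator, URIRef("http://example.org/agents/melville")))

# The physical manifestation: carrier details, linked back to the work
g.add((instance, BF.instanceOf, work))
g.add((instance, BF.extent, Literal("635 p.")))

print(g.serialize(format="turtle"))
```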

An important draft from the Library of Congress on the BIBFRAME proposal.

Please review and comment. (Plus forward to your library friends.)

I first saw this in a tweet by Ivan Herman.

November 19, 2012

D-Lib

Filed under: Archives,Digital Library,Library — Patrick Durusau @ 4:26 pm

D-Lib

From the about page:

D-Lib Magazine is an electronic publication with a focus on digital library research and development, including new technologies, applications, and contextual social and economic issues. D-Lib Magazine appeals to a broad technical and professional audience. The primary goal of the magazine is timely and efficient information exchange for the digital library community to help digital libraries be a broad interdisciplinary field, and not a set of specialties that know little of each other.

I was about to post about an article in D-Lib when I realized I don’t have a blog entry on D-Lib!

Not that D-Lib is topic map specific, but it is digital library specific, with all the issues that entails. Those issues are remarkably similar to the ones any topic map author or software will face.

D-Lib has proven what many of us suspected:

The quality of content is not related to the medium of delivery.

Enjoy!

Constructing a true LCSH tree of a science and engineering collection

Filed under: Cataloging,Classification,Classification Trees,Hierarchy,LCSH,Library,Trees — Patrick Durusau @ 5:49 am

Constructing a true LCSH tree of a science and engineering collection by Charles-Antoine Julien, Pierre Tirilly, John E. Leide and Catherine Guastavino.

Abstract:

The Library of Congress Subject Headings (LCSH) is a subject structure used to index large library collections throughout the world. Browsing a collection through LCSH is difficult using current online tools in part because users cannot explore the structure using their existing experience navigating file hierarchies on their hard drives. This is due to inconsistencies in the LCSH structure, which does not adhere to the specific rules defining tree structures. This article proposes a method to adapt the LCSH structure to reflect a real-world collection from the domain of science and engineering. This structure is transformed into a valid tree structure using an automatic process. The analysis of the resulting LCSH tree shows a large and complex structure. The analysis of the distribution of information within the LCSH tree reveals a power law distribution where the vast majority of subjects contain few information items and a few subjects contain the vast majority of the collection.

After a detailed analysis of records from the McGill University Libraries (204,430 topical authority records) and 130,940 bibliographic records (Schulich Science and Engineering Library), the authors conclude in part:

This revealed that the structure was large, highly redundant due to multiple inheritances, very deep, and unbalanced. The complexity of the LCSH tree is a likely usability barrier for subject browsing and navigation of the information collection.

For me the most compelling part of this research was the focus on LCSH as used and not as it imagines itself. Very interesting reading. A slow walk through the bibliography will interest those researching LCSH or classification more generally.

Demonstration of the power law with the use of LCSH makes one wonder about other classification systems as used.
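
A cheap first probe of any classification system “as used” is to count items per subject and rank the counts. A sketch, with invented sample records:

```python
# Count information items per subject heading and rank them; a power-law
# distribution shows a handful of subjects holding most of the
# collection. The records below are invented placeholders.
from collections import Counter

records = [
    ("b1", ["Hydraulics", "Fluid mechanics"]),
    ("b2", ["Fluid mechanics"]),
    ("b3", ["Fluid mechanics"]),
    ("b4", ["Turbomachines"]),
]

items_per_subject = Counter(
    heading for _, headings in records for heading in headings
)

# On a log-log plot of rank vs. count, a power law is roughly a straight
# line; here we just print the ranked counts.
for rank, (subject, count) in enumerate(items_per_subject.most_common(), 1):
    print(rank, subject, count)
```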

November 3, 2012

Complexity Explorer Project

Filed under: Complexity,Document Management,Library — Patrick Durusau @ 6:33 pm

Complexity Explorer Project

A website development project that reports that, when “live,” it will serve (among others):

  • Scientist keeping up to date on papers with Source Materials Search Engine and Paper Summaries
  • Professor designing new course on complexity
  • High-school science teacher using virtual laboratory for student science projects
  • Non-expert learning how complex systems science relates to their own field

Scheduled to go beta in the Fall of 2012.

As always, it will be of interest to see how semantic issues are handled in research/library settings.

October 30, 2012

Bibliographic Framework Transition Initiative

Filed under: Library,Library software,MARC — Patrick Durusau @ 9:50 am

Bibliographic Framework Transition Initiative

The original announcement for this project lists its requirements, but the requirements are not listed on the homepage.

The requirements are found at: The Library of Congress issues its initial plan for its Bibliographic Framework Transition Initiative for dissemination, sharing, and feedback (October 31, 2011). Nothing in the link text says “requirements here” to me.

To effectively participate in discussions about this transition you need to know the requirements.

Requirements as of the original announcement:

Requirements for a New Bibliographic Framework Environment

Although the MARC-based infrastructure is extensive, and MARC has been adapted to changing technologies, a major effort to create a comparable exchange vehicle that is grounded in the current and expected future shape of data interchange is needed. To assure a new environment will allow reuse of valuable data and remain supportive of the current one, in addition to advancing it, the following requirements provide a basis for this work. Discussion with colleagues in the community has informed these requirements for beginning the transition to a "new bibliographic framework". Bibliographic framework is intended to indicate an environment rather than a "format".

  • Broad accommodation of content rules and data models. The new environment should be agnostic to cataloging rules, in recognition that different rules are used by different communities, for different aspects of a description, and for descriptions created in different eras, and that some metadata are not rule based. The accommodation of RDA (Resource Description and Access) will be a key factor in the development of elements, as will other mainstream library, archive, and cultural community rules such as Anglo-American Cataloguing Rules, 2nd edition (AACR2) and its predecessors, as well as DACS (Describing Archives, a Content Standard), VRA (Visual Resources Association) Core, CCO (Cataloging Cultural Objects).
  • Provision for types of data that logically accompany or support bibliographic description, such as holdings, authority, classification, preservation, technical, rights, and archival metadata. These may be accommodated through linking technological components in a modular way, standard extensions, and other techniques.
  • Accommodation of textual data, linked data with URIs instead of text, and both. It is recognized that a variety of environments and systems will exist with different capabilities for communicating and receiving and using textual data and links.
  • Consideration of the relationships between and recommendations for communications format tagging, record input conventions, and system storage/manipulation. While these environments tend to blur with today’s technology, a future bibliographic framework is likely to be seen less by catalogers than the current MARC format. Internal storage, displays from communicated data, and input screens are unlikely to have the close relationship to a communication format that they have had in the past.
  • Consideration of the needs of all sizes and types of libraries, from small public to large research. The library community is not homogeneous in the functionality needed to support its users in spite of the central role of bibliographic description of resources within cultural institutions. Although the MARC format became a key factor in the development of systems and services, libraries implement services according to the needs of their users and their available resources. The new bibliographic framework will continue to support simpler needs in addition to those of large research libraries.
  • Continuation of maintenance of MARC until no longer necessary. It is recognized that systems and services based on the MARC 21 communications record will be an important part of the infrastructure for many years. With library budgets already stretched to cover resource purchases, large system changes are difficult to implement because of the associated costs. With the migration in the near term of a large segment of the library community from AACR to RDA, we will need to have RDA-adapted MARC available. While that need is already being addressed, it is recognized that RDA is still evolving and additional changes may be required. Changes to MARC not associated with RDA should be minimal as the energy of the community focuses on the implementation of RDA and on this initiative.
  • Compatibility with MARC-based records. While a new schema for communications could be radically different, it will need to enable use of data currently found in MARC, since redescribing resources will not be feasible. Ideally there would be an option to preserve all data from a MARC record.
  • Provision of transformation from MARC 21 to a new bibliographic environment. A key requirement will be software that converts data to be moved from MARC to the new bibliographic framework and back, if possible, in order to enable experimentation, testing, and other activities related to evolution of the environment.

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data, and one that better accommodates the library community’s new cataloging rules, RDA. The effort will take place in parallel with the maintenance of MARC 21 as new models are tested. It is expected that new systems and services will be developed to help libraries and provide the same cost savings they do today. Sensitivity to the effect of rapid change enables gradual implementation by systems and infrastructures, and preserves compatibility with existing data.

Ongoing discussion at: Bibliographic Framework Transition Initiative Forum, BIBFRAME@LISTSERV.LOC.GOV.

The requirements recognize a future of semantic and technological heterogeneity.

Similar to the semantic and technological heterogeneity we have now and have had in the past.

A warning to those expecting a semantic and technological rapture of homogeneity.

(I first saw this initiative at: NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis.)

NoSQL Bibliographic Records:…

Filed under: Bibliography,Library,MARC,NoSQL — Patrick Durusau @ 8:58 am

NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis by Jeremy Nelson.

From the background:

Using the Library of Congress Bibliographic Framework for the Digital Age as the starting point for software development requirements, the FRBR-Redis-Datastore project is a proof-of-concept for a next-generation bibliographic NoSQL system within the context of improving upon the current MARC catalog and digital repository of a small academic library at a top-tier liberal arts college.

The FRBR-Redis-Datastore project starts with a basic understanding of the MARC, MODS, and FRBR implemented using a NoSQL technology called Redis.

This presentation guides you through the theories and technologies behind one such proof-of-concept bibliographic framework for the 21st century.

I found the answer to “Well, Why Not Hadoop?” refreshing:

Hadoop was just too complicated compared to the simple three-step Redis server set-up.

Simply because a technology is popular doesn’t mean it meets your requirements. Such as administration by non-full time technical experts.

An Oracle database could support an application that manages garden club finances, but it would be a poor choice under most circumstances.

The Redis part of the presentation is apparently not working (I get Python errors) as of today and I have sent a note with the error messages.

A “proof-of-concept” that merits your attention!
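
This is not the project’s code (which, as noted, I could not get running), but a hedged sketch of the kind of FRBR-in-Redis modelling the presentation describes, using redis-py with invented key names:

```python
# A sketch of FRBR-style bibliographic data in Redis hashes and sets;
# assumes redis-py 3.5+ and a local Redis server. Key names are invented.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# A work and one of its manifestations, each stored as a hash
r.hset("frbr:work:1", mapping={"title": "Moby-Dick", "creator": "Melville, Herman"})
r.hset(
    "frbr:manifestation:1",
    mapping={"work": "frbr:work:1", "carrier": "print", "extent": "635 p."},
)

# A set makes walking from a work to its manifestations a single command
r.sadd("frbr:work:1:manifestations", "frbr:manifestation:1")

for key in r.smembers("frbr:work:1:manifestations"):
    print(r.hgetall(key))
```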

October 24, 2012

JournalTOCs

Filed under: Data Source,Library,Library software,Publishing — Patrick Durusau @ 4:02 pm

JournalTOCs

Most publishers have TOC services for new issues of their journals.

JournalTOCs aggregates TOCs from publishers and maintains a searchable database of their TOC postings.

A database that is accessible via a free API, I should add.

The API should be a useful way to add journal articles to a topic map, particularly when you want to add selected articles and not entire issues.
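
As a hedged sketch of that selective harvesting: the endpoint pattern and parameters below are my assumptions from memory, so check them against the JournalTOCs API documentation before relying on them:

```python
# A sketch of fetching a journal's recent TOC from the JournalTOCs API;
# the endpoint pattern, parameters, and ISSN are assumptions/placeholders.
import urllib.request
import xml.etree.ElementTree as ET

issn = "0000-0000"          # placeholder ISSN
user = "you@example.org"    # JournalTOCs asks for a registered email

url = f"http://www.journaltocs.ac.uk/api/journals/{issn}?user={user}"
with urllib.request.urlopen(url) as resp:
    feed = ET.parse(resp)

# The response is an RSS-style feed; print every title element (the feed
# title plus article titles), stripping XML namespaces for simplicity.
for elem in feed.iter():
    tag = elem.tag.rsplit("}", 1)[-1]
    if tag == "title":
        print(elem.text)
```

From titles like these you could pick selected articles, rather than entire issues, for a topic map.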

I am looking forward to using and exploring JournalTOCs.

Suggest you do the same.

September 13, 2012

Europeana opens up data on 20 million cultural items

Filed under: Archives,Data,Dataset,Europeana,Library,Museums — Patrick Durusau @ 3:25 pm

Europeana opens up data on 20 million cultural items by Jonathan Gray (Open Knowledge Foundation):

From the post:

Europe‘s digital library Europeana has been described as the ‘jewel in the crown’ of the sprawling web estate of EU institutions.

It aggregates digitised books, paintings, photographs, recordings and films from over 2,200 contributing cultural heritage organisations across Europe – including major national bodies such as the British Library, the Louvre and the Rijksmuseum.

Today [Wednesday, 12 September 2012] Europeana is opening up data about all 20 million of the items it holds under the CC0 rights waiver. This means that anyone can reuse the data for any purpose – whether using it to build applications to bring cultural content to new audiences in new ways, or analysing it to improve our understanding of Europe’s cultural and intellectual history.

This is a coup d’etat for advocates of open cultural data. The data is being released after a grueling and unenviable internal negotiation process that has lasted over a year – involving countless meetings, workshops, and white papers presenting arguments and evidence for the benefits of openness.

That is good news!

A familiar issue that it overcomes:

To complicate things even further, many public institutions actively prohibit the redistribution of information in their catalogues (as they sell it to – or are locked into restrictive agreements with – third party companies). This means it is not easy to join the dots to see which items live where across multiple online and offline collections.

Oh, yeah! That was one of Google’s reasons for pulling the plug on the Open Knowledge Graph. Google had restrictive agreements so you can only connect the dots with Google products. (I think there is a name for that, let me think about it. Maybe an EU prosecutor might know it. You could always ask.)

What are you going to be mapping from this collection?

September 11, 2012

Linked Data in Libraries, Archives, and Museums

Filed under: Archives,Library,Linked Data,Museums — Patrick Durusau @ 2:23 pm

Linked Data in Libraries, Archives, and Museums, Information Standards Quarterly (ISQ), Spring/Summer 2012, Volume 24, no. 2/3. http://dx.doi.org/10.3789/isqv24n2-3.2012

Interesting reading on linked data.

I have some comments on the “discovery” of the need to manage “diverse, heterogeneous metadata” but will save them for another post.

From the “flyer” that landed in my inbox:

The National Information Standards Organization (NISO) announces the publication of a special themed issue of the Information Standards Quarterly (ISQ) magazine on Linked Data for Libraries, Archives, and Museums. ISQ Guest Content Editor, Corey Harper, Metadata Services Librarian, New York University has pulled together a broad range of perspectives on what is happening today with linked data in cultural institutions. He states in his introductory letter, “As the Linked Data Web continues to expand, significant challenges remain around integrating such diverse data sources. As the variance of the data becomes increasingly clear, there is an emerging need for an infrastructure to manage the diverse vocabularies used throughout the Web-wide network of distributed metadata. Development and change in this area has been rapidly increasing; this is particularly exciting, as it gives a broad overview on the scope and breadth of developments happening in the world of Linked Open Data for Libraries, Archives, and Museums.”

The feature article by Gordon Dunsire, Corey Harper, Diane Hillmann, and Jon Phipps on Linked Data Vocabulary Management describes the shift in popular approaches to large-scale metadata management and interoperability to the increasing use of the Resource Description Framework to link bibliographic data into the larger web community. The authors also identify areas where best practices and standards are needed to ensure a common and effective linked data vocabulary infrastructure.

Four “in practice” articles illustrate the growth in the implementation of linked data in the cultural sector. Jane Stevenson in Linking Lives describes the work to enable structured and linked data from the Archives Hub in the UK. In Joining the Linked Data Cloud in a Cost-Effective Manner, Seth van Hooland, Ruben Verborgh, and Rik Van de Walle show how general purpose Interactive Data Transformation tools, such as Google Refine, can be used to efficiently perform the necessary task of data cleaning and reconciliation that precedes the opening up of linked data. Ted Fons, Jeff Penka, and Richard Wallis discuss OCLC’s Linked Data Initiative and the use of Schema.org in WorldCat to make library data relevant on the web. In Europeana: Moving to Linked Open Data, Antoine Isaac, Robina Clayphan, and Bernhard Haslhofer explain how the metadata for over 23 million objects are being converted to an RDF-based linked data model in the European Union’s flagship digital cultural heritage initiative.

Jon Voss provides a status update on Linked Open Data for Libraries, Archives, and Museums (LODLAM) and the annual summit to advance this work. Thomas Elliott, Sebastian Heath, and John Muccigrosso report on the Linked Ancient World Data Institute, a workshop to further the availability of linked open data to create reusable digital resources within the classical studies disciplines.

Kevin Ford wraps up the contributed articles with a standard spotlight article on LC’s Bibliographic Framework Initiative and the Attractiveness of Linked Data. This Library of Congress-led community effort aims to transition from MARC 21 to a linked data model. “The move to a linked data model in libraries and other cultural institutions represents one of the most profound changes that our community is confronting,” stated Todd Carpenter, NISO Executive Director. “While it completely alters the way we have always described and cataloged bibliographic information, it offers tremendous opportunities for making this data accessible and usable in the larger, global web community. This special issue of ISQ demonstrates the great strides that libraries, archives, and museums have already made in this arena and illustrates the future world that awaits us.”

August 21, 2012

Putting WorldCat Data Into A Triple Store

Filed under: Library,Linked Data,RDF,WorldCat — Patrick Durusau @ 10:32 am

Putting WorldCat Data Into A Triple Store by Richard Wallis.

From the post:

I can not really get away with making a statement like “Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them” and then not following it up.

I made it in my previous post Get Yourself a Linked Data Piece of WorldCat to Play With in which I was highlighting the release of a download file containing RDF descriptions of the 1.2 million most highly held resources in WorldCat.org – to make the cut, a resource had to be held by more than 250 libraries.

So here for those that are interested is a step by step description of what I did to follow my own encouragement to load up the triples and start playing.

Have you loaded the WorldCat linked data into a triple store?

Some other storage mechanism?
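
If installing a triplestore feels like too much ceremony, a hedged middle path is to pull a sample of the dump into memory with rdflib (the file name is hypothetical, and rdflib will not cope with all ~80 million triples):

```python
# Poke at a sample of the WorldCat n-triples dump with rdflib. N-triples
# is line-based, so slicing off the first lines yields valid data.
import gzip
import itertools

from rdflib import Graph

with gzip.open("worldcat-dump.nt.gz", "rt", encoding="utf-8") as f:
    sample = "".join(itertools.islice(f, 10_000))

g = Graph()
g.parse(data=sample, format="nt")

# A first look: which classes appear, and how often?
query = "SELECT ?type (COUNT(?s) AS ?n) WHERE { ?s a ?type } GROUP BY ?type"
for row in g.query(query):
    print(row.type, row.n)
```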

August 18, 2012

Does Time Fix All? [And my response]

Filed under: Librarian/Expert Searchers,Library,WWW — Patrick Durusau @ 3:51 pm

Does Time Fix All? by Daniel Lemire, starts off:

As a graduate, finding useful references was painful. What the librarians had come up with were terrible time-consuming systems. It took an outsider (Berners-Lee) to invent the Web. Even so, the librarians were slow to adopt the Web and you could often see them warn students against using the Web as part of their research. Some of us ignored them and posted our papers online, or searched for papers online. Many, many years later, we are still a crazy minority but a new generation of librarians has finally adopted the Web.

What do you conclude from this story?

Whenever you point to a difficult systemic problem (e.g., it is time consuming to find references), someone will reply that “time fixes everything”. A more sophisticated way to express this belief is to say that systems are self-correcting.

Here is my response:

From above: “… What the librarians had come up with were terrible time-consuming systems. It took an outsider (Berners-Lee) to invent the Web….”

Really?

You mean the librarians who had been working on digital retrieval since the late 1940’s and subject retrieval longer than that? Those librarians?

With the web, every user repeats the search effort of others. Why isn’t repeating the effort of others a “terrible time-consuming system?”

BTW, Berners-Lee invented allowing 404s for hyperlinks. Significant because it lowered the overhead of hyperlinking enough to be practical. It was other CS types with high overhead hyperlinking. Not librarians.

Berners-Lee fixed hyperlinking maintenance, failed and continues to fail on IR. Or have you not noticed?

I won’t amplify my answer here but will wait to see what happens to my comment at Daniel’s blog.

August 16, 2012

Get Yourself a Linked Data Piece of WorldCat to Play With

Filed under: Library,RDF,WorldCat — Patrick Durusau @ 7:26 pm

Get Yourself a Linked Data Piece of WorldCat to Play With by Richard Wallis.

From the post:

You may remember my frustration a couple of months ago, at being in the air when OCLC announced the addition of Schema.org marked up Linked Data to all resources in WorldCat.org. Those of you who attended the OCLC Linked Data Round Table at IFLA 2012 in Helsinki yesterday, will know that I got my own back on the folks who publish the press releases at OCLC, by announcing the next WorldCat step along the Linked Data road whilst they were still in bed.

The Round Table was an excellent, very interactive session with Neil Wilson from the British Library, Emmanuelle Bermes from Centre Pompidou, and Martin Malmsten of the National Library of Sweden, which I will cover elsewhere. For now, you will find my presentation Library Linked Data Progress on my SlideShare site.

After we experimentally added RDFa embedded linked data, using Schema.org markup and some proposed Library extensions, to WorldCat pages, one of the questions I was asked most often was: where can I get my hands on some of this raw data?

We are taking the application of linked data to WorldCat one step at a time so that we can learn from how people use and comment on it. So at that time if you wanted to see the raw data the only way was to use a tool [such as the W3C RDFA 1.1 Distiller] to parse the data out of the pages, just as the search engines do.

So I am really pleased to announce that you can now download a significant chunk of that data as RDF triples. Especially in experimental form, providing the whole lot as a download would have been a bit of a challenge, even just in disk space and bandwidth terms. So which chunk to choose was a question. We could have chosen a random selection, but decided instead to pick the most popular, in terms of holdings, resources in WorldCat – an interesting selection in its own right.

To make the cut, a resource had to be held by more than 250 libraries. It turns out that almost 1.2 million fall in to this category, so a sizeable chunk indeed. To get your hands on this data, download the 1Gb gzipped file. It is in RDF n-triples form, so you can take a look at the raw data in the file itself. Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them.

That’s a nice sized collection of data. In any format.

But the next-to-last sentence of the post reads:

As I say in the press release, posted after my announcement, we are really interested to see what people will do with this data.

Déjà vu?

I think I have heard that question asked with other linked data releases. You? Pointers?

I first saw this at SemanticWeb.com.

August 10, 2012

Data-Intensive Librarians for Data-Intensive Research

Data-Intensive Librarians for Data-Intensive Research by Chelcie Rowell.

From the post:

A packed house heard Tony Hey and Clifford Lynch present on The Fourth Paradigm: Data-Intensive Research, Digital Scholarship and Implications for Libraries at the 2012 ALA Annual Conference.

Jim Gray coined The Fourth Paradigm in 2007 to reflect a movement toward data-intensive science. Adapting to this change would, Gray noted, require an infrastructure to support the dissemination of both published work and underlying research data. But the return on investment for building the infrastructure would be to accelerate the transformation of raw data to recombined data to knowledge.

In outlining the current research landscape, Hey and Lynch underscored how right Gray was.

Hey led the audience on a whirlwind tour of how scientific research is practiced in the Fourth Paradigm. He showcased several projects that manage data from capture to curation to analysis and long-term preservation. One example he mentioned was the Dataverse Network Project that is working to preserve diverse scholarly outputs from published work to data, images and software.

Lynch reflected on the changing nature of the scientific record and the different collaborative structures that will be needed to define, generate and preserve that record. He noted that we tend to think of the scholarly record in terms of published works. In light of data-intensive science, Lynch said the definition must be expanded to include the datasets which underlie results and the software required to render data.

I wasn’t able to find a video of the presentations and/or slides but while you wait for those to appear, you can consult the homepages of Lynch and Hey for related materials.

Librarians already have searching and bibliographic skills, which are appropriate to the Fourth Paradigm.

What if they were to add big data design, if not processing, skills to their resumes?

What if articles in professional journals carried a byline in addition to the authors: Librarian(s): ?

August 9, 2012

The Bookless Library

Filed under: Books,Library — Patrick Durusau @ 3:45 pm

The Bookless Library by David A. Bell. (New Republic, July 12, 2012)

Although Bell is quick to dismiss the notion of libraries without physical books, the confusion of libraries with physical books is one that has hurt the cause of libraries.

He remarks:

Libraries are also sources of crucial expertise. Librarians do not just maintain physical collections of books. Among other things, they guide readers, maintain catalogues, develop access portals for electronic sources, organize special programs and exhibitions, oversee special collections, and make acquisition decisions. The fact that more and more acquisition decisions now involve a question of which databases to subscribe to, rather than which physical books and journals to buy, does not make these functions any less important. To the contrary: the digital landscape is wild and wooly, and it is crucial to have well-trained, well-informed librarians on hand to figure out which content to spend scarce subscription dollars on, and how to guide readers through it.

Digital resources and collections have already outstripped the physical collections possible in even major research libraries. Digitization efforts promise that more and more of the written record will become readily accessible to more readers.

Accessible in the sense that readers can “read” the text; whether it is understood or not is a different issue.

Without librarians to act as intelligent filters, digital content will be a sea of information that washes over all but the most intrepid scholars.

Increases in digital resources require increases in the number of librarians performing the creative aspects of their professions.

Acting as teachers, guides, and fellow travellers in the exploration of cultural riches past and present, and preparing for those yet to come.

July 26, 2012

Law Libraries, Government Transparency, and the Internet

Filed under: Government,Law,Library — Patrick Durusau @ 9:35 am

Law Libraries, Government Transparency, and the Internet by Daniel Schuman.

From the post:

This past weekend I was fortunate to attend the American Association of Law Libraries 105th annual conference. On Sunday morning, I gave a presentation to a special interest section entitled “Law Libraries, Government Transparency, and the Internet,” where I discussed the important role that law libraries can play in making the government more open and transparent.

The slides illustrate the range of legal material that is becoming available, material which is by definition difficult for the lay reader to access.

I see an important role for law libraries as curators who create access points for both professional as well as lay researchers.

I first saw this at Legal Informatics.

July 8, 2012

MaRC and SolrMaRC

Filed under: Librarian/Expert Searchers,Library,MARC,SolrMarc — Patrick Durusau @ 2:30 pm

MaRC and SolrMaRC by Owen Stephens.

From the post:

At the recent Mashcat event I volunteered to do a session called ‘making the most of MARC’. What I wanted to do was demonstrate how some of the current ‘resource discovery’ software are based on technology that can really extract value from bibliographic data held in MARC format, and how this creates opportunities both for creating tools for users and for library staff.

One of the triggers for the session was seeing, over a period of time, a number of complaints about the limitations of ‘resource discovery’ solutions – I wanted to show that many of the perceived limitations were not about the software, but about the implementation. I also wanted to show that while some technical knowledge is needed, some of these solutions can be run on standard PCs and this puts the tools, and the ability to experiment and play with MARC records, in the grasp of any tech-savvy librarian or user.

Many of the current ‘resource discovery’ solutions available are based on a search technology called Solr – part of a project at the Apache software foundation. Solr provides a powerful set of indexing and search facilities, but what makes it especially interesting for libraries is that there has been some significant work already carried out to use Solr to index MARC data – by the SolrMARC project. SolrMARC delivers a set of pre-configured indexes, and the ability to extract data from MARC records (gracefully handling ‘bad’ MARC data – such as badly encoded characters etc. – as well). While Solr is powerful, it is SolrMARC that makes it easy to implement and exploit in a library context.

SolrMARC is used by two open source resource discovery products – VuFind and Blacklight. Although VuFind and Blacklight have differences, and are written in different languages (VuFind is PHP while Blacklight is Ruby), since they both use Solr and specifically SolrMARC to index MARC records the indexing and search capabilities underneath are essentially the same. What makes the difference between implementations is not the underlying technology but the configuration. The configuration allows you to define what data, from which part of the MARC records, goes into which index in Solr.

Owen explains his excitement over these tools:

These tools excite me for a couple of reasons:

  1. A shared platform for MARC indexing, with a standard way of programming extensions, gives the opportunity to share techniques and scripts across platforms – if I write a clever set of BeanShell scripts to calculate page counts from the 300 field (along the lines demonstrated by Tom Meehan in another Mashcat session), you can use the same scripts with no effort in your SolrMARC installation
  2. The ability to run powerful, but easy to configure, search tools on standard computers. I can get Blacklight or VuFind running on a laptop (Windows, Mac or Linux) with very little effort, and I can have a few hundred thousand MARC records indexed using my own custom routines and searchable via an interface I have complete control over

I like the “geek” appeal of #2, but creating value-add interfaces for the casual user is more likely to attract positive PR for a library.

As for #1, how uniform are the semantics of MARC fields?

I suspect physical data, page count, etc., are fairly stable/common, but what about more subjective fields? How would you test that proposition?
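
One hedged way to test it: harvest a field across a batch of records with pymarc and see how much the values fan out. A sketch (the file name is hypothetical):

```python
# Count distinct values of field 300 subfield a (physical description)
# across a MARC file; stable fields cluster into few patterns, while
# subjective fields fan out into many.
from collections import Counter

from pymarc import MARCReader

values = Counter()
with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        field = record["300"]
        if field and field["a"]:
            values[field["a"].strip()] += 1

for value, count in values.most_common(20):
    print(count, value)
```

Run the same loop over a subject-bearing field and compare the shapes of the two distributions.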

April 30, 2012

What is Umlaut anyway?

Filed under: Library,Library software,OpenURL,Umlaut — Patrick Durusau @ 3:18 pm

What is Umlaut anyway?

From the webpage:

Umlaut is software for libraries (you know, the kind with books), which deals with advertising services for specific known citations. It runs as a Ruby on Rails application via an engine gem.

Umlaut could be called an ‘open source front-end for a link resolver’ — Umlaut accepts requests in OpenURL format, but has no knowledge base of its own; it can be used as a front-end for an existing knowledge base. (Currently SFX, but other plugins can be written.)

And that describes Umlaut’s historical origin and one of its prime use cases. But in using and further developing Umlaut, I’ve come to realize that it has a more general purpose, as a new kind of infrastructural component.

Better, although a bit buzzword laden:

Umlaut is a just-in-time aggregator of “last mile” specific citation services, taking input as OpenURL, and providing an HTML UI as well as an api suite for embedding Umlaut services in other applications.

(In truth, that’s just a generalization of what your OpenURL Link Resolver does now, but considered from a different more flexible vantage).

Reading under Last Mile, Specific Citation I find:

Umlaut is not concerned with the search/discovery part of user research. Umlaut’s role begins when a particular item has been identified, with a citation in machine-accessible form (i.e., title, author, journal, page number, etc., all in separate elements).

Umlaut’s role is to provide the user with services that apply to the item of interest. Services provided by the hosting institution, licensed by the hosting institution, or free services the hosting institution wishes to advertise/recommend to its users.

Umlaut strives to supply links that take the user in as few clicks as possible to the service listed, without ever listing ‘blind links’ that you first have to click on to find out whether they are available. Umlaut pre-checks things when necessary to only list services, with any needed contextual info, such that the user knows what they get when they click on it. Save the time of the user.

Starts with a particular subject (nee item) and maps known services to it.

Although links to subscriber services are unlikely to be interchangeable, links to public domain resources or those with public identifiers would be interchangeable. Potential for a mapping syntax? Or transmission of the “discovery” of such resources?
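
For concreteness, here is a sketch of the kind of OpenURL (KEV format, Z39.88-2004) request a resolver like Umlaut accepts; the resolver base URL and citation values are invented:

```python
# Build an OpenURL 1.0 (KEV) request for a journal article citation,
# with each element in its own key, the machine-accessible form the
# resolver expects. Base URL and values are placeholders.
from urllib.parse import urlencode

citation = {
    "ctx_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.atitle": "An example article title",
    "rft.jtitle": "Journal of Examples",
    "rft.volume": "12",
    "rft.spage": "34",
    "rft.issn": "1234-5678",
}

resolver = "https://resolver.example.edu/umlaut"  # hypothetical base URL
print(resolver + "?" + urlencode(citation))
```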

April 27, 2012

Harvard Library releases big data for its books

Filed under: Books,Library — Patrick Durusau @ 6:11 pm

Harvard Library releases big data for its books

Audrey Watters writes in part:

Harvard University announced this week that it would make more than 12 million catalog records from its 73 libraries publicly available. These records contain bibliographic information about books, manuscripts, maps, videos, and audio recordings. The Harvard Library is making these records available under a Creative Commons 0 license, in accordance with its Open Metadata Policy.

In MARC21 format, these records should lend themselves to a number of interesting uses.

I have always been curious about semantic drift across generations of librarians for subject headings.

Did we as users “learn” the cataloging of particular collections?

How can we recapture that “learning” in a topic map?
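
A hedged sketch of one way to start probing that question: group 650 (topical subject) headings by the year the record was entered and compare eras. The file name is hypothetical, and the two-digit year from the 008 field needs century disambiguation in real use:

```python
# Collect topical subject headings (650 $a) by cataloguing year, using
# the "date entered on file" (yymmdd) at the start of the 008 field.
from collections import defaultdict

from pymarc import MARCReader

headings_by_year = defaultdict(set)
with open("harvard-records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        f008 = record["008"]
        year = f008.data[:2] if f008 else "??"  # two-digit year only
        for field in record.get_fields("650"):
            if field["a"]:
                headings_by_year[year].add(field["a"].strip())

# Headings that appear in one era and vanish in another are candidates
# for the semantic drift worth modelling in a topic map.
for year in sorted(headings_by_year):
    print(year, len(headings_by_year[year]))
```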

April 13, 2012

Seminar: Five Years On

Filed under: Library,Linked Data,Semantic Web — Patrick Durusau @ 4:45 pm

Seminar: Five Years On

British Library
April 26, 2012 – April 27, 2012

From the webpage:

April 2012 marks the fifth anniversary of the Data Model Meeting at the British Library, London attended by participants interested in the fit between RDA: Resource Description and Access and the models used in other metadata communities, especially those working in the Semantic Web environment. This meeting, informally known as the “London Meeting”, has proved to be a critical point in the trajectory of libraries from the traditional data view to linked data and the Semantic Web.

DCMI-UK in cooperation with DCMI International as well as others will co-sponsor a one-day seminar on Friday 27 April 2012 to describe progress since 2007, mark the anniversary, and look to further collaboration in the future.

Speakers will include participants at the 2007 meeting and other significant players in library data and the Semantic Web. Papers from the seminar will be published by DCMI and available freely online.

The London Meeting stimulated significant development of Semantic Web representations of the major international bibliographic metadata models, including IFLA’s Functional Requirements family and the International Standard Bibliographic Description (ISBD), and MARC as well as RDA itself. Attention is now beginning to focus on the management and sustainability of this activity, and the development of high-level semantic and data structures to support library applications.

Would appreciate a note if you are in London for this meeting. Thanks!

March 19, 2012

53 Books APIs: Google Books, Goodreads and SharedBook

Filed under: Books,Library — Patrick Durusau @ 6:54 pm

53 Books APIs: Google Books, Goodreads and SharedBook

Wendell Santos has posted on behalf of ProgrammableWeb a list of fifty-three (53) book APIs!

Fairly good listing but it could be better.

For example, it is missing the Springer API, http://dev.springer.com/, and although the Elsevier entry at http://www.programmableweb.com/api/elsevier-article is marked as historical only, you should be aware that Elsevier does offer an extensive API at http://www.developers.elsevier.com/cms/index (called SciVerse).

I am sure there are others. Any you would like to mention in particular?

Now that I think about it, guess who doesn’t have a public API?

Would you believe the ACM? Check out the ACM Digital Library and tell me if I am wrong.

Or for that matter, the IEEE. See CS Digital Library.

Maybe they don’t have anyone to build an API for them? Please write the ACM and/or IEEE offering your services at your usual rates.

March 11, 2012

Are You An Invisible Librarian?

Filed under: Librarian/Expert Searchers,Library — Patrick Durusau @ 8:09 pm

Are librarians choosing to disappear from the information & knowledge delivery process? by Carl Grant.

Carl Grant writes:

As librarians, we frequently strive to connect users to information as seamlessly as possible. A group of librarians said to me recently: “As librarian intermediation becomes less visible to our users/members, it seems less likely it is that our work will be recognized. How do we keep from becoming victims of our own success?”

This is certainly not an uncommon question or concern. As our library collections have become virtual and as we increasingly stop housing the collections we offer, there is a tendency to see us as intermediaries serving as little more than pipelines to our members. We have to think about where we’re adding value to that information so that when delivered to the user/member that value is recognized. Then we need to make that value part of our brand. Otherwise, as stated by this concern, librarians become invisible and that seems to be an almost assured way to make sure our funding does the same. As evidenced by this recently updated chart on the Association of Research Libraries website, this seems to be the track we are on currently:

I ask Carl’s question more directly to make it clear that invisibility is a matter of personal choice for librarians.

Vast institutional and profession-wide initiatives are needed, but those do not relieve librarians of the personal responsibility for demonstrating the value add of library services in their day-to-day activities.

It is the users of libraries, those whose projects, research, and lives are impacted by librarians, who can (and will) come to the support of libraries and librarians, but only if asked and only if librarians stand out as the value-adds in libraries.

Without librarians, libraries may as well be random crates of books. (That might be a good demonstration of the value-add of libraries, by the way.) All of the organization, retrieval, and other value adds are present due to librarians. Make that and other value adds visible. Market librarians as value adds at every opportunity.

At the risk of quoting too much, Grant gives a starting list of value adds for librarians:

… Going forward, we should be focusing on more fine-grained service goals and objectives and then selecting technology that supports those goals/objectives.  For instance, in today’s (2012) environment, I think we should be focusing on providing products that support these types of services:

  1. Access to the library collections and services from any device, at anytime from anywhere. (Mobile products)
  2. Massive aggregates of information that have been selected for inclusion because of their quality by either: a) librarians, or b) filtered by communities of users through ranking systems and ultimately reviewed and signed-off by librarians for final inclusion in those aggregates. (Cloud computing products are the foundation technology here)
  3. Discovery workbenches or platforms that allow the library membership to discover existing knowledge and build new knowledge in highly personalized manners. (Discovery products serve as the foundation, but they don’t yet have the necessary extensions)
  4. Easy access and integration of the full range of library services into other products they use frequently, such as course or learning management systems, social networking, discussion forums, etc.  (Products that offer rich API’s, extensive support of Apps and standards to support all types of other extensions)
  5. Contextual support, i.e. the ability for librarianship to help members understand the environment in which a particular piece of work was generated (for instance, Mark Twain’s writings, or scientific research—is this a peer reviewed publication? Who funded it and what are their biases?) is an essential value-add we provide.  Some of this is conveyed by the fact that the item is in collections we provide access to, but other aspects of this will require new products we’ve yet to see.
  6. Unbiased information. I’ve written about this in another post and I strongly believe we aren’t conveying the distinction we offer our members by providing access to knowledge that is not biased by constructs based on data unknown and inaccessible to them. This is a huge differentiator and we must promote it and ensure it is understood. If we do decide to use filtering technologies, and there are strong arguments this is necessary to meet the need of providing “appropriate” knowledge, then we should provide members with the ability to see and/or modify the data driving that filtering. I’ve yet to see the necessary technology or products that provide good answers here.
  7. Pro-active services (Analytics). David Lankes makes the point in many of his presentations (here is one) that library services need to be far more pro-active. He and I couldn’t agree more. We need to get out there in front of our member needs. Someone is up for tenure? Let’s go to their office. Find out what they need and get it to them. (Analytic tools, coupled with massive aggregates of data, are going to enable us to do this and a lot more.)

Which of these are you going to reduce down to an actionable item for discussion with your supervisor this week? It is really that simple. The recognition of the value add of librarians is up to you.

March 2, 2012

The Library In Your Pocket

Filed under: Library,Library software — Patrick Durusau @ 8:04 pm

The Library In Your Pocket by Meredith Farkas.

A delightful slide deck of suggestions for libraries on effective delivery of content to mobile devices.

Since topic maps deliver content as well, I thought at least some of the suggestions would be useful there too.

The effective design of library websites for mobile access seems particularly appropriate for topic maps.

Do you have a separate interface for mobile access? Care to say a few words about it?
