Integrating Linked Data into Discovery by Götz Hatop.
Abstract:
Although the Linked Data paradigm has evolved from a research idea to a practical approach for publishing structured data on the web, the performance gap between currently available RDF data stores and the somewhat older search technologies could not be closed. The combination of Linked Data with a search engine can help to improve ad-hoc retrieval. This article presents and documents the process of building a search index for the Solr search engine from bibliographic records published as linked open data.
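The process the abstract describes can be pictured as a small harvest-and-index pipeline: pull bibliographic records out of a linked data source with SPARQL and push them into Solr as documents. The sketch below is only an illustration of that idea, not code from the article; the endpoint URL, the query, the Solr core name ("biblio" is VuFind's default core), and the field mapping are my assumptions.

```python
# Minimal sketch: harvest bibliographic records from a SPARQL endpoint
# and index them into Solr. The URLs, the query, and the field names
# are illustrative assumptions, not the article's actual configuration.
import requests

SPARQL_ENDPOINT = "http://example.org/sparql"                  # hypothetical LOD endpoint
SOLR_UPDATE_URL = "http://localhost:8983/solr/biblio/update"   # "biblio" is VuFind's default core

QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?record ?title ?creator WHERE {
  ?record dc:title ?title ;
          dc:creator ?creator .
} LIMIT 100
"""

def fetch_records():
    """Run the SPARQL query and return its JSON result bindings."""
    resp = requests.get(
        SPARQL_ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

def to_solr_doc(binding):
    """Map one SPARQL result row onto Solr fields (field names are assumed)."""
    return {
        "id": binding["record"]["value"],
        "title": binding["title"]["value"],
        "author": binding["creator"]["value"],
    }

def index_records():
    """Post the mapped documents to Solr's JSON update handler."""
    docs = [to_solr_doc(b) for b in fetch_records()]
    resp = requests.post(SOLR_UPDATE_URL, params={"commit": "true"}, json=docs)
    resp.raise_for_status()

if __name__ == "__main__":
    index_records()
```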
Götz makes an interesting contrast between the Semantic Web and Solr:
In terms of the fast-evolving technologies of the web age, the Semantic Web can already be called an old stack. For example, RDF was originally recommended by the W3C on February 22, 1999. Greenberg [8] points out many similarities between libraries and the Semantic Web: both were invented as a response to information abundance, their mission is grounded in service and information access, and libraries and the Semantic Web both benefit from national and international standards. Nevertheless, the technologies defined within the Semantic Web stack are not well established in libraries today, and the Semantic Web community is not fully aware of the skills, talent, and knowledge that catalogers have and which may be of help to advance the Semantic Web.
On the other hand, the Apache Solr [9] search system has taken the library world by storm. From HathiTrust to small collections, Solr has become the search engine of choice for libraries. It is therefore not surprising that the VuFind discovery system uses Solr for its purpose and is not built upon an RDF triple store. Fortunately, the software does not make strong assumptions about the underlying index structure and can coexist with non-MARC data as long as these data are indexed conforming to the scheme provided by VuFind.
The lack of “…strong assumptions about the underlying index structure…” enables users to choose their own indexing strategies.
That is, an indexing strategy is not forced on all users.
You could just as easily say that no built-in semantics are forced on users by Solr.
Want Solr success for topic maps?
Free users from built-in semantics. Enable them to use topic maps to map their models, their way.
Or do we fear the semantics of others?
I think the most powerful mashup is at the intersection of the wonderful results we get with Solr/Lucene’s statistics-based approach and the power of semantic relationships to find connected data.
The hardest part is making that first connection: taking a result returned by Solr and hooking it onto some sort of linked data graph.
We’re doing a project right now where that is the core problem to solve. I’d love to chat more with you about your thoughts.
Comment by epugh — November 11, 2013 @ 8:59 am
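The “first connection” step described in the comment above — getting from a Solr hit to a node in a linked data graph — can be as simple as carrying a shared identifier across the two systems. The sketch below is speculative: it assumes the Solr document stores an ISBN in an “isbn” field and uses DBpedia’s public endpoint purely as an example graph.

```python
# Speculative sketch of the "first connection" discussed above: take an
# identifier from the top Solr hit and look it up in a linked data graph.
# The Solr core, the "isbn" field, and the DBpedia endpoint are assumptions
# chosen only for illustration.
import requests

SOLR_SELECT_URL = "http://localhost:8983/solr/biblio/select"
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

def first_hit_isbn(keyword):
    """Search Solr and return an ISBN from the top hit, if one is stored."""
    resp = requests.get(
        SOLR_SELECT_URL,
        params={"q": keyword, "rows": 1, "fl": "id,isbn", "wt": "json"},
    )
    resp.raise_for_status()
    docs = resp.json()["response"]["docs"]
    if not docs or "isbn" not in docs[0]:
        return None
    isbn = docs[0]["isbn"]
    return isbn[0] if isinstance(isbn, list) else isbn

def linked_books(isbn):
    """Find graph resources carrying the same ISBN; return (book, author) URIs."""
    query = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?book ?author WHERE {
      ?book dbo:isbn ?isbn ;
            dbo:author ?author .
      FILTER(CONTAINS(STR(?isbn), "%s"))
    } LIMIT 10
    """ % isbn
    resp = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    return [(b["book"]["value"], b["author"]["value"])
            for b in resp.json()["results"]["bindings"]]

if __name__ == "__main__":
    isbn = first_hit_isbn("linked data")
    if isbn:
        for book, author in linked_books(isbn):
            print(book, "--", author)
```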
@epugh
Yes, the “last semantic mile” problem between human-curated data (linked data, for example) and more automated analysis.
Looking forward to chatting with you about this issue!
Comment by Patrick Durusau — November 12, 2013 @ 12:15 pm