Improving the recall of decentralised linked data querying through implicit knowledge by Jürgen Umbrich, Aidan Hogan, and Axel Polleres.
Abstract:
Aside from crawling, indexing, and querying RDF data centrally, Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Proposed link-traversal query approaches for Linked Data have the benefits of up-to-date results and decentralised (i.e., client-side) execution, but operate on incomplete knowledge available in dereferenced documents, thus affecting recall. In this paper, we investigate how implicit knowledge – specifically that found through owl:sameAs and RDFS reasoning – can improve the recall in this setting. We start with an empirical analysis of a large crawl featuring 4 m Linked Data sources and 1.1 g quadruples: we (1) measure expected recall by only considering dereferenceable information, (2) measure the improvement in recall given by considering rdfs:seeAlso links as previous proposals did. We further propose and measure the impact of additionally considering (3) owl:sameAs links, and (4) applying lightweight RDFS reasoning (specifically ρDF) for finding more results, relying on static schema information. We evaluate our methods for live queries over our crawl.
From the document:
owl:sameAs links are used to expand the set of query relevant sources, and owl:sameAs rules are used to materialise implicit knowledge given by the OWL semantics, potentially generating additional answers.
I have always thought that knowing the “why” behind an owl:sameAs link would make it more powerful. But since any basis for subject sameness can be used, that may not be the case.
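To make the two uses of owl:sameAs in the quoted passage concrete, here is a minimal Python sketch. It is not the authors' implementation: the triple representation (plain tuples of strings), the function names, and the sample URIs are all assumptions made for illustration. It groups terms into equivalence classes, uses those classes to expand the set of sources worth dereferencing, and copies statements across equivalent subjects and objects. Full OWL semantics would also cover the predicate position, which this sketch leaves out.

```python
from collections import defaultdict
from itertools import product

SAME_AS = "owl:sameAs"  # assumed prefixed-name spelling for this sketch


def same_as_classes(triples):
    """Map each term to its owl:sameAs equivalence class (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the classes of subject and object for every sameAs statement.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    return {term: members for members in classes.values() for term in members}


def expand_sources(uri, classes):
    """Return every URI worth dereferencing for a query that mentions `uri`."""
    return classes.get(uri, {uri})


def materialise(triples, classes):
    """Copy each non-sameAs triple to all equivalent subjects and objects."""
    inferred = set()
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for s2, o2 in product(classes.get(s, {s}), classes.get(o, {o})):
            inferred.add((s2, p, o2))
    return inferred


# Hypothetical sample data, not taken from the paper.
triples = [
    ("ex:a", SAME_AS, "dbp:a"),
    ("dbp:a", SAME_AS, "fb:a"),
    ("ex:a", "foaf:name", '"Alice"'),
    ("dbp:a", "foaf:knows", "ex:b"),
]
classes = same_as_classes(triples)
sources = expand_sources("ex:a", classes)
answers = materialise(triples, classes)
```

Note that nothing in the sketch records why two URIs were declared the same; the equivalence classes treat every owl:sameAs statement alike, which is the point the comment above is circling.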