SPARQLES: Monitoring Public SPARQL Endpoints

SPARQLES: Monitoring Public SPARQL Endpoints by Pierre-Yves Vandenbussche, Jürgen Umbrich, Aidan Hogan, and Carlos Buil-Aranda.

Abstract:

We describe SPARQLES: an online system that monitors the health of public SPARQL endpoints on the Web by probing them with custom-designed queries at regular intervals. We present the architecture of SPARQLES and the variety of analytics that it runs over public SPARQL endpoints, categorised by availability, discoverability, performance and interoperability. To motivate the system, we give examples of some key questions about the health and maturation of public SPARQL endpoints that can be answered by the data it has collected in the past year(s). We also detail the interfaces that the system provides for human and software agents to learn more about the recent history and current state of an individual SPARQL endpoint or about overall trends concerning the maturity of all endpoints monitored by the system.

I started to pass on this article since it does date from 2009 but am now glad that I didn’t. The service is still active and can be found at: http://sparqles.okfn.org/.

The discoverability of SPARQL endpoints is reported to be:

[Image: sparql-discovery]

From the article:

[VoID Description:] The Vocabulary of Interlinked Data-sets (VoID) [2] has become the de facto standard for describing RDF datasets (in RDF). The vocabulary allows for specifying, e.g., an OpenSearch description, the number of triples a dataset contains, the number of unique subjects, a list of properties and classes used, number of triples associated with each property (used as predicate), number of instances of a given class, number of triples used to describe all instances of a given class, predicates used to describe class instances, and so forth. Likewise, the description of the dataset is often enriched using external vocabulary, such as for licensing information.

[SD Description:] Endpoint capabilities – such as supported SPARQL version, query and update features, I/O formats, custom functions, and/or entailment regimes – can be described in RDF using the SPARQL 1.1 Service Description (SD) vocabulary, which became a W3C Recommendation in March 2013 [21]. Such descriptions, if made widely available, could help a client find public endpoints that support the features it needs (e.g., find SPARQL 1.1 endpoints).
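To make the two mechanisms concrete, here is a minimal Python sketch of how a client might look for each kind of description. It rests on two conventions from the relevant W3C documents: a VoID description can be published at the host's `/.well-known/void` URI, and an endpoint should return its Service Description when its URL is dereferenced without a query. The function names, the `RDF_ACCEPT` header value, and the `example.org` endpoint are my own illustrative assumptions, and this is not necessarily how SPARQLES itself runs its probes.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

# Assumed content negotiation preference: ask for Turtle first, RDF/XML second.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def void_well_known_url(endpoint_url: str) -> str:
    """Return the /.well-known/void URL on the same host as the endpoint."""
    parts = urlsplit(endpoint_url)
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


def service_description_request(endpoint_url: str) -> Request:
    """Build a GET on the bare endpoint URL (no query parameter), asking for RDF."""
    return Request(endpoint_url, headers={"Accept": RDF_ACCEPT}, method="GET")


def discovery_probes(endpoint_url: str) -> dict:
    """Build (but do not send) one request per discoverability mechanism."""
    return {
        "void": Request(
            void_well_known_url(endpoint_url),
            headers={"Accept": RDF_ACCEPT},
            method="GET",
        ),
        "sd": service_description_request(endpoint_url),
    }


if __name__ == "__main__":
    # Hypothetical endpoint URL, for illustration only.
    for name, req in discovery_probes("http://example.org/sparql").items():
        print(name, req.get_method(), req.full_url)
```

The sketch only builds the requests. Passing each one to `urllib.request.urlopen` and checking whether the response parses as RDF containing VoID or SD terms would be the next step, and that is where most endpoints would presumably come up empty.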

No, I’m not calling your attention to this to pick on SPARQL in particular, but the lack of discoverability raises a serious issue for any information retrieval system that hopes to do better than dumb-luck searching.

Clearly SPARQL has the capability to increase discoverability, but whether those mechanisms would be effective cannot be answered because they are so little used. So my first question is: Why aren’t the mechanisms of SPARQL being used to increase discoverability?

Or perhaps better, having gone to the trouble to construct a SPARQL endpoint, why aren’t people taking the next step to make it more discoverable?

Is it because discoverability benefits some remote and faceless user instead of those being called upon to make the endpoint more discoverable? In that sense, is it a lack of positive feedback for the person tasked with increasing discoverability?

I ask because if we can’t find the key to motivating people to increase the discoverability of information (SPARQL or no) then we are in serious trouble as the volume of big data continues to increase. The amount of data will continue to grow while discoverability continues to go down. That can’t be a happy circumstance for anyone interested in discovering information.

Suggestions?

I first saw this in a tweet by Ruben Verborgh.
