Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 8, 2012

Federated SPARQL Queries [Take “Hit” From Multiple/Distributed Data Sets]

Filed under: BigData,RDF,SPARQL — Patrick Durusau @ 5:30 pm

On the Impact of Data Distribution in Federated SPARQL Queries by Nur Aini Rakhmawati and Michael Hausenblas.

Abstract:

With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible. Compared to queries against a single endpoint, queries that range over a number of endpoints pose new challenges, ranging from the type and number of datasets involved to the data distribution across the datasets. Existing research focuses on the data distribution in a central store and is mainly concerned with adopting well-known, traditional database techniques. In this work we investigate the impact of the data distribution in the context of federated SPARQL queries. We perform a number of experiments with four federation frameworks (Sesame Alibaba, Splendid, FedX, and Darq) against an RDF dataset, Dailymed, that we partition by graph and class. Our preliminary results confirm the intuition that the more datasets involved in query processing, the worse performance of federation query is and that the data distribution significantly influences the performance.

It isn’t often I read in the same paragraph:

With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible.

and

Our preliminary results confirm the intuition that the more datasets involved in query processing, the worse performance of federation query is and that the data distribution significantly influences the performance.

I have trouble reconciling “…more and more attractive and feasible” with “…the more datasets…the worse performance of federation query is….”

Particularly in the age of “big data,” where a growing number of datasets and distributed data are the norm, not the exception.
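To make the tension concrete, here is a minimal sketch in Python of how a federated query grows with the number of endpoints. It assumes the SPARQL 1.1 `SERVICE` keyword for naming remote endpoints; the endpoint URLs, the triple pattern, and the `build_federated_query` helper are all hypothetical, and the frameworks tested in the paper may pick their sources automatically rather than through explicit `SERVICE` clauses.

```python
def build_federated_query(endpoints, pattern="?drug ?p ?o ."):
    """Return a SPARQL 1.1 query string with one SERVICE clause per endpoint.

    Hypothetical helper for illustration only: each endpoint contributes
    one remote sub-query, and the sub-queries are combined with UNION.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # One group per endpoint: { SERVICE <url> { pattern } }
    blocks = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT * WHERE {{\n{body}\n}}"


# Hypothetical endpoint URLs, not real services.
endpoints = [f"http://example.org/sparql/{i}" for i in range(1, 4)]
query = build_federated_query(endpoints)
print(query)
print(query.count("SERVICE"), "remote sub-queries")
```

In this sketch every additional dataset adds another remote sub-query whose results have to be fetched and combined, which is at least consistent with the intuition the authors set out to test.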

I commend the authors for creating data points to confirm “intuitions” about SPARQL performance.

At the same time, their results raise serious questions about SPARQL in big data environments.
