Towards Web-scale Web querying: The quest for intelligent clients starts with simple servers that scale, by Ruben Verborgh.
From the post:
Most public SPARQL endpoints are down for more than a day per month. This makes it impossible to query public datasets reliably, let alone build applications on top of them. It’s not a performance issue, but an inherent architectural problem: any server offering resources with an unbounded computation time poses a severe scalability threat. The current Semantic Web solution to querying simply doesn’t scale. The past few months, we’ve been working on a different model of query solving on the Web. Instead of trying to solve everything at the server side—which we can never do reliably—we should build our servers in such a way that enables clients to solve queries efficiently.
The Web of Data is filled with an immense amount of information, but what good is that if we cannot efficiently access those bits of information we need?
SPARQL endpoints aim to fulfill the promise of querying on the Web, but their notoriously low availability rates make that impossible. In particular, if you want high availability for your SPARQL endpoint, you have to compromise one of these:
- offering public access,
- allowing unrestricted queries,
- serving many users.
Any SPARQL endpoint that tries to fulfill all of those inevitably has low availability. Low availability means unreliable query access to datasets. Unreliable access means we cannot build applications on top of public datasets.
Sure, you could just download a data dump and have your own endpoint, but then you move from Web querying to local querying, and that problem was solved ages ago. Besides, it doesn’t give you access to up-to-date information, and who has enough storage to download a dump of the entire Web?
The whole “endpoint” concept will never work on a Web scale, because servers are subject to arbitrarily complex requests by arbitrarily many clients. (emphasis in original)
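To make the quoted point about unrestricted queries concrete, here is a minimal sketch (my own illustration, not from the post) of sending one such query to a public endpoint over the standard SPARQL 1.1 Protocol. The endpoint URL, the query, and the timeout are assumptions:

```python
# Hedged sketch: an "unrestricted" query sent to a public SPARQL endpoint
# over the SPARQL 1.1 Protocol (GET with a `query` parameter).
import requests

ENDPOINT = "https://dbpedia.org/sparql"  # example public endpoint (assumption)

# A query like this forces the server to touch a large part of the dataset,
# which is exactly the unbounded per-request cost the post describes.
QUERY = """
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?count)
"""

try:
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    print(len(response.json()["results"]["bindings"]), "result rows")
except requests.RequestException as exc:
    # In practice, this branch is where "low availability" shows up for clients.
    print("Endpoint unavailable or query timed out:", exc)
```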
The prelude to an interesting proposal on Linked Data Fragments.
See the Linked Data Fragments website or Web-Scale Querying through Linked Data Fragments by Ruben Verborgh et al. (LDOW2014 workshop).
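For contrast, here is a minimal sketch of what querying looks like against a Linked Data Fragments server: the server answers only simple triple-pattern requests, and the client does the heavier query work. The fragment URL and parameter names below are assumptions; a conforming client would discover them from the fragment’s hypermedia controls rather than hard-code them:

```python
# Hedged sketch: fetching one Triple Pattern Fragment page.
# The fragment URL and the subject/predicate/object parameter names are
# assumptions; a real client reads them from the fragment's hypermedia controls.
import requests
from rdflib import Graph

FRAGMENT_BASE = "http://fragments.dbpedia.org/2014/en"  # example TPF interface (assumption)

response = requests.get(
    FRAGMENT_BASE,
    params={"predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
    headers={"Accept": "text/turtle"},
    timeout=30,
)
response.raise_for_status()

# Each page contains a bounded number of matching triples plus metadata
# (such as an estimated total count) and links to further pages, so every
# request costs the server roughly the same, bounded amount of work.
graph = Graph()
graph.parse(data=response.text, format="turtle")
print(len(graph), "triples in this fragment page")
```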
The paper gives its primary motivation as:
There is one issue: it appears to be very hard to make a SPARQL endpoint available reliably. A recent survey examining 427 public endpoints concluded that only one third of them have an availability rate above 99%; not even half of all endpoints reach 95% [6]. To put this into perspective: 95% availability means the server is unavailable for one and a half days every month. These figures are quite disturbing given the fact that availability is usually measured in “number of nines” [5, 25], counting the number of leading nines in the availability percentage. In comparison, the fairly common three nines (99.9%) amounts to 8.8 hours of downtime per year. The disappointingly low availability of public SPARQL endpoints is the Semantic Web community’s very own “Inconvenient Truth”.
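If you want to check the arithmetic, here is a small sketch (my own illustration, not from the paper) that converts an availability percentage into downtime:

```python
# Hedged sketch: downtime implied by an availability percentage,
# reproducing the figures quoted from the paper.

HOURS_PER_YEAR = 365 * 24            # 8760
HOURS_PER_MONTH = HOURS_PER_YEAR / 12

def downtime(availability_percent, period_hours):
    """Hours of downtime per period for a given availability percentage."""
    return (1 - availability_percent / 100) * period_hours

# 95% availability: roughly a day and a half of downtime per month.
print(downtime(95, HOURS_PER_MONTH) / 24)   # ~1.5 days

# "Three nines" (99.9%): about 8.8 hours of downtime per year.
print(downtime(99.9, HOURS_PER_YEAR))       # ~8.76 hours
```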
Curious that on the twenty-fifth anniversary of the WWW I would realize the WWW re-created a networking problem already solved by the Internet.
Unlike the WWW, to say nothing of Linked Data and its cousins in the Semantic Web (SW) activity, the Internet doesn’t have a single point of failure.
Or, put more positively, the Internet is fault-tolerant by design. The SW, in contrast, is fragile by design.
While I applaud the Linked Data Fragments exploration of the solution space, focusing on the design flaw of a single point of failure might be more profitable.
I first saw this in a tweet by Thomas Steiner.