RDF 1.1: On Semantics of RDF Datasets
Abstract:
RDF defines the concept of RDF datasets, a structure composed of a distinguished RDF graph and zero or more named graphs, being pairs comprising an IRI or blank node and an RDF graph. While RDF graphs have a formal model-theoretic semantics that determines what arrangements of the world make an RDF graph true, no agreed formal semantics exists for RDF datasets. This document presents some issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF 1.1 Working Group, and specify several semantics in terms of model theory, each corresponding to a certain design choice for RDF datasets.
I can see how not knowing the semantics of a dataset could be problematic.
What puzzles me about this particular effort is that it appears to be an attempt to define the semantics of RDF datasets for others. Yes?
That activity depends either upon semantics being inherent in an RDF dataset, so that everyone can “discover” the same semantics, or upon uniform semantics being conferred upon an RDF dataset by decree.
The first possibility, that RDF datasets have inherent semantics, need not delay us: this activity started because different people saw different semantics in RDF datasets. That alone is sufficient to defeat any proposal based on “inherent” semantics.
The second possibility, that of defining and conferring semantics, seems equally problematic to me.
In part, that is because there is no enforcement mechanism that can prevent users of RDF datasets from assigning any semantics they like to a dataset.
This remains important work, but I would change the emphasis to defining what this group considers to be the semantics of RDF datasets, along with a mechanism that allows others to signal their agreement with it for a particular dataset.
That has the advantage of letting other users adopt wholesale an entire set of semantics for an RDF dataset, one that hopefully reflects the semantics with which it should be processed.
Declaring semantics may help avoid users silently using inconsistent semantics for the same datasets.
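To make the idea of "signaling agreement" concrete, here is a minimal sketch of what such a declaration might look like to a consumer. Everything in it is my own assumption for illustration: the two semantics IRIs are hypothetical (they are not defined by the Working Group), and the Dataset class and agreement function are invented names, not part of any RDF library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifiers for two candidate dataset semantics.
# These IRIs are placeholders, not real W3C vocabulary terms.
UNION_SEMANTICS = "http://example.org/dataset-semantics#union"
QUOTED_SEMANTICS = "http://example.org/dataset-semantics#quoted"


@dataclass
class Dataset:
    """A default graph, zero or more named graphs, and an optional declaration."""
    default_graph: Set[Triple] = field(default_factory=set)
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)
    declared_semantics: Optional[str] = None  # None means nothing was declared


def agreement(dataset: Dataset, assumed: str) -> str:
    """Compare the semantics a consumer assumes with the one the dataset declares."""
    if dataset.declared_semantics is None:
        return "undeclared"
    return "match" if dataset.declared_semantics == assumed else "mismatch"


ds = Dataset(
    named_graphs={"http://example.org/g1": {("ex:a", "ex:b", "ex:c")}},
    declared_semantics=UNION_SEMANTICS,
)
print(agreement(ds, UNION_SEMANTICS))   # match
print(agreement(ds, QUOTED_SEMANTICS))  # mismatch
```

Nothing here enforces anything, which is the point: the declaration only makes a disagreement visible instead of silent.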