The DataLift project will no doubt produce some useful tools and output, but reading its self-description:
The project will provide tools allowing to facilitate each step of the publication process:
- selecting ontologies for publishing data
- converting data to the appropriate format (RDF using the selected ontology)
- publishing the linked data
- interlinking data with other data sources
I am struck by how futile the effort sounds in the face of petabytes of data flow, the changing semantics of that data, and the changing semantics of the other data with which it might be interlinked.
The nearest imagery I can come up with is trying to direct the flow of a tsunami with a roll of paper towels.
It is certainly brave (I forgo use of the other term) to try, but ultimately it isn’t very productive.
First, any scheme that starts with conversion to a particular format is an automatic loser.
The source format is itself composed of subjects that are discarded by the conversion process.
Moreover, what if we disagree about the conversion?
Remember all the semantic diversity that gave rise to this problem? Where did it get off to?
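To make the lossiness concrete, here is a minimal sketch of what forcing a source record into a fixed target vocabulary does. The record, vocabulary, and field names are all hypothetical, and the mapping is deliberately simplistic; the point is only that any subject the chosen ontology does not anticipate is silently discarded by the conversion.

```python
# Hypothetical source record: the original format carries more
# subjects than the chosen publication vocabulary anticipates.
source_record = {
    "title": "Census 2010",
    "publisher": "Bureau of Statistics",
    "collection_method": "door-to-door survey",   # no slot for this
    "reviewer_notes": "totals disputed in region 4",  # or this
}

# Hypothetical target vocabulary selected for publishing.
target_vocabulary = {"title", "publisher"}

def convert(record, vocabulary):
    """Keep only the fields the chosen vocabulary can express;
    report everything the conversion throws away."""
    kept = {k: v for k, v in record.items() if k in vocabulary}
    dropped = sorted(set(record) - vocabulary)
    return kept, dropped

converted, lost = convert(source_record, target_vocabulary)
print(converted)  # {'title': 'Census 2010', 'publisher': 'Bureau of Statistics'}
print(lost)       # ['collection_method', 'reviewer_notes']
```

Nothing about the pipeline warns the consumer that `collection_method` and `reviewer_notes` ever existed; whoever disagrees with this particular vocabulary choice has no way to recover them from the published output.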
Second, the interlinking step introduces brittleness into the process, both in terms of the ontology that any particular data set must follow and in terms of the resolution of any linkage.
Other data sources can only be linked in if they use the correct ontology and format. And that assumes they are reachable.
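Both failure modes can be sketched in a few lines. The link table, hostnames, and ontology names below are invented for illustration; the sketch simply shows that a link survives only when the remote source is still reachable *and* speaks the ontology the consumer expects, so either condition failing silently prunes the link.

```python
# Hypothetical link table: each link names a remote data source
# and the ontology that source is assumed to follow.
links = [
    {"target": "http://stats.example.org/pop", "ontology": "dcterms"},
    {"target": "http://old.example.net/data", "ontology": "homegrown"},
]

# Suppose only one host still answers, and the consumer expects dcterms.
reachable = {"http://stats.example.org/pop"}
expected_ontology = "dcterms"

def usable(link):
    # A link resolves only if the remote source is reachable AND
    # uses the ontology the consumer expects; otherwise it is dead.
    return link["target"] in reachable and link["ontology"] == expected_ontology

surviving = [link["target"] for link in links if usable(link)]
print(surviving)  # ['http://stats.example.org/pop']
```

The second link fails twice over, once for format and once for reachability, and the consuming application has no recourse either way.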
I hope the project does well, but at best it will result in another semantic flavor to be integrated using topic maps.
*****
PS: The use of “data heaven” betrays the religious nature of the Linked Data movement. I don’t object to Linked Data. What I object to is the missionary conversion aspects of Linked Data.