The DataLift project will no doubt produce some useful tools and output but reading its self-description:
The project will provide tools allowing to facilitate each step of the publication process:
- selecting ontologies for publishing data
- converting data to the appropriate format (RDF using the selected ontology)
- publishing the linked data
- interlinking data with other data sources
I am struck by how futile the effort sounds in the face of petabytes of data flow, the changing semantics of that data, and the changing semantics of the other data with which it might be interlinked.
The nearest imagery I can come up with is trying to direct the flow of a tsunami with a roll of paper towels.
It is certainly brave (I forgo usage of the other term) to try but ultimately isn’t very productive.
First, any scheme that starts with conversion to a particular format is an automatic loser.
The source format is itself composed of subjects that are discarded by the conversion process.
Moreover, what if we disagree about the conversion?
Remember all the semantic diversity that gave rise to this problem? Where did it get off to?
Second, the interlinking step introduces brittleness into the process, both in terms of the ontology that any particular data must follow and in terms of the resolution of any linkage.
Other data sources can only be linked in if they use the correct ontology and format. And that assumes they are reachable.
I hope the project does well, but at best it will result in another semantic flavor to be integrated using topic maps.
*****
PS: The use of “data heaven” betrays the religious nature of the Linked Data movement. I don’t object to Linked Data. What I object to is the missionary conversion aspect of Linked Data.
Interesting comments. Datalift indeed tackles a hard problem and the points you mention are relevant: selecting a proper vocabulary to represent the data is hard (given that such a vocabulary exists). Converting data from its source format into RDF is difficult if we want to keep its intended meaning. Data interlinking is either tedious or imprecise… Well, we’re doing our best in Datalift to cope with these issues.
Comment by François Scharffe — February 18, 2011 @ 12:19 pm
@François – I suppose part of my concern is triggered by projects that have attempted to regulate language development/semantics.
None of them have succeeded to date nor is it likely (in my view) that any such project will ever succeed.
So, how does this project differ? Isn’t any static format/semantic a mistake?
I am thinking that dynamic mapping (not saying I know how to do that) is more in keeping with the ever-changing semantics that surround us.
Hope you are having a great weekend!
Patrick
Comment by Patrick Durusau — February 19, 2011 @ 6:00 am