The Semantic Web Challenge 2010 details landed in my inbox this morning. My first reaction was to refine my spam filter. 😉 Just teasing. My second and more considered reaction was to think about the “challenge” in terms of topic maps.
Particularly because a posting from the Ontology Alignment Evaluation Initiative arrived the same day, in response to a posting from sameas.org.
I freely grant that URIs that cannot distinguish between identifiers and resources without 303 overhead are poor design. But the fact remains that there are many data sets, representing large numbers of subjects, that have even poorer subject identification practices. And there are no known approaches that are going to result in the conversion of those data sets.
Personally I am unwilling to wait until some new “perfect” language for data sweeps the planet and results in all data being converted into the “perfect” format. Anyone who thinks that is going to happen needs to stand with the end-of-the-world-in-2012 crowd. They have a lot in common. Magical thinking being one common trait.
The question for topic mappers to answer is this: how do we attribute, to whatever data language we are confronting, the characteristics that will enable us to reliably merge information about subjects in that language with other information, whether in the same data language or another? Understanding that the necessary characteristics may vary from data language to data language.
Take the lack of a distinction between identifier and resource in the Semantic Web for instance. One easy step towards making use of such data would be to attribute to each URI the status of either being an identifier or a resource. I suspect, but cannot say, that the authors/users of those URIs know the answer to that question. It seems even possible that some sets of such URIs are all identifiers and if so marked/indicated in some fashion, they automatically become useful as just that, identifiers (without 303 overhead).
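To make that step concrete, here is a minimal sketch in Python of attributing a status to URIs after the fact. Everything in it is an assumption for illustration: the `UriRole` names, the `ROLE_BY_PREFIX` table, and the example.org prefixes are hypothetical, standing in for whatever declaration the authors or users of a set of URIs might actually supply.

```python
from enum import Enum


class UriRole(Enum):
    IDENTIFIER = "identifier"  # the URI names a subject
    RESOURCE = "resource"      # the URI is the address of a document


# Hypothetical declarations, as might be supplied by the people who
# minted the URIs. The prefixes are made up for this example.
ROLE_BY_PREFIX = {
    "http://example.org/id/": UriRole.IDENTIFIER,
    "http://example.org/doc/": UriRole.RESOURCE,
}


def role_of(uri, declared=ROLE_BY_PREFIX):
    """Return the declared role of a URI, or None if nobody has said."""
    matches = [prefix for prefix in declared if uri.startswith(prefix)]
    if not matches:
        return None
    # The longest matching prefix is the most specific declaration.
    return declared[max(matches, key=len)]
```

Note that nothing here dereferences the URI. The status comes from a declaration about the set, which is the point: once a set is marked as all identifiers, no 303 round trip is needed to use them as such.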
As identifiers they may lack the resolution that topic maps provide to the human user, which enables them to better understand what subject is being identified. But, since topic maps can map additional identifiers together, when you encounter a deficient identifier, simply create another one for the same subject and map them together.
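Mapping identifiers together can be sketched in a few lines. What follows is only an illustration, not how any particular topic map engine does it: the `IdentifierMap` class and the example URIs are hypothetical, and the bookkeeping is a plain union-find structure in which every identifier in a group is treated as naming the same subject.

```python
class IdentifierMap:
    """Groups identifiers that have been declared to name the same subject."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Unknown identifiers start out in a group of their own.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def same_subject(self, a, b):
        """Record that identifiers a and b identify the same subject."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def identifies_same(self, a, b):
        return self._find(a) == self._find(b)

    def identifiers_for(self, ident):
        """Return every identifier known for the subject of ident."""
        root = self._find(ident)
        return {i for i in list(self._parent) if self._find(i) == root}


# A deficient identifier gets a more helpful companion (both made up).
idmap = IdentifierMap()
idmap.same_subject("http://example.org/id/x17",
                   "http://example.net/subjects/paris-france")
idmap.same_subject("http://example.net/subjects/paris-france",
                   "urn:example:paris")
```

The opaque `x17` identifier stays usable. It simply acquires companions that tell a human reader more, and information recorded under any one of them can be merged with information recorded under the others.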
I think we need to view the Semantic Web data sets as opportunities to demonstrate how understanding subject identity, however that is indicated, is the linchpin to meaningful integration of data about subjects.
Bearing in mind that all our identifications, Semantic Web, topic map or otherwise, are always local, provisional and subject to improvement in the eye of another.