
Don’t feed the semantic black holes [Dangers of Semantic Promiscuity]

Wednesday, December 5th, 2012

Don’t feed the semantic black holes by Bernard Vatant.

From the post:

If I remember correctly it was at Knowledge Technologies 2001, Ann Wrightson explained us, during the informal RDF-Topic Maps session, how to build a semantic virus for Topic Maps, through abuse of subject indicator. At the time OWL and its now infamous owl:sameAs were not yet around, but the idea was identical : if several “topics” A, B, C, … indicate the same “subject” X, then they should be merged into a single topic. In linked data land ten years after it’s the same story : if RDF descriptions A, B, C … declare a owl:sameAs link to X, then A and B are merged together with the current description of X.

Hence the very simple semantic virus concept :

1. Harvest all the topic identifiers you can grab from distributed topic maps (read today : URIs from distributed linked data).

2. Publish a new topic map adding a common subject indicator to every topic description you have harvested (read today : add owl:sameAs X to all resource descriptions)

Now if you query the resulting data base for the description of any topic (resource) in it you get just all elements of description of everything on anything. All the map is collapsed on a single heavy and meaningless node. An irreversible semantic collapse.

True, but that’s like having unprotected sex with a hooker in the bushes near a truck stop in India.

Relying on unverified sources of data is like unprotected sex, minus the enjoyable parts.

As Bernard points out, this can lead to very bad consequences.
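To make the collapse concrete, here is a minimal Python sketch. It is a toy model, not how any particular triple store behaves: it assumes a reasoner that treats owl:sameAs as an equivalence relation and merges the descriptions of every resource in each equivalence class, computed with a union-find structure. The resource names (ex:Paris, ex:X and so on) and the `infect` helper are hypothetical.

```python
"""Toy simulation of the owl:sameAs "semantic virus" (hypothetical data)."""

from collections import defaultdict

SAME_AS = "owl:sameAs"


class UnionFind:
    """Disjoint-set forest used to compute owl:sameAs equivalence classes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merged_descriptions(triples):
    """Map each resource to the (predicate, object) pairs of its whole sameAs class."""
    triples = list(triples)
    uf = UnionFind()
    for s, p, o in triples:
        uf.find(s)  # register every subject
        if p == SAME_AS:
            uf.union(s, o)
    by_class = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            by_class[uf.find(s)].add((p, o))
    return {r: by_class[uf.find(r)] for r in list(uf.parent)}


def infect(triples, sink="ex:X"):
    """Step 2 of the attack: declare owl:sameAs sink for every harvested subject."""
    subjects = {s for s, _, _ in triples}
    return list(triples) + [(s, SAME_AS, sink) for s in sorted(subjects)]


# Step 1: descriptions "harvested" from unrelated sources (made-up sample data).
harvested = [
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:country", "ex:France"),
    ("ex:Banana", "rdfs:label", "banana"),
    ("ex:Banana", "ex:color", "yellow"),
    ("ex:Bach", "rdfs:label", "J. S. Bach"),
]

clean = merged_descriptions(harvested)
infected = merged_descriptions(infect(harvested))
print(len(clean["ex:Paris"]))     # 2
print(len(infected["ex:Paris"]))  # 5
```

The clean data returns two facts about ex:Paris; after step 2 it returns all five facts in the data set, exactly the same set returned for every other resource.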

I would not rely on Bernard’s proposal to indicate provenance using named graphs. Do you think people who would create malicious owl:sameAs statements might also create false statements about their graphs? Gasp! 😉

Trusting evil-doers to respect provenance conventions meant to exclude their content is a low-percentage bet.

One solution, possibly a commercially viable one, would be a service that harvests and tests linked data and acts as a canonical, trusted source for that data. Any semantic black holes would be detected and blocked before reaching you.

A prophylactic service, as it were.
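What might such a test look like? One crude check, sketched here in Python purely as an assumption (it is not an established service or algorithm), flags any resource that too many distinct resources declare owl:sameAs to, and holds those links back instead of passing them on. The `filter_black_holes` function, the `max_links` threshold and the sample data are all hypothetical.

```python
"""Sketch of a hypothetical 'prophylactic' filter for harvested owl:sameAs links."""

from collections import defaultdict

SAME_AS = "owl:sameAs"


def filter_black_holes(triples, max_links=2):
    """Split triples into (accepted, quarantined, suspects).

    Assumed heuristic: a node is suspect when more than `max_links` distinct
    resources are linked to it by owl:sameAs (both directions count, since
    sameAs is symmetric). Every sameAs triple touching a suspect is quarantined.
    """
    triples = list(triples)
    partners = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            partners[s].add(o)
            partners[o].add(s)
    suspects = {node for node, linked in partners.items() if len(linked) > max_links}
    accepted, quarantined = [], []
    for s, p, o in triples:
        if p == SAME_AS and (s in suspects or o in suspects):
            quarantined.append((s, p, o))
        else:
            accepted.append((s, p, o))
    return accepted, quarantined, suspects


harvested = [
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", SAME_AS, "dbr:Paris"),  # an ordinary identity link
    ("ex:Banana", "rdfs:label", "banana"),
    ("ex:Bach", "rdfs:label", "J. S. Bach"),
    # Links injected by a malicious publisher:
    ("ex:Paris", SAME_AS, "ex:X"),
    ("ex:Banana", SAME_AS, "ex:X"),
    ("ex:Bach", SAME_AS, "ex:X"),
]

accepted, quarantined, suspects = filter_black_holes(harvested)
print(suspects)          # {'ex:X'}
print(len(quarantined))  # 3
```

The ex:Paris to dbr:Paris link survives while the three links into ex:X are held back. A fixed degree threshold is easy to evade (an attacker can chain several sink nodes so none of them looks busy) and can misfire on legitimate hubs, so a real service would need stronger tests, for example bounding the size of each owl:sameAs equivalence class.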