Set-similarity offers a useful way to think about merging in a topic maps context. The measure of self-similarity that we want for merging in topic maps is whether two items represent the same subject.
In the TMDM, two topic items are self-similar (and so are merged) if they have:
- at least one equal string in their [subject identifiers] properties,
- at least one equal string in their [item identifiers] properties,
- at least one equal string in their [subject locators] properties,
- an equal string in the [subject identifiers] property of the one topic item and the [item identifiers] property of the other, or
- the same information item in their [reified] properties.
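To make those five tests concrete, here is a minimal sketch in Python. The `Topic` class and the `same_subject` function are hypothetical names invented for illustration, not part of the TMDM or of any topic maps library. The sketch also assumes that identifiers and locators are plain strings held in sets and that [reified] is either a single object or empty.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Topic:
    # Hypothetical stand-in for a TMDM topic item (illustration only).
    subject_identifiers: Set[str] = field(default_factory=set)
    item_identifiers: Set[str] = field(default_factory=set)
    subject_locators: Set[str] = field(default_factory=set)
    reified: Optional[object] = None  # the reified information item, if any


def same_subject(a: Topic, b: Topic) -> bool:
    """Return True if any one of the five tests listed above holds."""
    return bool(
        a.subject_identifiers & b.subject_identifiers
        or a.item_identifiers & b.item_identifiers
        or a.subject_locators & b.subject_locators
        # The cross test is checked in both directions.
        or a.subject_identifiers & b.item_identifiers
        or a.item_identifiers & b.subject_identifiers
        # Same information item: identity, not mere equality of content.
        or (a.reified is not None and a.reified is b.reified)
    )


a = Topic(subject_identifiers={"http://example.org/id/puccini"})
b = Topic(item_identifiers={"http://example.org/id/puccini"})
c = Topic(subject_locators={"http://example.org/doc.html"})

print(same_subject(a, b))  # True: subject identifier matches item identifier
print(same_subject(a, c))  # False: no property in common
```

Note that every test here is exact string or object identity. That is what makes this particular notion of self-similarity easy to state, and also what separates it from the similarity measures in the research literature.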
The research literature makes it clear that judging self-similarity isn’t subject to one test, or even a handful of them, for all purposes. On top of that, more often than not self-similarity is being judged on high-dimensional data.
Despite clever approaches and quite frankly amazing results, I have yet to run across any sustained discussion of how to interchange self-similarity tests. Perhaps it is my markup background, but that seems like the sort of capability that would be widely desired.
The issue of interchangeable self-similarity tests looks like an area where JTC 1/SC 34/WG 3 could make a real contribution.