Another Word For It

Patrick Durusau on Topic Maps and Semantic Diversity

April 21, 2010

Complex Merging Conditions In XTM

Filed under: Merging, Subject Identifiers, TMDM, Topic Maps — Patrick Durusau @ 6:09 pm

We need a way to merge topics for reasons that are not specified by the TMDM.

For example, I want to merge topics that have equivalent occurrences of type ISBN. Library catalogs in different languages may share only the ISBN of an item as a common characteristic. A topic map generated from each of them could have the ISBN as an occurrence on each topic.

I am assuming each topic map relies upon library identifiers for “standard” merging because that is typically how library systems bind the information for a particular item together.

So, how to make merging occur when there are equivalent occurrences of type ISBN?

Solution: As part of the process of creating the topics, add a subject identifier derived from the occurrences of type ISBN. Construct it so that equivalent ISBN numbers always yield equivalent subject identifiers. Topics that share equivalent occurrences of type ISBN then carry a common subject identifier and merge under the standard TMDM merging rules.
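Here is a minimal Python sketch of this step, under stated assumptions. The base URI `http://example.org/si/isbn/` is a hypothetical placeholder. Treating an ISBN-10 and its ISBN-13 form as “equivalent” is one possible reading of equivalence. `merge_by_subject_identifier` only mimics the TMDM rule that topics sharing a subject identifier merge; it is not any topic map engine's API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical base URI for ISBN-derived subject identifiers (an assumption,
# not a published scheme).
ISBN_SI_BASE = "http://example.org/si/isbn/"


def normalize_isbn(raw: str) -> str:
    """Return the ISBN-13 form of an ISBN-10 or ISBN-13, ignoring hyphens and spaces."""
    digits = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"\d{13}", digits):
        return digits
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10 -> ISBN-13: prefix 978, drop the old check digit and
        # recompute it with alternating weights 1 and 3. The ISBN-10 check
        # digit itself is not validated in this sketch.
        core = "978" + digits[:9]
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"not an ISBN: {raw!r}")


def isbn_subject_identifier(raw: str) -> str:
    """Equivalent ISBNs yield identical subject identifiers."""
    return ISBN_SI_BASE + normalize_isbn(raw)


@dataclass
class Topic:
    name: str
    isbns: list[str]
    subject_identifiers: set[str] = field(default_factory=set)

    def add_isbn_identifiers(self) -> None:
        # Run while the topic is being created, before any merging happens.
        for raw in self.isbns:
            self.subject_identifiers.add(isbn_subject_identifier(raw))


def merge_by_subject_identifier(topics: list[Topic]) -> list[set[str]]:
    """Group topic names whose topics share a subject identifier, transitively."""
    parent = list(range(len(topics)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner: dict[str, int] = {}
    for i, topic in enumerate(topics):
        for si in topic.subject_identifiers:
            if si in first_owner:
                parent[find(i)] = find(first_owner[si])
            else:
                first_owner[si] = i

    groups: dict[int, set[str]] = {}
    for i, topic in enumerate(topics):
        groups.setdefault(find(i), set()).add(topic.name)
    return list(groups.values())


# Usage: one item described in two catalogs, one of which records an ISBN-10.
catalog_a = Topic("Title as catalogued in English", ["0-306-40615-2"])
catalog_b = Topic("Title as catalogued in German", ["978-0-306-40615-7"])
unrelated = Topic("Another item", ["978-3-16-148410-0"])
for t in (catalog_a, catalog_b, unrelated):
    t.add_isbn_identifiers()
print(merge_by_subject_identifier([catalog_a, catalog_b, unrelated]))
```

The identifier is added at creation time, so no query language is involved. Any engine that merges on equal subject identifiers produces the grouping shown.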

While the illustration uses one occurrence, there is no limit to the number of properties of a topic that can be considered when creating a subject identifier that will result in merging. Such subject identifiers, when resolved, should document the basis for their assignment to a topic.
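The following sketch combines several properties into one subject identifier, built from a dictionary of already-normalized property values. The base URI and the property names are hypothetical. Listing the properties in the query string is one way to let the identifier record the basis for its assignment. The page it resolves to would carry the fuller documentation.

```python
from urllib.parse import urlencode

# Hypothetical resolvable base URI; the page behind it would document the
# rule used to assign these identifiers.
COMPOSITE_SI_BASE = "http://example.org/si/composite"


def composite_subject_identifier(properties: dict[str, str]) -> str:
    """Build one subject identifier from several normalized property values.

    Keys are sorted so the same properties always yield the same URI,
    regardless of the order in which they were collected. Normalizing each
    value (e.g. ISBN-10 to ISBN-13) is left to the caller.
    """
    if not properties:
        raise ValueError("at least one property is required")
    query = urlencode(sorted(properties.items()))
    return f"{COMPOSITE_SI_BASE}?{query}"


si_1 = composite_subject_identifier({"isbn": "9780306406157", "edition": "2"})
si_2 = composite_subject_identifier({"edition": "2", "isbn": "9780306406157"})
assert si_1 == si_2
print(si_1)  # http://example.org/si/composite?edition=2&isbn=9780306406157
```

Because the URI is deterministic in the property values, two topics merge only when every listed property matches.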

BTW, even assuming a future TMQL that enables such merging, note that this technique will also work with XTM 1.0 topic map engines.

Caution: This solution does not work for properties that can be determined only after the topic map has been constructed, such as participation in particular associations or the playing of particular roles.

PS: There is a modification of this technique to deal with participation in associations or the playing of particular roles. More on that in another post.

4 Comments

  1. This is doable today with tolog, and scales easily to far more complex examples. Your particular example could be done like this:

    merge $T1, $T2 from
    isbn($T1, $ISBN),
    isbn($T2, $ISBN),
    $T1 /= $T2

    It’s also possible to specify in TMCL that isbn occurrences must be unique, as follows:

    isbn isa tmcl:occurrence-type;
    has-unique-value().

    Exactly what happens with different topics having the same isbn occurrence value depends on the TMCL engine, but it would be strange if they didn’t offer the option to simply merge offending topics.

    Comment by Lars Marius Garshol — April 22, 2010 @ 2:59 am

  2. True, ’tis true, but as you point out, what happens with a TMCL engine depends on what its developers thought was odd or not. Depending on a common mindset among developers isn’t a good strategy.

    Appreciate you pointing out the tolog solution since that non-standard solution is widely available.

    I do think it is useful for people to think about solutions that involve the standard mechanisms of subject identity since those will persist across all applications that support the XTM/TMDM.

    Comment by Patrick Durusau — April 22, 2010 @ 5:37 am

  3. I am not sure I can follow you. I do not think it makes a difference how you encode the statement in the topic map. In any case, one can design a procedure for URI construction and share it. The playing of a particular role is something that exists in the source dataset, isn’t it? It is just not as clearly visible as the ISBN. Maybe I am missing something. Can you give an example?

    Comment by Robert Cerny — April 23, 2010 @ 10:21 am

  4. Robert,

    Good point. Even when streaming data, there will be a point when it is “known” that a particular subject is playing a role, and a URI construction rule can be triggered, which would then trigger additional merging.

    I think there may be exceptions to that statement but right now I can’t formulate an example.

    Comment by Patrick Durusau — April 24, 2010 @ 11:52 am
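For readers without a tolog engine, the pairing expressed by the tolog query in the first comment can be sketched in plain Python. This is an analogue, not tolog's or any TMCL engine's actual behavior. The in-memory dictionary standing in for a topic map and the topic ids are hypothetical. Values are compared exactly as stored, with no ISBN normalization. The same grouping shows which values a unique-value constraint would flag.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a topic map: topic id -> values of its
# occurrences of type ISBN.
isbn_occurrences: dict[str, list[str]] = {
    "t1": ["9780306406157"],
    "t2": ["9780306406157"],
    "t3": ["9783161484100"],
}


def shared_isbn_groups(occurrences: dict[str, list[str]]) -> dict[str, set[str]]:
    """ISBN values carried by more than one topic.

    These are the values a unique-value constraint would flag; each group is
    a set of topics that the merge rule would collapse into one.
    """
    by_value: dict[str, set[str]] = defaultdict(set)
    for topic, values in occurrences.items():
        for value in values:
            by_value[value].add(topic)
    return {value: topics for value, topics in by_value.items() if len(topics) > 1}


def merge_pairs(occurrences: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Distinct topic pairs sharing an ISBN value ($T1 /= $T2 in the tolog query).

    Each unordered pair is reported once; the tolog query would also match
    the reversed pair, which describes the same merge.
    """
    pairs: set[tuple[str, str]] = set()
    for topics in shared_isbn_groups(occurrences).values():
        ordered = sorted(topics)
        for i, t1 in enumerate(ordered):
            for t2 in ordered[i + 1:]:
                pairs.add((t1, t2))
    return pairs


print(shared_isbn_groups(isbn_occurrences))
print(merge_pairs(isbn_occurrences))  # {('t1', 't2')}
```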
