Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 16, 2010

Reducing Ambiguity, LOD, Ookaboo, TMRM

Filed under: Ambiguity,Subject Identity,TMRM,Topic Maps — Patrick Durusau @ 9:25 pm

While reading “Resource Identity and Semantic Extensions: Making Sense of Ambiguity” and “In Defense of Ambiguity,” it occurred to me that reducing ambiguity has a hidden assumption.

That hidden assumption is the intended audience for whom I wish to reduce ambiguity.

For example, Ookaboo (see “Ookaboo does #it”) solves the problem of multiple vocabularies for its intended audience thusly:

Our strategy for dealing with multiple subject terminologies is to use what we call a reference set, which in this case is

http://ookaboo.com/o/pictures/topic/2021903/Central_Air_Force_Museum#it
http://dbpedia.org/resource/Central_Air_Force_Museum
http://rdf.freebase.com/ns/m.0g_2bv

If we want to assert foaf:depicts we assert foaf:depicts against all of these. The idea is that not all clients are going to have the inferencing capabilities that I wish they’d have, so I’m trying to assert terms in the most “core” databases of the LOD cloud.

In a case like this we may have YAGO, OpenCyc, UMBEL and other terms available. Relationships like this are expressed as

<:Whatever> <ontology2:aka>
<http://mpii.de/yago/resource/Central_Air_Force_Museum> .

<ontology2:aka>, not dereferenceable yet, means (roughly) that “some people use term X to refer to substantially the same thing as term Y.” It’s my own answer to the <owl:sameAs> problem and deliberately leaves the exact semantics to the reader. (It’s a lossy expression of the data structures that I use for entity management.)

This is very like a TMRM solution since it gathers different identifications together, in hopes that at least one will be understood by a reader.
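A minimal sketch of the fan-out described in the quote, assuming the client simply emits one N-Triples statement per member of the reference set. The picture URI and the function name are hypothetical and used only for illustration; the three subject URIs and foaf:depicts come from the quoted text.

```python
FOAF_DEPICTS = "http://xmlns.com/foaf/0.1/depicts"

# The reference set quoted above: three identifiers for one subject.
REFERENCE_SET = [
    "http://ookaboo.com/o/pictures/topic/2021903/Central_Air_Force_Museum#it",
    "http://dbpedia.org/resource/Central_Air_Force_Museum",
    "http://rdf.freebase.com/ns/m.0g_2bv",
]


def depicts_triples(picture_uri, reference_set):
    """Return one N-Triples statement per identifier in the reference set."""
    return [
        f"<{picture_uri}> <{FOAF_DEPICTS}> <{subject_uri}> ."
        for subject_uri in reference_set
    ]


# Hypothetical picture URI, not taken from Ookaboo.
triples = depicts_triples("http://example.org/pictures/1", REFERENCE_SET)
for line in triples:
    print(line)
```

A client that recognizes only one of the three vocabularies still finds a statement it can use, which is the point of asserting against the whole set.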

This is very unlike a TMRM solution because it has no legend to say how to compare these “values,” much less their “keys.”

The lack of a legend makes integration in legal, technical, medical or intelligence applications, ah, difficult.
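To make the role of a legend concrete, here is a rough sketch that borrows the TMRM vocabulary of proxies, keys and values only loosely. It is not the TMRM’s formal definition. The “uri” and “name” keys, the comparison rules and the two name-only proxies are assumptions made up for illustration.

```python
def same_subject(proxy_a, proxy_b, legend):
    """Return True if any key covered by the legend has a matching value pair."""
    for key, values_match in legend.items():
        for value_a in proxy_a.get(key, ()):
            for value_b in proxy_b.get(key, ()):
                if values_match(value_a, value_b):
                    return True
    return False


# Hypothetical legend: which keys carry identity, and how their values compare.
LEGEND = {
    "uri": lambda a, b: a == b,                         # exact string match
    "name": lambda a, b: a.casefold() == b.casefold(),  # ignore letter case
}

ookaboo_proxy = {
    "uri": {
        "http://ookaboo.com/o/pictures/topic/2021903/Central_Air_Force_Museum#it",
        "http://dbpedia.org/resource/Central_Air_Force_Museum",
    },
}
freebase_proxy = {
    "uri": {
        "http://rdf.freebase.com/ns/m.0g_2bv",
        "http://dbpedia.org/resource/Central_Air_Force_Museum",
    },
}
# Hypothetical proxies from sources that carry no URI at all.
catalogue_proxy = {"name": {"Central Air Force Museum"}}
label_proxy = {"name": {"central air force museum"}}

print(same_subject(ookaboo_proxy, freebase_proxy, LEGEND))
print(same_subject(catalogue_proxy, label_proxy, LEGEND))
print(same_subject(catalogue_proxy, label_proxy, {}))
```

The first two calls print True and the last prints False: with an empty legend there is no stated rule for comparing the values, so nothing can be merged reliably.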

Still, it is encouraging to see the better Linked Data applications moving in the direction of the TMRM.

13 Comments

  1. Yeah, one day when the SemWeb world finally comes together and makes some sense we can look back at how we in the TM community have wasted 20 years of our lives in the wrong technology. *sigh*

    We’re bloody experts in useful but ignored technology, I’m afraid.

    Comment by Alexander Johannesen — November 16, 2010 @ 10:22 pm

  2. […] This post was mentioned on Twitter by Johannes Schmidt, Patrick Durusau. Patrick Durusau said: LOD gets closer to TMRM, http://tm.durusau.net/?p=4159 […]

    Pingback by Tweets that mention Reducing Ambiguity, LOD, Ookaboo, TMRM « Another Word For It -- Topsy.com — November 17, 2010 @ 1:38 am

  3. Alexander,

    Hardly a case of wasting time on the wrong technology!

    The Ookaboo solution only works for web identifiers, which, leaving LOD fantasy literature to one side, are a distinct minority of identifiers. To say nothing of potential subject identifiers.

    Do you really want to give up 150 years of Mozart identifiers from the Chronologisch-thematisches Verzeichnis sämmtlicher Tonwerke W. A. Mozart’s (Chronological-thematic Catalogue of the Complete Musical Works of W. A. Mozart)? Or mappings between the various editions and the forthcoming revision (which returns to the earlier system)?

    The SemWeb world, as originally envisioned, is another “perfect language” project and would never make sense. The revision towards something a bit more sensible is understandable.

    I take it as encouraging that it is moving towards a topic maps view of subject identity.

    If you have something in particular that you think the topic maps community should be doing, please post it to one of the lists. I don’t think anyone can answer a “wrong technology” sort of complaint. It’s too vague.

    Comment by Patrick Durusau — November 17, 2010 @ 5:49 am

  4. One of the ideas behind LOD is to be able to integrate and link data that is maintained at different places to enrich your own data. For this to work you need to know (read: resolve the address of) the entity that knows more about the topic at hand. So you can go there and fetch the data that you want to integrate.

    In your example of Mozart, using as an identifier the Koechelverzeichnis Number alone would not help much. To understand this identifier you would need implicit knowledge about the structure and nature of every possible identifier system in existence, and then you still do not know who has more information about it.

    But nobody prevents you from creating URI identifiers that are composed of identifiers already used and accepted in the real world. In fact you are even encouraged to do that (see, e.g., how Amazon uses the ISBN number as part of the URL for a book).

    So what’s wrong with an identifier like: http://world-of-music.org/mozart/kv16?

    Comment by Thomas Neidhart — November 17, 2010 @ 10:23 am

  5. Why do I need to understand “structure and nature of every possible identifier system in existence…?”

    Granting there are a number of subjects that go into understanding a Koechelverzeichnis Number, which would be part of representing that numbering system as a subject in a topic map. I wasn’t trying to do that in my example.

    BTW, on one hand you say use of the “Koechelverzeichnis Number alone would not help much.”

    So how does converting that number (your example) into a URI make it more helpful?

    If your response is there could be more data there, true, but none of it forms part of the identification. At least not in any standard way.

    If what the URI means is look around and find stuff you might want to use, that doesn’t sound like a recipe for reliable integration of information.

    Comment by Patrick Durusau — November 17, 2010 @ 11:04 am

  6. @Neidhart

    OK, I thought about it and now realize what you meant by “structure and nature of every possible identifier system in existence…”

    Sure, URIs are a single syntax but the discussion of URIs has shown them to be as ambiguous as any other identifiers.

    Not to mention that identification is rarely a matter of a single string.

    Nor to mention that URI identification is a “perfect language” solution. That has been tried, literally hundreds of times, and has never worked.

    Comment by Patrick Durusau — November 17, 2010 @ 3:19 pm

  7. Hi Patrick,

    Why is that solution closer to TMRM? onto:aka is just a predicate (or an association in TMDM land). X onto:aka Y means “X could-also-mean Y” and this is nothing which is uncommon in RDF/TMDM.

    Best regards,
    Lars

    Comment by Lars Heuer — November 17, 2010 @ 3:34 pm

  8. Unqualified comment: /me still laughs about the #it solution 😉

    Comment by Lars Heuer — November 17, 2010 @ 3:37 pm

  9. Regarding my “not uncommon” comment: Maybe it’s more common in RDF since TMDM users could tend to merge X with Y since they think they represent the same subject. Keeping this as an association would make more sense.

    Comment by Lars Heuer — November 17, 2010 @ 3:42 pm

  10. @Lars

    I wasn’t keying on the syntax but on the opening paragraph, where it was said they were dealing with a reference set.

    True, the TMDM can have a set of URI identifiers but I was (unconsciously?) extending that to include other identifiers in the set.

    You are right, so long as we mean URI identifiers, the same result obtains for the TMDM.

    I do think identity can be more complex than a single identifier.

    Comment by Patrick Durusau — November 17, 2010 @ 3:46 pm

  11. @Lars – Association

    Interesting thought. Identifications of the same subject could certainly exist in a series of associations to other identifications of the same subject.

    Whether you wanted to represent the structure of all of those explicitly would be another question.

    Comment by Patrick Durusau — November 17, 2010 @ 3:50 pm

  12. @patrick Well, in my opinion, Topic Maps is the wrong technology because of its lack of popularity. I’m just lamenting the fact that the RDF world is slowly, little by little, catching up to where TM was 10 years ago, and it sucks to witness this. I’ve got similar feelings about TM in libraries, for example, where I’ve been talking about the trifecta of topic / subject identifiers, locators and indicators for ages, and only now recently have some persistent identifier systems begun to creep in because they are needed in the RDF world (with friggin’ 303 redirection and resolvement to work), and not because they were the right idea all along.

    Perhaps I’m being stupid. Perhaps this is all good, but I have a sneaky feeling that the state of identifiers today is where we should have been 10 years ago. I just can’t believe we’re not further. TMRM was released, what, almost 10 years ago now? (2002 or so) Thinking of subject identification through labels and proxies should have helped us get a lot further ahead than the mess we’re in right now.

    I hate saying “I told you so”, but seriously, Topic Mappers have been saying “told you so” for ages, and it’s just tiring and annoying, not for the sake of where we are, but where we could have been.

    And, oh, triplets. What evil they are, and the pain and suffering they cause. *sigh* If things only had gone differently, this evil would be one less problem that we had to deal with. But alas, here we are, steeped in a simplistic model that explodes on you as soon as you try to do anything remotely useful and / or complex.

    Anyway, sorry for whining. Topic Maps is definitely not *a* wrong technology (in fact, I think it’s the right one), it’s just *the* wrong technology to get any traction for. In a commercial market you choose the tool that’s right for the job, however a big constraint these days seems to be that semantic data === W3C hopscotch.

    Sorry, sorry. I’m tired. Someone take me away from this keyboard, and shove a coffee down my throat! Aaargh!

    Comment by Alexander Johannesen — November 17, 2010 @ 8:09 pm

  13. @Alexander

    Sorry for the delayed response!

    But there isn’t anything inconsistent with the W3C hopscotch as you call it and topic maps. As you can see from another thread on this blog, the LOD community is just throwing data up onto the web and it is up to users to sort out what subject is meant.

    I suppose that is one strategy and it does get the data up onto the web.

    In terms of marketing, I would just tell a client that yes, yes we are using LOD and we identify the subjects so they can reliably integrate it with other data in their organization or with future data.

    Note that I did not use the term “topic map” anywhere in that pitch.

    True, the Semantic Web has gone from autonomous agents roaming the web and reasoning about data to “pig in a poke” subjects, but that’s a religious issue.

    What I think will “sell” topic maps, assuming you ever have to use the term, will be re-use and re-integration with other data, that is to say, practical results.

    I don’t think there is any future in “I told you so” scenarios, although I must admit that pointing out shortfalls in current approaches sounds that way.

    I suppose another way to put it is: when the client says “Semantic Web,” you say “Semantic Web that works,” and proceed to produce an application with robust subject identity and a legend.

    Client is happy, you’re happy, and when client tries lesser SW apps, they are disappointed. Life is good.

    Comment by Patrick Durusau — November 19, 2010 @ 8:13 pm
