Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 2, 2010

Skillful Semantic Users?

Filed under: Usability — Patrick Durusau @ 9:05 am

I recently discovered one reason for my unease with semantic this and that technologies, including topic map interfaces. A friend mentioned to me that he wanted users to do more than enter subject names in their topic map interface. “Users need to also enter….”

The idea of users busily populating a semantic space is an attractive one, but it hasn’t been borne out in practice. So I don’t think my friend’s interface is going to prove to be useful, but why?

Then I got to thinking, how many indexers or librarians do I know? The sort of people whose talents combined to bring us the Readers’ Guide to Periodical Literature and useful back-of-the-book indexes. Due to my work in computer standards I know a lot of smart people, but very few of them strike me as also having strong indexing or cataloging skills.

Any semantic solution (RDFa, RDF/OWL, SUMO, Topic Maps, etc.) will fail from an authoring standpoint because of this lack of skill. No technology can magically make users competent at the indexing or cataloging skills required to enable access by others.

Semantic interface writers need to recognize that most users are simply consumers of information created by others. I would not be surprised if the ratio of producers to consumers is close to the ratio of contributors to consumers in open source projects.

9 Comments

  1. I share a great passion for this subject, for various reasons. One of them is my vision of creating a “semantic” portal at http://www.topincs.com where users can create and exchange information without preshared knowledge across language and domain boundaries. This does not work. The best explanation for why it is impossible I found in the report “Communication Measures to Bridge Ten Millennia” by Thomas Sebeok. The plot being that no bullet-proof method of communication exists such that a simple message like “Nuclear waste is buried here. Don’t dig here, because it is dangerous!” could still be interpreted in a far distant future. My learning from this: the interpretation process – like any human behaviour – is a function of the environment and the person. As long as a strong shared context exists, communication is simple. So what you have to ensure is not the message, but the context. In Sebeok’s case: an atomic priesthood.

    The other reason for my passion is that I have just finished creating the third user interface for Topincs, which is as simple as it gets: forms. No Topic Maps lingo in sight. The biggest challenge is that a person must be able to recognize the proxy for the subject they want to refer to. This task gets more difficult the more diverse the range of a player is. Now coming to the indexing association: it does not get more diverse than that. Anything can play the role “subject”. But on the other hand, for most roles it is easy. It depends on the domain and the group using the tool. Understanding will always require effort and will never come out of the blue. So I disagree that it is impossible to achieve the task. I have to repeat myself: using a software product is, like any human behavior, a function of the environment and the person. While I agree that it is good to make software easy to use, one should never forget that there is always the option of making the user more skilled by training him.

    Comment by Robert Cerny — March 6, 2010 @ 12:41 pm

  2. Thanks for mentioning Sebeok’s report. I don’t think it is meaningful to talk about “messages” in the absence of context. For discussion purposes it may be convenient to act as though we can talk about one without the other, but that is just a convention. In much the same way, there are subjects that we assume (that is, don’t explicitly identify) in a relational database, while explicitly identifying other subjects. The subjects that compose column headers haven’t disappeared; we have simply chosen not to talk about (identify) them.

    If we want to identify the subjects that compose column headers in two different databases, say familyName and lastName, without discarding the different identifications of the same subject or converting to a common schema, all we need do is decide to identify those subjects. Well, that and supply the appropriate rule for treating them as identifying the same subject. But we incur the overhead of identifying those subjects only when we choose to do so. Within each database, respectively, life goes on as before.
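
    A rough sketch, in code, of what “deciding to identify those subjects” might amount to (the names and identifiers below are hypothetical, not drawn from any particular system or implementation):

        # Each database keeps its own column header; we only add an identification.
        SUBJECT_MAP = {
            ("db_A", "familyName"): "http://example.org/subject/family-name",
            ("db_B", "lastName"):   "http://example.org/subject/family-name",
        }

        def same_subject(col_a, col_b):
            # Rule: two columns identify the same subject if both map to the
            # same subject identifier.
            sa, sb = SUBJECT_MAP.get(col_a), SUBJECT_MAP.get(col_b)
            return sa is not None and sa == sb

        print(same_subject(("db_A", "familyName"), ("db_B", "lastName")))  # True

    Columns left out of the map simply stay unidentified, which is the point: the cost is paid only for the subjects we choose to identify.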

    Users can be trained but experience has shown that training them to use semantic vocabularies that are not their own is of doubtful utility. Search for Drabenstott on this blog to find a posting on that subject.

    Comment by Patrick Durusau — March 6, 2010 @ 5:08 pm

  3. Thanks for the reference to Drabenstott, I will read up on it. The challenge that library science faces is that the domain has no boundaries. I have seen indexing of documents work very well in a large corporation because the business itself was schematic (financial industry) and so were the documents. Between those two extremes there is still a lot of ground to be covered, and this is in my opinion a good hunting ground for semantic technologies.

    Regarding column headers: I recently found an interesting article, “The Many Forms of a Single Fact” by William Kent (1988), where he discusses all the variations of encoding one fact (“Salesman S covers territory T”) in a CSV file. He comes up with around 20 to 30. This can be transferred to relational databases. What I like about Topic Maps is that it is less tempting to abuse a statement type, because the cost of introducing a new statement type is smaller. Still, I am considering writing a paper for TMRA 2010 called “The Many Forms of a Single Statement – 20 years later”. I do not think that I will come up with 20, but I think I can make it to 10.
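
    Just to make the variation concrete, here are three of the possible layouts for that single fact, sketched in Python (my own illustrative examples, not Kent’s enumeration):

        import csv, io

        # Three hypothetical CSV layouts, all carrying "Salesman S covers territory T".
        form_by_pair      = "salesman,territory\nS,T\n"    # one row per (salesman, territory) pair
        form_by_salesman  = "salesman,territories\nS,T\n"  # territories listed against each salesman
        form_by_territory = "territory,salesmen\nT,S\n"    # salesmen listed against each territory

        # The fact is the same, but code written against one layout cannot read the others.
        for name, text in [("by pair", form_by_pair),
                           ("by salesman", form_by_salesman),
                           ("by territory", form_by_territory)]:
            print(name, list(csv.DictReader(io.StringIO(text))))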

    Comment by Robert Cerny — March 7, 2010 @ 2:04 pm

  4. Hi all,

    I wonder how the Topic Maps Reference Model (TMRM) fits into the problem of “treating x column headers as a subject”?

    If you read the TMRM, you think “Great! Everything is possible!”, but has the Topic Maps community come up with usable implementations? I don’t think so. And is the lack of implementations an indicator that the TMRM does not work?

    Personally, I hope that the TMRM works and that we’ll see some implementations. The TMDM is nice, but the TMRM would be very cool.

    Best regards,
    Lars

    Comment by Lars Heuer — March 7, 2010 @ 4:56 pm

  5. William Kent is an excellent author. I found a copy of “The Many Forms of a Single Fact” online. His “Data and Reality: Basic Assumptions in Data Processing Reconsidered” is an excellent read as well.

    Go for the paper! I suspect that the number of forms remains at 20 or 30, if not higher once you include older material.

    Newcomb and I found articles on “record linkage” under 20+ different names for the same technique, including two independently developed mathematical models, the more recent one apparently developed without reference to the older one. A sign that semantic integration efforts could benefit from semantic integration!

    Comment by Patrick Durusau — March 7, 2010 @ 7:58 pm

  6. Funny remark about semantic integration. But let’s consider a world where understanding would require no effort and would be instantaneous. Not much comes to my mind in doing so; it is hard to grasp. But I can say that, in such a world, I would miss out on the following experience: it is a true miracle to me (one that happens a million times each day) that one person says something and another one can say “Yes, I understand” or “Yes, but …”. Thinking about it usually leaves me speechless in awe. We have the power to shape another person’s mind, even if it is just for a second, and even if we confuse them.

    We all have our isolated proxy spaces (minds) which are developed under similar circumstances and are based on reality, but even more so on what other people say. We use a shared proxy system (natural language) in order to send messages from one space to another and we end up doing surgery in the brain, flying to the moon and deciphering the genetic code. And then again, take any of the great minds of our time, take Einstein for his distinctive appearance, and put him back into prehistoric times in the middle of a camp fire amongst tribal people. What would he have achieved?

    Comment by Robert Cerny — March 8, 2010 @ 8:46 am

  7. I like your analogy of language to a shared proxy system (even though it is imperfectly shared, hence misunderstandings, etc.) because it can illustrate something important about your Einstein question.

    I am sure there were/are great minds around camp fires amongst tribal people. But, since we don’t share their “proxy system” (which includes more than language, their entire context), we can only marginally understand anything we might learn about them. And any judgment that we form would be in terms of our shared proxy system, not theirs.

    Comment by Patrick Durusau — March 8, 2010 @ 11:01 am

  8. Lars,

    Not trying to dodge your question about “implementing” the TMRM!

    Yes, implementing the TMRM in its full generality would be “cool.” I will start a separate post on implementing the TMRM but the short answer is that only legends can be implemented, not the TMRM.

    The TMRM is a rhetoric, a method of analysis, a heuristic if you will, for recognizing what subjects an information system allows you to talk about and which ones it doesn’t.

    Take the column header situation. A relational database may have a data dictionary, but it doesn’t define rules by which the subject represented by a column header can be identified (other than a string match against the column header, which may or may not be correct). So if one company acquires another, merging databases cannot occur until those subjects are identified. The problem is that in most cases the subjects are never explicitly identified, nor are rules defined for merging, so at the next merger we have to start all over again. Not a very efficient way to operate a business.

    A legend for a database would explicitly define the subjects represented by the column headers and rules for “merging” them with other representatives of the same subjects. That is assuming you have an implementation that can perform the equivalence tests defined by the legend.

    I put “merging” in quotes because it doesn’t necessarily mean representatives of the same subject are consolidated into one representative. It means the user is presented with one representative for a subject, nothing more. (That is also a complicated issue on which I will have a separate post.)
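
    As a very rough sketch of the idea (all names and identifiers below are hypothetical, and no existing implementation is implied), a legend might boil down to an explicit map from column headers to subjects plus a rule for presenting one representative per subject:

        # Hypothetical legend: explicit subject identifications for column headers.
        LEGEND = {
            ("corporate_db", "familyName"): "http://example.org/subject/family-name",
            ("acquired_db",  "lastName"):   "http://example.org/subject/family-name",
        }

        def one_representative_per_subject(columns):
            # "Merging" here only means the user sees one representative per
            # subject; the underlying columns are not consolidated.
            view = {}
            for col in columns:
                view.setdefault(LEGEND[col], []).append(col)
            return view

        print(one_representative_per_subject(
            [("corporate_db", "familyName"), ("acquired_db", "lastName")]))
        # {'http://example.org/subject/family-name':
        #    [('corporate_db', 'familyName'), ('acquired_db', 'lastName')]}

    At the next merger, the work that survives is the legend itself, which is the efficiency point above.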

    Comment by Patrick Durusau — March 8, 2010 @ 11:20 am

  9. I agree with you that our judgment is always based on our shared proxy system. And it very much determines what we can learn in a lifetime. It unifies and at the same time it separates. And thus it makes the achievements of people who were able to overcome the borders between proxy systems even greater, e.g. Jean-François Champollion.

    Comment by Robert Cerny — March 8, 2010 @ 12:20 pm
