Archive for the ‘TMDM’ Category

Understanding Clojure’s Persistent Vectors, pt. 1

Monday, March 24th, 2014

Understanding Clojure’s Persistent Vectors, pt. 1 by Jean Niklas L’orange.

From the post:

You may or may not have heard about Clojure’s persistent vectors. It is a data structure invented by Rich Hickey (influenced by Phil Bagwell’s paper on Ideal Hash Trees) for Clojure, which gives practically O(1) runtime for insert, update, lookups and subvec. As they are persistent, every modification creates a new vector instead of changing the old one.

So, how do they work? I’ll try to explain them through a series of blogposts, in which we look at manageable parts each time. It will be a detailed explanation, with all the different oddities around the implementation as well. Consequently, this blog series may not be the perfect fit for people who want a “summary” on how persistent vectors work.

For today, we’ll have a look at a naive first attempt, and will cover updates, insertion and popping (removal at the end).

Note that this blogpost does not represent how PersistentVector is implemented: There are some speed optimizations, solutions for transients and other details which we will cover later. However, this serves as a basis for understanding how they work, and the general idea behind the vector implementation.
….
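To make "every modification creates a new vector" concrete, here is a minimal Python sketch of path copying over a small tree. It is not taken from the quoted post and is not how PersistentVector is implemented: the 2-way branching, the nested-tuple layout and the names build, lookup and update are all assumptions chosen to keep the example short.

```python
BITS = 1              # 2-way branching, an assumption to keep the example small
WIDTH = 1 << BITS
MASK = WIDTH - 1

def build(items, depth):
    """Build a tree of nested tuples; depth 0 is a leaf holding up to WIDTH items."""
    if depth == 0:
        return tuple(items)
    step = WIDTH ** depth  # number of items covered by each child at this depth
    return tuple(build(items[i:i + step], depth - 1)
                 for i in range(0, len(items), step))

def lookup(node, depth, index):
    """Walk from the root to a leaf, using the bits of the index to pick a slot."""
    for level in range(depth, 0, -1):
        node = node[(index >> (level * BITS)) & MASK]
    return node[index & MASK]

def update(node, depth, index, value):
    """Return a new root; only the nodes on the path to the index are copied."""
    slot = (index >> (depth * BITS)) & MASK
    if depth == 0:
        return node[:slot] + (value,) + node[slot + 1:]
    child = update(node[slot], depth - 1, index, value)
    return node[:slot] + (child,) + node[slot + 1:]

old = build(list(range(8)), depth=2)
new = update(old, 2, 5, "five")
```

After the update, old still answers lookups with the original values, and every subtree that is off the path to index 5 is shared between old and new rather than copied.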

The sort of post that makes you start wondering why we don’t have a persistent data model for XTM based topic maps?

With persistence we get to drop the work of creating new identifiers on merges, creating sets of identifiers, and determining if sets of identifiers intersect, to say nothing of having persistent identifiers for interchange of data with other topic maps. A topic’s identifier is its identifier today, tomorrow and at any time to which it is persisted.

To say nothing of having an audit trail for additions/deletions plus “merges.”

While you are considering those possibilities, see: Understanding Clojure’s Persistent Vectors, pt. 2

Are all the “Facts” in a Topic Map True? [Reporting on the NSA]

Thursday, October 24th, 2013

Topic maps are not constrained to report “true facts.”

The Topic Maps Data Model (TMDM, 5.3.1 Subjects and topics) states:

A subject can be anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. In particular, it is anything about which the creator of a topic map chooses to discourse. (emphasis in the original)

Which is fortunate for topic map authors who are tracking the false claims that NSA surveillance has prevented 54 terrorist attacks.

Claim on “Attacks Thwarted” by NSA Spreads Despite Lack of Evidence by Justin Elliott and Theodoric Meyer, reports:

Earlier this month, Sen. Patrick Leahy, D-Vt., pressed Alexander on the issue at a Senate Judiciary Committee hearing.

“Would you agree that the 54 cases that keep getting cited by the administration were not all plots, and of the 54, only 13 had some nexus to the U.S.?” Leahy said at the hearing. “Would you agree with that, yes or no?”

“Yes,” Alexander replied, without elaborating.

“We’ve heard over and over again the assertion that 54 terrorist plots were thwarted” by the two programs, Leahy told Alexander at the Judiciary Committee hearing this month. “That’s plainly wrong, but we still get it in letters to members of Congress, we get it in statements. These weren’t all plots and they weren’t all thwarted. The American people are getting left with the inaccurate impression of the effectiveness of NSA programs.”

To track the spread of false facts, see the excellent visualization in How the NSA’s Claim on Thwarted Terrorist Plots Has Spread by Sisi Wei, Theodoric Meyer and Justin Elliott.

With a topic map you could connect the spreaders of those lies with other lies they have spread on the same subject, other lies they have spread and their relationships to others who spread lies.

The NSA may be accidentally tracking terrorists every now and again.

What do you say to tracking the polluters of public policy discussions?

The TAO of Topic Maps in Spanish

Wednesday, April 17th, 2013

Steve Pepper sends word that The TAO of Topic Maps has been translated into Spanish!

I am very grateful to Maria Ramos of WebHostingHub.com for translating The TAO of Topic Maps into Spanish: http://www.webhostinghub.com/support/es/misc/mapas-tematicos.

Since the article contains a lot of technical terminology, it might be a good idea if some Spanish-speaking Topic Maps experts were to proof-read the translation. Please send any comments directly to Maria at mariar@webhostinghub.com with a cc: to me.

Other translations to note?

Unknown Association Roles (TMDM specific)

Saturday, April 13th, 2013

As I was pondering the treatment of nulls in Neo4j (Null Values in Neo4j), it occurred to me that we have something quite similar in the TMDM.

The definition of association items includes this language:

[roles]: A non-empty set of association role items. The association roles for all the topics that participate in this relationship.

I read this as saying that if I don’t know their role, I can’t include a known player in an association.

For example, I am modeling an association between two players to a phone conversation, who are discussing a drone strike or terrorist attack by other means.

I know their identities but I don’t know their respective roles in relationship to each other or in the planned attack.

I want to capture this association because I may have other associations where they are players where roles are known. Perhaps enabling me to infer possible roles in this association.

Newcomb has argued roles in associations are unique and in sum, constitute the type of the association. I appreciate the subtlety and usefulness of that position but it isn’t a universal model for associations.

By the same token, the TMDM restricts associations to use where all roles are known. Given that roles are often unknown, that also isn’t a universal model for associations.

I don’t think the problem can be solved by an “unknown role” topic because that would merge unknown roles across associations.

My preference would be to allow players to appear in associations without roles.

Where the lack of a role prevents the default merging of associations. That is, all unknown roles are presumed to be unique.
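A rough Python sketch of that preference, under stated assumptions: Role, Association and should_merge are hypothetical names, a role type of None stands for "unknown", and scope is left out of the comparison for brevity. None of this is TMDM; it only illustrates "unknown roles are presumed unique".

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Role:
    player: str
    role_type: Optional[str] = None  # None = role unknown (hypothetical extension)

@dataclass(frozen=True)
class Association:
    assoc_type: str
    roles: FrozenSet[Role]

def should_merge(a: Association, b: Association) -> bool:
    """Two associations merge only if every role is known and they match."""
    # An unknown role is presumed unique, so it blocks the default merge.
    if any(r.role_type is None for r in a.roles | b.roles):
        return False
    return a.assoc_type == b.assoc_type and a.roles == b.roles

call_1 = Association("phone-call", frozenset({Role("alice"), Role("bob")}))
call_2 = Association("phone-call", frozenset({Role("alice"), Role("bob")}))
known_1 = Association("phone-call", frozenset({Role("alice", "caller"), Role("bob", "callee")}))
known_2 = Association("phone-call", frozenset({Role("alice", "caller"), Role("bob", "callee")}))
```

Here call_1 and call_2 stay separate even though they look identical, while known_1 and known_2 merge as usual.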

Suggestions?

Synonyms in the TMDM Legend

Sunday, May 13th, 2012

I was going over some notes on synonyms this weekend when it occurred to me to ask:

How many synonyms does a topic item have in the TMDM legend?

A synonym being when one term can be freely substituted for another.

Not wanting to trust my memory, I quote from the TMDM legend (ISO/IEC 13250-2):

Two topic items are equal if they have:

  • at least one equal string in their [subject identifiers] properties,
  • at least one equal string in their [item identifiers] properties,
  • at least one equal string in their [subject locators] properties,
  • an equal string in the [subject identifiers] property of the one topic item and the [item identifiers] property of the other, or
  • the same information item in their [reified] properties.
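The five conditions quoted above can be sketched as a single predicate. This is a minimal Python illustration, not a TMDM implementation: TopicItem and topics_equal are hypothetical names, identifiers are plain strings, and the [reified] property is stood in for by an optional string.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class TopicItem:
    subject_identifiers: FrozenSet[str] = frozenset()
    item_identifiers: FrozenSet[str] = frozenset()
    subject_locators: FrozenSet[str] = frozenset()
    reified: Optional[str] = None  # stand-in for the reified information item

def topics_equal(a: TopicItem, b: TopicItem) -> bool:
    """True if any one of the quoted equality conditions holds."""
    return bool(
        a.subject_identifiers & b.subject_identifiers
        or a.item_identifiers & b.item_identifiers
        or a.subject_locators & b.subject_locators
        # the cross condition applies in either direction
        or a.subject_identifiers & b.item_identifiers
        or a.item_identifiers & b.subject_identifiers
        or (a.reified is not None and a.reified == b.reified)
    )

by_si = TopicItem(subject_identifiers=frozenset({"http://example.org/x"}))
by_ii = TopicItem(item_identifiers=frozenset({"http://example.org/x"}))
by_sl = TopicItem(subject_locators=frozenset({"http://example.org/x"}))
```

With these three items, by_si and by_ii are equal through the cross condition, while by_sl is equal to neither: the same string as a subject locator is not listed as a basis for equality with a subject identifier or item identifier.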

The wording is a bit awkward for my point about synonyms but I take it that if two topics had

at least one equal string in their [subject identifiers] properties,

I could substitute:

at least one equal string in their [item identifiers] properties, (in all relevant places)

and have the same effect.

I am going to be exploring the use of synonym based processing for TMDM governed topic maps.

Any thoughts or insights would be greatly appreciated.

Semantic Integration: N-Squared to N+1 (and decentralized)

Friday, September 30th, 2011

Data Integration: The Relational Logic Approach pays homage to what is called the N-squared problem. The premise of N-squared for data integration is that every distinct identification must be mapped to every other distinct identification. Here is a graphic of the N-squared problem.

Two usual responses, depending upon the proposed solution.

First, get thee to a master schema (probably the most common). That is map every distinct data source to a common schema and all clients have to interact with that one schema. Case closed. Except data sources come and go, as do clients so there is maintenance overhead. Maintenance can take time to agree on updates.

Second, no system integrates every other possible source of data, so the fear of N-squared is greatly exaggerated. Not unlike the sudden rush for “big data” solutions whether the client has “big data” or not. Who would want to admit to having “medium” or even “small” data?

The third response is that of topic maps. The assumption that every identification must map to every other identification means things get ugly in a hurry. But topic maps question the premise of the N-Squared problem, that every identification must map to every other identification.

Here is an illustration of how five separate topic maps, with five different identifications of a popular comic book character (Superman), can be combined and yet avoid the N-Squared problem. In fact, topic maps offer an N+1 solution to the problem.

Each snippet, written in Compact Topic Map (CTM) syntax represents a separate topic map.


en-superman
http://en.wikipedia.org/wiki/Super_man ;
- "Superman" ;
- altname: "Clark Kent" .

***


de-superman
http://de.wikipedia.org/wiki/Superman ;
- "Superman" ;
- birthname: "Kal-El" .

***


fr-superman
http://fr.wikipedia.org/wiki/Superman ;
- "Superman" ;
birthplace: "Krypton" .

***


it-superman
http://it.wikipedia.org/wiki/Superman ;
- "Superman" ;
- altname: "Man of Steel" .

***


eo-superman
http://eo.wikipedia.org/wiki/Superman ;
- "Superman" ;
- altname: "Clark Joseph Kent" .

Copied into a common file, superman-N-squared.ctm, nothing happens. That’s because they all have different subject identifiers. What if I add to the file/topic map, the following topic:


superman
http://en.wikipedia.org/wiki/Super_man ;
http://de.wikipedia.org/wiki/Superman ;
http://fr.wikipedia.org/wiki/Superman ;
http://it.wikipedia.org/wiki/Superman ;
http://eo.wikipedia.org/wiki/Superman .

Results in the file, superman-N-squared-solution.ctm.
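What the extra topic does can be sketched in a few lines of Python. This is an illustration only, not a CTM processor: merge_by_identifiers is a hypothetical helper that groups topics sharing at least one subject identifier, using the topic names and URLs from the snippets above.

```python
def merge_by_identifiers(topics):
    """topics: dict of topic name -> set of subject identifiers.
    Returns a list of (names, identifiers) groups after merging."""
    merged = []
    for name, ids in topics.items():
        names, ids = {name}, set(ids)
        rest = []
        for other_names, other_ids in merged:
            if ids & other_ids:          # shared identifier: absorb the group
                names |= other_names
                ids |= other_ids
            else:
                rest.append((other_names, other_ids))
        merged = rest + [(names, ids)]
    return merged

five_maps = {
    "en-superman": {"http://en.wikipedia.org/wiki/Super_man"},
    "de-superman": {"http://de.wikipedia.org/wiki/Superman"},
    "fr-superman": {"http://fr.wikipedia.org/wiki/Superman"},
    "it-superman": {"http://it.wikipedia.org/wiki/Superman"},
    "eo-superman": {"http://eo.wikipedia.org/wiki/Superman"},
}
# The "+1": one topic that carries all five identifiers.
hub = {"superman": set().union(*five_maps.values())}

before = merge_by_identifiers(five_maps)
after = merge_by_identifiers({**five_maps, **hub})
```

Without the hub topic there are five separate groups; with it there is one group of six topic names. Nobody had to map each identification to every other one.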

Ooooh.

Or an author knows one other identifier. So long as any group of authors uses at least one common identifier between any two maps, it results in the merger of their separate topic maps. (Ordering of the merges may be an issue.)

Another way to say that is that the trigger for merging of identifications is decentralized.

Which gives you a lot more eyes on the data, potential subjects and relationships between subjects.

PS: Did you know that the English and German versions give Superman’s cover name as “Clark Kent,” while the French, Italian and Esperanto versions give his cover name as “Clark Joseph Kent?”

PPS: The files are both here, Superman-Semantics-01.zip.

Subject Normalization

Thursday, September 29th, 2011

Another way to explain topic maps is in terms of Database normalization, except that I would call it subject normalization. That is every subject that is explicitly represented in the topic map appears once and only once, with relations to other subjects being recast to point to this single representative and all properties of the subject gathered to that one place.

One obvious advantage is that the shipping and accounting departments, for example, both have access to updated information for a customer as soon as entered by the other. And although they may gather different information about a customer, that information can be (doesn’t have to be) available to both of them.

Unlike database normalization, subject normalization in topic maps does not require rewriting of database tables, which can cause data access problems. Subject normalization (merging) occurs automatically, based on the presence of properties defined by the Topic Maps Data Model (TMDM).

And unlike owl:sameAs, subject normalization in topic maps does not require knowledge of the “other” subject representative. That is, I can insert an identifier that I know has been used for a subject, without knowledge it has been used in this topic map, and topics representing that subject will automatically merge (or be normalized).

Subject normalization in the terms of the TMDM, reduces the redundancy of information items. Which is true enough but not the primary experience of users with subject normalization. How many copies of a subject representative (information items) a system has is of little concern for an end-user.

What does concern end-users is getting the most complete and up-to-date information on a subject, however that is accomplished.

Topic maps accomplish that goal by empowering users to add identifiers to subject representatives that result in subject normalization. It doesn’t get any easier than that.

SAGA: A DSL for Story Management

Monday, September 12th, 2011

SAGA: A DSL for Story Management by Lucas Beyak and Jacques Carette (McMaster University).

Abstract:

Video game development is currently a very labour-intensive endeavour. Furthermore it involves multi-disciplinary teams of artistic content creators and programmers, whose typical working patterns are not easily meshed. SAGA is our first effort at augmenting the productivity of such teams.

Already convinced of the benefits of DSLs, we set out to analyze the domains present in games in order to find out which would be most amenable to the DSL approach. Based on previous work, we thus sought those sub-parts that already had a partially established vocabulary and at the same time could be well modeled using classical computer science structures. We settled on the ‘story’ aspect of video games as the best candidate domain, which can be modeled using state transition systems.

As we are working with a specific company as the ultimate customer for this work, an additional requirement was that our DSL should produce code that can be used within a pre-existing framework. We developed a full system (SAGA) comprised of a parser for a human-friendly language for ‘story events’, an internal representation of design patterns for implementing object-oriented state-transitions systems, an instantiator for these patterns for a specific ‘story’, and three renderers (for C++, C# and Java) for the instantiated abstract code.

I mention this only in part because of Jack Park’s long standing interest in narrative structures.

The other reason I mention this article is it is a model for how to transition between vocabularies in a useful way.

Transitioning between vocabularies is as nearly a constant theme in computer science as data storage. Not to mention that disciplines, domains, professions, etc., have been transitioning between vocabularies for thousands of years. Some more slowly than others; some terms in legal vocabularies date back centuries.

We need vocabularies and data structures, but with the realization that none of them are final. If you want blind interchange of topic maps I would strongly suggest that you use one of the standard syntaxes.

But with the realization that you will encounter data that isn’t in a standard topic map syntax. What subjects are represented there? How would you tell others about them? And those vocabularies are going to change over time, just as there were vocabularies before RDF and topic maps.

If you ask an honest MDM advocate, they will tell you that the current MDM effort is not really all that different from MDM in the ’90s. And MDM may be what you need; it depends on your requirements. (Sorry, master data management = MDM.)

The point being that there isn’t any place where a particular vocabulary or “solution” is going to freeze the creativity of users and even programmers, to say nothing of the rest of humanity. Change is the only constant and those who aren’t prepared to deal with it, will be the worse off for it.

When Should Identifications Be Immutable?

Thursday, September 8th, 2011

After watching a presentation on Clojure and its immutable data structures, I began to wonder when should identifications be immutable?

Note that I said when should identifications… which means I am not advocating a universal position for all identifiers but rather a choice that may vary from situation to situation.

We may change our minds about an identification, the fact remains that at some point (dare I say state?) a particular identification was made.

For example, you make an intimate gesture at a party only to discover your spouse wasn’t the recipient of the gesture. But at the time you made the gesture, at least I am willing to believe, you thought it was your spouse. New facts are now apparent. But it is also a new identification. As your spouse will remind you, you did make a prior, incorrect identification.

As I recall, topics (and other information items) are immutable for purposes of merging. (TMDM, 6.2 and following.) That is merging results in a new topic or other new information item. On the other hand, merging also results in updating information items other than the one subject to merging. So those information items are not being treated as immutable.

But since the references are being updated, I don’t think it would be inconsistent with the TMDM to create new information items to be the carriers of the new identifiers and thus treating the information items as immutable.

Would be application/requirement specific but say for accounting/banking/securities and similar applications, it may be important for identifications to be immutable. Such that we can “unroll” a topic map as it were to any prior arbitrary identification or state.
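One way to picture "unrolling" is an append-only log of identifications, where nothing is ever overwritten. The following Python sketch is an assumption about how that might look, not anything the TMDM specifies; Identification, IdentificationLog, record and state_at are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identification:
    seq: int          # position in the log
    topic: str
    identifier: str

class IdentificationLog:
    """Append-only: earlier identifications are never changed or removed."""

    def __init__(self):
        self._events = ()

    def record(self, topic, identifier):
        event = Identification(len(self._events), topic, identifier)
        self._events = self._events + (event,)   # new tuple, old one untouched
        return event

    def state_at(self, seq):
        """Identifiers per topic as of event number seq (inclusive)."""
        state = {}
        for event in self._events[:seq + 1]:
            state.setdefault(event.topic, set()).add(event.identifier)
        return state

log = IdentificationLog()
log.record("account-42", "http://example.org/id/acct-42")
log.record("account-42", "http://example.org/id/legacy-0042")
```

Calling log.state_at(0) gives the topic map's view after the first identification only, and log.state_at(1) gives the view after both, so any prior state can be reconstructed on demand.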

TMDM to Redis Schema (paper)

Thursday, April 14th, 2011

Yet another mapping of the Topic Maps Data Model to Redis schema

By Johannes Schmidt:

In this document another mapping of the Topic Maps Data Model (TMDM) [3] to Redis key-value store [8] schema is drafted. An initial mapping [5] of the TMDM to Redis schema has been provided by the Topic Maps Lab of the University of Leipzig [9]. The main motivation is not to design a “better” schema but to simply do a mapping of the TMDM to a key-value store schema. Some valuable enhancements for the Topic Maps Lab schema are created, though.

Possible guide to mapping the TMDM to key-value store databases.

Something to consider would be mapping the TMDM to a graph database.

Do topic, association, and occurrence become nodes?

Dimensions to use to compare NoSQL data stores – Queries to Produce Topic Maps

Wednesday, January 26th, 2011

Dimensions to use to compare NoSQL data stores

A post by Huan Liu to read after Billy Newport’s Enterprise NoSQL: Silver Bullet or Poison Pill? – (Unique Questions?)

A very good quick summary of the dimensions to consider. As Liu makes clear, choosing the right data store is a complex issue.

I would use this as an overview article to get everyone on a common ground for a discussion of NoSQL data stores.

At least that way, misunderstandings will be on some other topic of discussion.

BTW, if you think about Newport’s point (however correct/incorrect) that NoSQL databases enable only one query, doesn’t that fit the production of a topic map?

That is there is a defined set of constructs, with defined conditions of equivalence. So the only query in that regard has been fixed.

Questions remain about querying the data that a topic map holds, but the query that results in merged topics, associations, etc., is fixed.

In some processing models, that query is performed and a merged artifact is produced.

Following the same data model rules, I would prefer to allow those queries to be made on an ad hoc basis. So that users are always presented with the latest merged results.

Same rules as the TMDM, just a question of when they fire.

Questions:

  1. NoSQL – What other general compare/dimension articles would you recommend as common ground builders? (1-3 citations)
  2. Topic maps as artifacts – What other data processing approaches produce static artifacts for querying? (3-5 pages, citations)
  3. Topic maps as query results – What are the concerns and benefits of topic maps as query results? (3-5 pages, citations)

Topic Maps – Human-oriented Semantics? – Slides

Tuesday, January 25th, 2011

Topic Maps – Human-oriented Semantics? – Slides

The slides from Lars Marius Garshol’s topic map presentation tomorrow in Sogndal are now online.

Recommended for use with civilians. (Currently non-topic map advocates.)

See The Tin Man for my take away from the presentation.

The Tin Man

Friday, January 14th, 2011

One of the reasons I suggested having a podcast based topic maps conference is that watching presentations by others always (or nearly so) inspires me with new ideas.

Take Lars Marius Garshol’s presentation this morning, Topic Maps – Human Oriented Semantics?

While musing over the presentation, I was reminded of the line from the Tin Man, …Oz never did give nothing to the Tin Man / That he didn’t, didn’t already have….

If you don’t remember the story, the Wizard of Oz gives the Tin Man a heart, which he obviously had throughout the story.

Anyway, I think one take away from Lars’ presentation is that users don’t need to go looking for experts in order to have semantics.

Users already have semantics and topic maps are a particularly clever way for users to express their semantics using their understanding of those semantics.

May or may not fit into classical, neo-classical, rough or fuzzy logic.

What matters is that a topic map represents subjects and their relationships as understood by the users of the topic map.

Users already have semantics, they just need topic maps in order to express them!

Topic Maps – Human-oriented Semantics? – A Quibble

Friday, January 14th, 2011

Topic Maps – Human-oriented Semantics?

As promised, I have a quibble about the presentation that Lars made this morning. 😉

When talking about topic maps as semantic technology, Lars suggested or at least I heard him suggest, that topic maps help the person inside the Chinese room in John Searle’s famous example.

Lars then proceeded to use an example of a topic map, where the content was written in Japanese.

To show that you could know something about the content or at least relationships between the content, whether you could read it or not.

All of which is true, but my quibble is that such an understanding is on the part of the audience to the presentation and not of the machine/person inside the Chinese room.

Even with a topic map as input, we still don’t know what, if anything, is understood by a person or machine inside the Chinese room.

All we ever know is that we got the correct response to our input.

The presentation elided the transition from the Chinese room to the audience for the presentation. Quite different, at least in my view.

I did not allow that to distract me from an otherwise excellent presentation but I thought I should mention it. 😉

idk (I Don’t Know)

Sunday, December 5th, 2010

What are you using to act as the placeholder for an unknown player of a role?

That is in say a news, crime or accident investigation, there is an association with specified roles, but only some facts and not the identity of all the players is known.

For example, in the recent cablegate case, when the story of the leaks broke, there was clearly an association between the leaked documents and the leaker.

The leaker had a number of known characteristics, the least of which was ready access to a wide range of documents. I am sure there were others.

To investigate that leak with a topic map, I would want to have a representative for the player of that role, to which I can assign properties.

I started to publish a subject identifier for the subject idk (I Don’t Know) to act as that placeholder but then thought it needs more discussion.

This has been in my blog queue for a couple of weeks so another week or so before creating a subject identifier won’t hurt.

The problem, which you already spotted, is that TMDM governed topic maps are going to merge topics with the idk (I Don’t Know) subject identifier. Which would be incorrect in many cases.

Interesting that it would not be wrong in all cases. That is I could have two associations, both of which have idk (I Don’t Know) subject identifiers and I want them to merge on the basis of other properties. So in that case the subject identifiers should merge.

I am leaning towards simply defining the semantics to be non-merger in the absence of merger on some other specified basis.
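A minimal Python sketch of that leaning, assuming a made-up identifier for idk (none has been published) and a hypothetical should_merge helper: the placeholder identifier is simply ignored when looking for a basis to merge.

```python
# Hypothetical subject identifier for "idk"; no such identifier has been published.
IDK = "http://example.org/psi/idk"

def should_merge(a_ids, b_ids):
    """Merge only if the topics share an identifier other than the idk placeholder."""
    shared = (set(a_ids) & set(b_ids)) - {IDK}
    return bool(shared)

leaker_a = {IDK}
leaker_b = {IDK}
leaker_c = {IDK, "http://example.org/case/cablegate/leaker"}
leaker_d = {IDK, "http://example.org/case/cablegate/leaker"}
```

Two unknown players that share only idk stay apart, while leaker_c and leaker_d merge on the basis of the other identifier they have in common.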

Suggestions?

PS: I kept writing the expansion idk (I Don’t Know) because a popular search engine suggested Insane Dutch Killers as the expansion. Wanted to avoid any ambiguity.

TMRM and a “universal information space”

Wednesday, November 24th, 2010

As an editor of the TMRM (Topic Maps Reference Model) I feel compelled to point out the TMRM is not a universal information space.

I bring up the universal issue because someone mentioned lately, mapping to the TMRM.

There is a lot to say about the TMRM but let’s start with the mapping issue.

There is no mapping to the TMRM. (full stop) The reason is that the TMRM is also not a data model. (full stop)

There is a simple reason why the TMRM was not, is not, nor ever will be a data model or universal information space.

There is no universal information space or data model.

Data models are an absolute necessity and more will be invented tomorrow.

But, to be a data model is to govern some larger or smaller slice of data.

We want to meaningfully access information across past, present and future data models in different information spaces.

Enter the TMRM, a model for disclosure of the subjects represented by a data model. Any data model, in any information space.

A model for disclosure, not a methodology, not a target, etc.

We used key and value because a key/value pair is the simplest expression of a property class.

The representative of the definition of a class (the key) and an instance of that class (the value).

That does not constrain or mandate any particular data model or information space.

Rather than mapping to the TMRM, we should say mapping using the principles of the TMRM.

I will say more in a later post, but for example, what subject does a topic represent?

With disclosure for the TMDM and RDF, we might not agree on the mapping, but it would be transparent. And useful.

Whose Logic Binds A Topic Map?

Tuesday, November 9th, 2010

An exchange with Lars Heuer over what the TMRM should say about “ako” and “isa” (see: A Guide to Publishing Linked Data Without Redirects) brings up an important but often unspoken issue.

The current draft of the Topic Maps Reference Model (TMRM) says that subclass-superclass relationships are reflexive and transitive. Moreover, “isa” relationships are non-reflexive and transitive.

Which is all well and good, assuming that accords with your definition of subclass-superclass and isa. The Topic Maps Data Model (TMDM) on the other hand defines “isa” as non-transitive.
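The practical difference between the two readings shows up in what can be inferred. A small Python sketch, with made-up example data and a hypothetical closure helper, computes the closure of a relation with or without reflexivity:

```python
def closure(pairs, reflexive):
    """Transitive closure of a set of (x, y) pairs, optionally reflexive."""
    result = set(pairs)
    if reflexive:
        result |= {(x, x) for pair in pairs for x in pair}
    changed = True
    while changed:
        changed = False
        for a, b in list(result):
            for c, d in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result

# Example data is invented for illustration.
ako = {("dog", "mammal"), ("mammal", "animal")}
isa = {("rex", "dog"), ("dog", "species")}

ako_closed = closure(ako, reflexive=True)     # reflexive and transitive
isa_draft = closure(isa, reflexive=False)     # non-reflexive, transitive reading
isa_tmdm = set(isa)                           # non-transitive reading: stated pairs only
```

Under the transitive reading, ("rex", "species") is inferred; under the non-transitive reading it is not. Same data, different logic, different answers.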

Either one is a legitimate choice and I will cover the resolution of that difference elsewhere.

My point here is to ask: “Whose logic binds a topic map?”

My impression is that here and in the Semantic Web, logical frameworks are being created, into which users are supposed to fit their data.

As a user I would take serious exception to fitting my data into someone else’s world view (read logic).

That’s the real question, isn’t it?

Whether IT/SW dictates to users the logic that will bind their data or if users get to define their own “logics?”

Given the popularity of tagging and folksonomies, user “logics” look like the better bet.

TMDM-NG – Overloading Occurrence

Friday, November 5th, 2010

“Occurrence” in topic maps is currently overloaded. Seriously overloaded.

In one sense, “occurrence” is used as it is in a bibliographic reference. That is that subject X “occurs” at volume Y, page Z. A reader expects to find the subject in question at that location.

In the overloaded sense, “occurrence” is used to mean some additional property of a subject.

To me the semantics of “occurrence” weigh against using it for any property associated with a subject.

That has been the definition used in topic maps for a very long time but that to me simply ripens it for correction.

Occurrence should be used only for instances of a subject that are located outside of a topic map.

A property element should be allowed for any topic, name, occurrence or association. Every property should have a type attribute.

It is a property of the subject represented by the construct where it appears.

Previously authored topic maps will continue to be valid since as of yet there are no processors that could validate the use of “occurrence” either in the new or old sense of the term.

Older topic map software will not be able to process newer topic maps but unless topic maps change and evolve (even COBOL has), they will die.

Revisiting the TAO of Topic Maps

Friday, November 5th, 2010

One of the readings for my course on topic maps is the TAO of Topic Maps.

I was re-reading it the other day while writing a lecture.

Topics can represent anything. That much we all know.

Associations represent “a relationship between two or more topics.”

Isn’t an association an “anything?”

Occurrences are “information resources that are deemed to be relevant to the topic in some way.”

Isn’t an occurrence an “anything?”

Which would mean that both associations and occurrences could be represented by topics, but they’re not.

They have special constructs in ISO 13250. And defined sets of properties.

I thought about that for a while and it occurred to me that topic, association and occurrence are just convenient handles for bundles of properties.

When I say “association,” you know we are about to talk about a relationship between two subjects (topics), their roles, role players, etc.

Same goes for occurrence.

Or to put it differently, “topic,” “association” and “occurrence” facilitate talking about particular subjects and their properties.

Managing Semantic Ambiguity

Wednesday, November 3rd, 2010

Topic maps do not and cannot eliminate semantic ambiguity. What topic maps can do is assist users in managing semantic ambiguity with regard to identification of particular subjects.

Consider the well-known ambiguity of whether a URI is an identifier or an address.

The Topic Maps Data Model (TMDM) provides a way to manage that ambiguity by providing a means to declare if a URI is being used as an identifier or as an address.

That is only “managing” the ambiguity because there is no mechanism to prevent incorrect use of that mechanism, which would result in ambiguity or even having the mechanism mis-understood entirely.

Identification by saying a subject representative (proxy) must have properties X…Xn is a collection of possible ambiguities that an author hopes will be understood by a reader.

Since we are trying to communicate with other people, there isn’t any escape from semantic ambiguity. Ever.

Topic maps provide the ability to offer more complete descriptions of subjects in hopes of being understood by others.

With the ability to add descriptions of subjects from others, offering users a variety of descriptions of the same subject.

We have had episodic forays into “certainty,” the Semantic Web being only one of the more recent failures in that direction. Ambiguity anyone?

The UMLS Metathesaurus: representing different views of biomedical concepts

Wednesday, October 27th, 2010

The UMLS Metathesaurus: representing different views of biomedical concepts

Abstract

The UMLS Metathesaurus is a compilation of names, relationships, and associated information from a variety of biomedical naming systems representing different views of biomedical practice or research. The Metathesaurus is organized by meaning, and the fundamental unit in the Metathesaurus is the concept. Differing names for a biomedical meaning are linked in a single Metathesaurus concept. Extensive additional information describing semantic characteristics, occurrence in machine-readable information sources, and how concepts co-occur in these sources is also provided, enabling a greater comprehension of the concept in its various contexts. The Metathesaurus is not a standardized vocabulary; it is a tool for maximizing the usefulness of existing vocabularies. It serves as a knowledge source for developers of biomedical information applications and as a powerful resource for biomedical information specialists.

Bull Med Libr Assoc. 1993 Apr;81(2):217-22.
Schuyler PL, Hole WT, Tuttle MS, Sherertz DD.
Medical Subject Headings Section, National Library of Medicine, Bethesda, MD 20894.

Questions:

  1. Did you notice the date on the citation?
  2. Map this article to the Topic Maps Data Model (3-5 pages, no citations)
  3. Where does the Topic Maps Data Model differ from this article? (3-5 pages, no citations)
  4. If concept = proxy, what concepts (subjects) don’t have proxies in the Metathesaurus?
  5. On what basis are “biomedical meanings” mapped to a single Metathesaurus “concept?” Describe in general but illustrate with at least five (5) examples

TMDM-NG – Reification

Monday, October 18th, 2010

Reification in the TMDM means using a topic to “reify” a name, occurrence, association, etc. Whatever subject is represented by a name, occurrence or association, after “reification” it is also represented by a topic.

For the TMDM-NG, let’s drop reification and make names, occurrences, associations, etc., first class citizens in a topic map.

Making names, occurrences, associations first class citizens would mean we could add properties to them without the overhead of creating topics to represent subjects that already have representatives in a topic map.

We do need to work on “occurrence” being overloaded to mean both an occurrence in the bibliographic sense and a property, but that can wait for a future post.

Semantic Drift: A Topic Map Answer (sort-of)

Tuesday, October 12th, 2010

Topic maps took a different approach to the problem of identifying subjects (than RDF) and so look at semantic drift differently.

In the original 13250, subject descriptor was defined as:

3.19 subject descriptor – Information which is intended to provide a positive, unambiguous indication of the identity of a subject, and which is the referent of an identity attribute of a topic link.

When 13250 was reformulated to focus on the XTM syntax and the legend known as the Topic Maps Data Model (TMDM), the subject descriptor of old became subject identifiers. (Clause 7, TMDM)

A subject identifier has information that identifies a subject.

The author of a topic uses information that identifies a subject to create a subject identifier. (Which is represented in a topic map by an IRI.)

Anyone can look at the subject identifier to see if they are talking about the same subject.

They are responsible for catching semantic drift if it occurs.

But, there is something missing from RDF and topic maps.

Something that would help with semantic drift, although they would use it differently.

Care to take a guess?

Topic Maps Data Model (TMDM) Turns 10 (next year)

Friday, August 6th, 2010

The Topic Maps Data Model (TMDM) first appeared in the SC 34 document registry on 11 August 2001. (That’s SC 34 N0242 for ISO insiders.)

What better way to celebrate its “birthday” than a two day, 2 hours per day, series of presentations on what we have learned in the past ten years and where we would like to go?

I am proposing teleconferences on the 11th and 12th of August, 2011, say from 10 AM UTC/GMT (12 PM Norway, 7 PM Japan, 6 AM Eastern US) until 12 PM UTC/GMT.

General format being 20 minute presentations with 10 minutes Q/A. That should accommodate a maximum of 4 presentations each day.

Comments/suggestions? Volunteers to make presentations?

******

Correction/Clarification:

TMDM – Current TMDM page: the link that appears above.

http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0242.htm The version of the TMDM that will be 10 next year.

Thanks to Lars Heuer for catching the confusion. I wanted to point everyone to the current TMDM page.

*****
Note that I corrected the date for the appearance of the TMDM from 11 August 2011 to 11 August 2001 (its actual date of appearance).

Thanks to Lars Marius Garshol for the correction!

Topic Maps Data Model (TMDM) in a nutshell

Thursday, July 29th, 2010

Topic Maps Data Model (TMDM) in a nutshell by Marcel Hoyer is a handy graphic representation of all the relationships in the TMDM.

Topic Map: An Ontology Framework for Information Retrieval

Wednesday, June 2nd, 2010

Topic Maps Lab reports Topic Map: An Ontology Framework for Information Retrieval, a presentation by Rajkumar Kannan, at the National Conference on Advances in Knowledge Management 2010. (National Conference on Advances in Knowledge Management (NCAKM’10), pp. 195-198, March 2010, India.)

Nothing novel for long-time topic map advocates, but a place for others to start learning about topic maps.

Which reminds me, I need to return to the non-standard/technical introduction to topic maps. Will try to post the first installment, without illustrations (still looking for an illustrator) later this week.

Complex Merging Conditions In XTM

Wednesday, April 21st, 2010

We need a way to merge topics for reasons that are not specified by the TMDM.

For example, I want to merge topics that have equivalent occurrences of type ISBN. Library catalogs in different languages may only share the ISBN of an item as a common characteristic. A topic map generated from each of them could have the ISBN as an occurrence on each topic.

I am assuming each topic map relies upon library identifiers for “standard” merging because that is typically how library systems bind the information for a particular item together.

So, how to make merging occur when there are equivalent occurrences of type ISBN?

Solution: As part of the process of creating the topics, add a subject identifier based on the occurrences of type ISBN that results in equivalent subject identifiers when the ISBN numbers are equivalent. That results in topics that share equivalent occurrences of type ISBN merging.

While the illustration is with one occurrence, there is no limit as to the number of properties of a topic that can be considered in the creation of a subject identifier that will result in merging. Such subject identifiers, when resolved, should document the basis for their assignment to a topic.
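As a rough illustration of the solution above, here is a minimal Python sketch. The base IRI `http://example.org/isbn/`, the dict-based topic representation, and the helper names are all assumptions made for the example; none of them come from the TMDM or from any topic map engine.

```python
# Hypothetical sketch: derive a subject identifier from each ISBN occurrence
# so that topics with equivalent ISBNs end up with equal subject identifiers.

BASE = "http://example.org/isbn/"  # assumed base IRI, for illustration only


def normalize_isbn(raw):
    """Strip hyphens and spaces and upper-case a trailing 'x' check digit."""
    return raw.replace("-", "").replace(" ", "").upper()


def isbn_subject_identifier(raw):
    """Build the subject identifier string for one ISBN value."""
    return BASE + normalize_isbn(raw)


def add_isbn_identifiers(topic):
    """Add one subject identifier per occurrence of type 'ISBN'."""
    for occ_type, value in topic["occurrences"]:
        if occ_type == "ISBN":
            topic["subject_identifiers"].add(isbn_subject_identifier(value))
    return topic


# Two topics from catalogs in different languages, each with its own
# library identifier and the same ISBN written in two different ways.
english = {"subject_identifiers": {"http://example.org/lib-en/123"},
           "occurrences": [("ISBN", "0-306-40615-2")]}
german = {"subject_identifiers": {"http://example.org/lib-de/987"},
          "occurrences": [("ISBN", "0306406152")]}

add_isbn_identifiers(english)
add_isbn_identifiers(german)

# The topics now share one subject identifier, which is the TMDM merge trigger.
shared = english["subject_identifiers"] & german["subject_identifiers"]
```

The sketch treats the ISBN-10 and ISBN-13 forms of a number as different values; equating them would need a fuller normalization step.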

BTW, assuming a future TMQL that enables such merging, note this technique will work with XTM 1.0 topic map engines.

Caution: This solution does not work for properties that can be determined only after the topic map has been constructed. Such as participation in particular associations or the playing of particular roles.

PS: There is a modification of this technique to deal with participation in associations or the playing of particular roles. More on that in another post.

An SQL Example for Michael

Sunday, April 18th, 2010

Marijane White pointed out the following comment from Michael Sperberg-McQueen asking how topic maps differ from SQL:

The biggest set of open questions remains: how does modeling a collection of information with Topic Maps differ from modeling it using some other approach? Are there things we can do with Topic Maps that we can’t do, or cannot do as easily, with a SQL database? With a fact base in Prolog? With colloquial XML? It might be enlightening to see what the Italian Opera topic map might look like, if we designed a bespoke XML vocabulary for it, or if we poured it into SQL. (I have friends who tell me that SQL is really not suited for the kinds of things you can do with Topic Maps, but so far I haven’t understood what they mean; perhaps a concrete example will make it easier to compare the two.)

From http://cmsmcq.com/mib/?p=810

An SQL example:

firstName lastName
Patrick Durusau

And elsewhere:

givenName surName
Patrick Durusau

An interface could issue separate queries and return a consolidated result.
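To make that concrete, here is a small sketch using Python's built-in sqlite3 module. The table names `staff` and `members` are assumptions for the example; the column names are the ones shown above.

```python
import sqlite3

# In-memory database with the two tables from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (firstName TEXT, lastName TEXT)")
conn.execute("CREATE TABLE members (givenName TEXT, surName TEXT)")
conn.execute("INSERT INTO staff VALUES ('Patrick', 'Durusau')")
conn.execute("INSERT INTO members VALUES ('Patrick', 'Durusau')")

# The interface issues two separate queries...
first = conn.execute("SELECT firstName, lastName FROM staff").fetchall()
second = conn.execute("SELECT givenName, surName FROM members").fetchall()

# ...and consolidates them. The mapping firstName = givenName and
# lastName = surName exists only in this code; neither table records it.
consolidated = sorted(set(first) | set(second))
```

The consolidation works, but the reason the columns are treated as equivalent is recorded nowhere that another reader or system could inspect.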

Does that equal a topic map? My answer is NO!

The questions that SQL doesn’t answer (topic maps do):

  • On what basis to map? There are no explicit properties of those subjects on which to make a mapping.
  • What rules should we follow? Because there are no explicit rules even assuming there were properties for these subjects.

Contrast that with (topics in CTM syntax):


http://en.wikipedia.org/wiki/First_name
- "firstName" .


http://en.wikipedia.org/wiki/First_name
- "givenName" .

The Topic Maps Data Model (TMDM) defines the subject identifier property (the URL string you see) and that when subject identifier properties are equal the topics merge.

Different situation from the SQL example.

First, we have a defined property that anyone can look at, both to judge the merging (do these really represent the same subject?) and to decide whether they want to merge their own subject representatives with these.

Second, we have a rule by which the mapping/merging occurs. We are no longer relying on a blind mapping between the two subject representatives.
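A minimal Python sketch of that rule, applied to the two CTM topics above. The dict representation and the `merge_topics` helper are assumptions for illustration, and only the subject identifier rule is shown; the TMDM defines other merging rules as well.

```python
# Hypothetical sketch of the subject identifier merging rule.

def merge_topics(a, b):
    """Return the merged topic if a and b share a subject identifier, else None."""
    if not (a["subject_identifiers"] & b["subject_identifiers"]):
        return None  # no explicit basis for merging, so do nothing
    return {
        "subject_identifiers": a["subject_identifiers"] | b["subject_identifiers"],
        "names": a["names"] | b["names"],
    }


first_name = {
    "subject_identifiers": {"http://en.wikipedia.org/wiki/First_name"},
    "names": {"firstName"},
}
given_name = {
    "subject_identifiers": {"http://en.wikipedia.org/wiki/First_name"},
    "names": {"givenName"},
}

# Equal subject identifiers, so the topics merge and both names survive.
merged = merge_topics(first_name, given_name)
```

Both the property being compared and the rule applied to it are explicit, which is the contrast with the SQL case.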

Topic maps are a three fold trick: 1) No second class subjects, 2) Explicit properties for identification, 3) Explicit rules for when subject representatives are considered to represent the same subject.

Apologies for the length of this post! But, Michael wanted an example.

Questions?

(I will answer Michael’s questions about XML and Prolog separately.)

Degrees of Separation and Scope

Wednesday, April 14th, 2010

Most people have heard of the Six degrees of separation.

I am wondering how to adapt that to scope?

My reasoning: when I find that a particular author uses the term “list washing” as an alternative way to identify “record linkage,” I should scope that term by that author.

Assuming that author has co-authors, those authors should be used as scopes on that term as well.

That seems straightforward enough, but then it occurred to me that anyone who cites either that article or one of those authors is probably using the same term to identify the same subject. So I need to extend the scope to include those authors as well.

You can see where this is going.

But unlike the usual citation network, this is tracing at a more fine grained level the identification of subjects, which isn’t necessarily co-extensive with citation of an article.
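One way to picture the widening scope is a breadth-first walk outward from the originating authors, recording how many citation steps away each citing author sits. The sketch below is only that: the author names, the `cites` mapping, and the `max_degree` cutoff are invented for the example, and a citation is treated as evidence of shared usage, which (as noted above) is probable rather than certain.

```python
from collections import deque

# Hypothetical sketch: assign each author a "degree" of scope for a term.


def scope_by_degree(origin_authors, cites, max_degree):
    """Map each author to the number of citation steps from the origin authors.

    cites maps an author to the authors they cite; the walk runs in the
    reverse direction, from a cited author to the authors citing them.
    """
    cited_by = {}
    for citing, cited_list in cites.items():
        for cited in cited_list:
            cited_by.setdefault(cited, set()).add(citing)

    degree = {author: 0 for author in origin_authors}
    queue = deque(origin_authors)
    while queue:
        current = queue.popleft()
        if degree[current] == max_degree:
            continue  # do not extend the scope beyond the cutoff
        for citing in sorted(cited_by.get(current, ())):
            if citing not in degree:
                degree[citing] = degree[current] + 1
                queue.append(citing)
    return degree


# Alice and Bob co-author the article that uses the term.
cites = {"Carol": ["Alice"], "Dave": ["Carol"], "Erin": ["Dave"]}
scopes = scope_by_degree(["Alice", "Bob"], cites, max_degree=2)
```

Here Erin falls outside the scope because she is three steps away and the cutoff is two.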

If a legal analogy would help, courts cite prior decisions for all sorts of reasons and being able to identify the ones that are important to your case would save enormous amounts of time. Remembering that even in hard times top firms charge anywhere from $300 to $750/hour, saving time can be important.*

Thinking about it visually, imagine a citation network, those are common enough, but where you can lift out a set of connections based on the usage of a particular term to identify a subject.

Add merging of the different identifications and it starts to sound like a game, with scores, etc., to tease apart citation networks into references to particular subjects, even though the authors use different terminology.

*****

*Public access to legal material projects should note that court opinions exhibit the same behavior. If a court in Case1 cites Case2 for a jurisdictional issue, it is likely that any other case citing Case1 and Case2, is also citing Case2 for a jurisdictional issue. Old law clerk trick. Not always true but true often enough to get a lot of mileage out of one identification of why a case was cited.

In Praise of Legends (and the TMDM in particular)

Thursday, March 11th, 2010

Legends enable topic maps to have different representations of the same subject. Standard legends, like the Topic Maps Data Model (TMDM), are what enable blind interchange of topic maps.

Legends do a number of things but, among the more important, legends define the rules for the contents of subject representatives and the rules for comparing them. The TMDM defines three representatives for subjects: topics, associations and occurrences. It also defines how to compare those representatives to see if they represent the same subjects.

Just to pull one of those rules out: if two or more topics have an equal string in their [subject identifiers] property, those topics are deemed to represent the same subject. (TMDM 5.3.5 Properties) The [subject identifiers] property is a set, so a topic could have two or more different strings in that property to match other topics.
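Because the property is a set, one topic can bridge two others that share no string with each other. A small Python sketch of that consequence follows; each topic is reduced to its set of subject identifier strings, and the `merge_all` helper and the example.org strings are assumptions for illustration, not TMDM terminology.

```python
# Hypothetical sketch: group topics whose subject identifier sets overlap.


def merge_all(topics):
    """Merge topics that share at least one subject identifier string."""
    merged = []  # invariant: the sets in this list never overlap each other
    for topic in topics:
        ids = set(topic)
        remaining = []
        for group in merged:
            if group & ids:
                ids |= group  # absorb every earlier group that overlaps
            else:
                remaining.append(group)
        remaining.append(ids)
        merged = remaining
    return merged


topics = [
    {"http://example.org/a"},
    {"http://example.org/b", "http://example.org/c"},
    {"http://example.org/a", "http://example.org/b"},  # bridges the first two
    {"http://example.org/d"},
]
groups = merge_all(topics)
```

The third topic carries two strings, so the first three topics collapse into one while the fourth stays separate.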

It is the definition of a basis for comparison (see the TMDM for the other rules for comparing topics) of topics that enables the blind interchange of topic maps that follow the TMDM. That is to say that I can author a topic map in XTM (one syntax that follows the TMDM) and reasonably expect that other users will be able to successfully process it.

I am mindful of Robert Cerny’s recent comment on encoding strings as URLs but don’t think that covers the case where the identifications of the subjects are dynamic. That is to say that the strings themselves are composed of strings that are subject to change as additional items are merged into the topic map.

The best use case that comes to mind is that of the current concern in the United States over the non-sharing of intelligence data. You know, someone calls up and says their son is a terrorist and is coming to the United States to commit a terrorist act. That sort of intelligence. That isn’t passed on to anyone. At least anyone who cared enough to share it, I don’t know, with the airlines perhaps?

If I can author a subject identification that includes a previously overlooked source of information, say the parent of a potential terrorist, in addition to the usual categories (paid informants, current/former drug lords, etc.), then the lights aren’t simply blinking red; there is actual information in addition to the blinking lights.

I really should wait for Robert to make his own arguments but if you think of URLs as simply strings, without any need for resolution, you could compose a dynamic identification, freeze it into a URL, then pass it along to a TMDM based system. You don’t get any additional information but that would be one way to input such information into a TMDM based system. If you control the server you could provide a resolution back into the dynamic subject identification system. (Will have to think about that one.)
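A minimal sketch of the "freeze it into a URL" idea, using Python's standard urllib.parse. The base address, the property names, and the `freeze_identification` helper are assumptions for the example, and nothing here implies the URL resolves to anything.

```python
from urllib.parse import urlencode

BASE = "http://example.org/id"  # assumed address; no resolution is implied


def freeze_identification(properties):
    """Serialize a dict of identifying properties into one URL string."""
    # Sort the keys so the same properties always yield the same string.
    return BASE + "?" + urlencode(sorted(properties.items()))


# The same identification, written in two different orders.
v1 = freeze_identification({"source": "informant", "name": "X"})
v2 = freeze_identification({"name": "X", "source": "informant"})

# The identification after another source has been merged in.
v3 = freeze_identification(
    {"name": "X", "source": "informant", "reported_by": "parent"}
)
```

Each change to the dynamic identification yields a different frozen string, so the receiving TMDM based system only ever compares immutable strings.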

I think of it as the TMDM using sets of immutable strings for subject identification and one of the things the TMRM authorizes, but does not mandate, is the use of mutable strings as subject identifiers.