Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 17, 2013

Mapping Wikipedia

Filed under: Topic Maps,Wikipedia — Patrick Durusau @ 4:09 pm

Carl Lemp commented in the XTM group at LinkedIn, in the discussion of a potential redesign of topic maps:

2. There are only a few tools to help build a Topic Map.
3. There is almost nothing to help translate familiar information structures to Topic Map structures.
(…)
Getting through 2 and 3 is a bitch.

I can’t help with #2 but I may be able to help with #3.

I suggest mapping the MediaWiki structure that is used for Wikipedia into a topic map.

As a demonstration it has the following advantages:

  1. Conversion from SQL dump to topic map scripts.
  2. Large enough to test alternative semantics.
  3. Sub-sets of Wikipedia good for starter maps.
  4. Useful to merge with other data sets.
  5. Well known data set.
  6. Widespread data format (SQL).

The MediaWiki schema: MediaWiki-1.21.1-tables.sql.

The base output format will be CTM.
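
To make the target concrete, here is a minimal sketch in Python of turning one row of the MediaWiki page table into a CTM topic block. The columns page_id and page_title come from the MediaWiki schema; everything else is an assumption for illustration: the helper names, the page_<id> topic identifier, the en.wikipedia.org base IRI, and the choice to treat the article URL as a subject locator. The CTM syntax follows my reading of the CTM draft and has not been run through a CTM parser.

```python
from urllib.parse import quote


def ctm_string(value: str) -> str:
    """Quote a Python string as a CTM string literal (escape \\ and ")."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def page_to_ctm(page_id: int, page_title: str,
                base: str = "http://en.wikipedia.org/wiki/") -> str:
    """Render one `page` row as a CTM topic block (hypothetical mapping)."""
    display = page_title.replace("_", " ")  # MediaWiki stores spaces as "_"
    lines = [
        f"page_{page_id}",                      # assumed topic identifier
        f"    = <{base}{quote(page_title)}>;",  # subject locator (assumed)
        f"    - {ctm_string(display)}.",        # topic name
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(page_to_ctm(12, "Topic_Maps"))
```

Whether the article URL should be a subject locator or a subject identifier is exactly the kind of question the mapping exercise should surface, so treat that line as a placeholder.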

When we want to test alternative semantics, I suggest that we use “.” followed by “0tm” (zero followed by “tm”) as the file extension. Comments at the head of the file should reference or document the semantics to be applied in processing the file.
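
As a sketch of what that convention might look like in practice, the Python below writes a file with the ".0tm" extension and puts the semantics note at the head of the file. The function name and its parameters are hypothetical, and using "#" to mark comment lines is an assumption borrowed from CTM, since nothing above fixes a comment syntax for these files.

```python
from pathlib import Path


def write_alt_semantics(stem: str, semantics_note: str, body: str,
                        directory: str = ".") -> Path:
    """Write <stem>.0tm with the semantics note as leading comment lines."""
    # Assumption: "#" starts a comment line, as it does in CTM.
    header = "\n".join("# " + line for line in semantics_note.splitlines())
    path = Path(directory) / f"{stem}.0tm"
    path.write_text(header + "\n\n" + body + "\n", encoding="utf-8")
    return path
```

A call such as write_alt_semantics("page-sample", "Semantics: see notes.txt", body_text) would produce page-sample.0tm with the note on the first line.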

In terms of sigla for annotating the SQL, are there any strong feelings against the following (drawn from the TMDM vocabulary section)?

A (association): representation of a relationship between one or more subjects
Ar (association role): representation of the involvement of a subject in a relationship represented by an association
Art (association role type): subject describing the nature of the participation of an association role player in an association
At (association type): subject describing the nature of the relationship represented by associations of that type
Ir (information resource): a representation of a resource as a sequence of bytes; it could thus potentially be retrieved over a network
Ii (item identifier): locator assigned to an information item in order to allow it to be referred to
O (occurrence): representation of a relationship between a subject and an information resource
Ot (occurrence type): subject describing the nature of the relationship between the subjects and information resources linked by the occurrences of that type
S (scope): context within which a statement is valid
Si (subject identifier): locator that refers to a subject indicator
Sl (subject locator): locator that refers to the information resource that is the subject of a topic
T (topic): symbol used within a topic map to represent one, and only one, subject, in order to allow statements to be made about the subject
Tn (topic name): name for a topic, consisting of the base form, known as the base name, and variants of that base form, known as variant names
Tnt (topic name type): subject describing the nature of the topic names of that type
Tf (topic type): subject that captures some commonality in a set of subjects
Vn (variant name): alternative form of a topic name that may be more suitable in a certain context than the corresponding base name
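
One way the sigla could be attached to the SQL is as trailing comments on column definitions. The Python sketch below assumes a hypothetical "-- tm: <siglum>" comment convention (not something MediaWiki or the TMDM defines) and checks each annotation against the list above. The three annotations in EXAMPLE are illustrative guesses, not a proposed mapping.

```python
import re

# Sigla as listed above, keyed by abbreviation.
SIGLA = {
    "A": "association", "Ar": "association role",
    "Art": "association role type", "At": "association type",
    "Ir": "information resource", "Ii": "item identifier",
    "O": "occurrence", "Ot": "occurrence type", "S": "scope",
    "Si": "subject identifier", "Sl": "subject locator", "T": "topic",
    "Tn": "topic name", "Tnt": "topic name type", "Tf": "topic type",
    "Vn": "variant name",
}

# Hypothetical convention: "<column> ... -- tm: <siglum>" on one line.
ANNOTATION = re.compile(r"^\s*(\w+)\b.*--\s*tm:\s*(\w+)\s*$")


def read_annotations(sql: str) -> dict[str, str]:
    """Return {column: siglum}, rejecting sigla that are not in the list."""
    found = {}
    for line in sql.splitlines():
        match = ANNOTATION.match(line)
        if match:
            column, siglum = match.groups()
            if siglum not in SIGLA:
                raise ValueError(f"unknown siglum {siglum!r} on {column}")
            found[column] = siglum
    return found


EXAMPLE = """
CREATE TABLE page (
  page_id int unsigned NOT NULL,      -- tm: Ii
  page_title varchar(255) NOT NULL,   -- tm: Tn
  page_namespace int NOT NULL         -- tm: S
);
"""

if __name__ == "__main__":
    for column, siglum in read_annotations(EXAMPLE).items():
        print(f"{column}: {siglum} ({SIGLA[siglum]})")
```

Keeping the annotations inside SQL comments means the annotated file still loads as ordinary SQL.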

The first step I would suggest is creating a visualization of the MediaWiki schema.

We will still have to iterate over the tables, but getting an overall view of the schema will be helpful.
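
As a rough first pass, pending better tools, the Python sketch below pulls table and column names out of CREATE TABLE statements and emits Graphviz DOT, one record-shaped node per table. It assumes the "/*_*/" table-prefix placeholder and the trailing "/*$wgDBTableOptions*/" comment used in MediaWiki's tables.sql; the function names are hypothetical and the regex-based parsing is a heuristic, not a SQL parser. It draws no edges between tables.

```python
import re

# Matches "CREATE TABLE /*_*/name ( ... ) /*options*/;" across lines.
CREATE_TABLE = re.compile(
    r"CREATE TABLE\s+(?:/\*_\*/)?(\w+)\s*\((.*?)\)\s*(?:/\*.*?\*/)?\s*;",
    re.S | re.I,
)
# First words that introduce a constraint or index, not a column.
SKIP = {"PRIMARY", "KEY", "UNIQUE", "INDEX", "CONSTRAINT", "FOREIGN"}


def split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def schema_to_dot(sql: str) -> str:
    """Emit a DOT graph with one record node per table."""
    sql = re.sub(r"--[^\n]*", "", sql)  # drop "--" line comments
    lines = ["digraph schema {", "  node [shape=record];"]
    for name, body in CREATE_TABLE.findall(sql):
        cols = [p.split()[0] for p in split_top_level(body)
                if p and p.split()[0].upper() not in SKIP]
        label = "{" + name + "|" + "\\l".join(cols) + "\\l}"
        lines.append(f'  {name} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)
```

Feeding the output to Graphviz (for example, dot -Tsvg) gives one box per table, which is enough to see the overall shape before going table by table.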

Suggestions on your favorite schema visualization tool?
