Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 24, 2010

Dangers of Renaming

Filed under: Subject Identifiers,Subject Identity,Topic Maps — Patrick Durusau @ 3:49 pm

Topic maps and the semantic web share problems and dangers in their rush to re-name things with IRIs.

The problems include the number of subjects, the propagation (enforcement?) of new names, the emergence of new subjects, and others.

Re-naming has a graver danger, identified by Michael Shara, curator of Astrophysics, American Museum of Natural History, when asked why the heaviest star in the universe, R136a**, doesn’t have a better name.* He responded:

…partly because it [R136a] refers back to the original catalog, and once you go back to the original catalog, you can find all the literature that refers to it, so naming it John’s star or Betty’s Bright Object, would take that away from us.

So would renaming it to an IRI.

Request of the topic maps and semantic web communities:

Please let us keep our identifiers (as identifiers) and our history.

*****
*Weekend Edition – 24 July 2010 – Biggest Star Still Managed To Hide Until Just Now

**Astronomers find a 300 mass star (Royal Astronomical Society)

July 20, 2010

Subjects, Sets and Identifications

Filed under: Subject Identifiers,Subject Identity,Topic Maps — Patrick Durusau @ 6:18 am

There was a lively discussion on the topicmapmail discussion list about books and whether they have any universal identifiers. (Look in the archives for July, 2010 and messages with MARC in the subject line.)

There are known problems with ISBNs, such as publishers re-using them or assigning duplicate ISBNs to different books or simply making mistakes with the numbers themselves.

One participant reported that Amazon uses its own unique identifier for books.

The United States Library of Congress has its own internal identifier for books in its collection.

Not to mention that other library systems have their own identifiers for their collections.

At a minimum, it is possible for a book, considered as a subject, to have an ISBN, an identifier from Amazon, another identifier at the Library of Congress and still others in other systems. Perhaps even a unique identifier from a book jobber that sells books to libraries.

If you think about that for a moment, it becomes clear that a book as a subject has a *set* of identifiers, all of which identify the same subject. Moreover, each of those identifiers works best in a particular context. Dare we say the identifier has a scope?

If I had a representative (a topic) for this subject (book) that had a set of identifiers (ISBN, ASIN, LOC, etc.) and each of those identifiers had a scope, I could reliably import information from any source that used at least one of those identifiers.
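To make that concrete, here is a minimal Python sketch of a topic that carries a set of scoped identifiers and picks up information from any record that shares at least one of them. The names `Topic`, `TopicStore` and `import_record` are invented for illustration and belong to no topic map standard or engine. The identifier values are borrowed from the Library of Congress record quoted in the TFM scoring post further down this page.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A stand-in for one subject (here, a book). Hypothetical structure."""
    # Set of (scope, value) pairs, e.g. ("ISBN", "9544307400").
    identifiers: set = field(default_factory=set)
    # Information gathered about the subject, keyed by property name.
    properties: dict = field(default_factory=dict)


class TopicStore:
    def __init__(self):
        self._by_identifier = {}  # (scope, value) -> Topic

    def import_record(self, identifiers, properties):
        """Attach a record to the topic that shares any of its identifiers."""
        identifiers = set(identifiers)
        matches = []
        for ident in identifiers:
            topic = self._by_identifier.get(ident)
            if topic is not None and all(topic is not m for m in matches):
                matches.append(topic)
        topic = matches[0] if matches else Topic()
        # A record matching several topics shows they are one subject: fold them together.
        for other in matches[1:]:
            topic.identifiers |= other.identifiers
            for key, value in other.properties.items():
                topic.properties.setdefault(key, value)
        topic.identifiers |= identifiers
        for key, value in properties.items():
            topic.properties.setdefault(key, value)  # first value seen wins
        for ident in topic.identifiers:
            self._by_identifier[ident] = topic
        return topic


store = TopicStore()
# One source knows the book by ISBN and LCCN.
store.import_record(
    {("ISBN", "9544307400"), ("LCCN", "2001376890")},
    {"title": "Medieval Slavic manuscripts and SGML"},
)
# Another source knows it by LCCN and an OCLC number only.
book = store.import_record(
    {("LCCN", "2001376890"), ("OCLC", "ocm45819499")},
    {"pages": "371"},
)
print(sorted(book.identifiers))
print(book.properties)
```

Neither source had to give up its own identifier. The shared LCCN was enough to bring the title and the page count together on one topic.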

The originators of those identifiers can continue to use their identifiers and yet enjoy the benefits of information that was generated or collected using other identifiers.

Topic maps anyone?

June 15, 2010

Library of Congress LCCN Permalink

Filed under: Subject Identifiers — Patrick Durusau @ 10:09 am

Library of Congress LCCN Permalinks provide a persistent link to bibliographic records in the Library of Congress catalog.

From the FAQ:

You can use an LCCN Permalink anywhere you need to reference an LC bibliographic record — in emails, blogs, databases, web pages, digital files, etc.

Let’s see how that works: http://lccn.loc.gov/88009251

The internal system keeps using the Library of Congress Control Number (the LCCN in the title), a unique identifier for that record, while the permalink gives the outside world access to the same information through a URI.

Question: When I have a work that is identified by a LCCN Permalink and also has an identifier in CiteseerX, DBLP, WorldCat or in a European library, which one should I use?

Question: The FAQ says this link identifies the bibliographic record. Not the same thing as the book it identifies. How should I tell others that I am using the URI to identify a particular book? (Which is not the same thing as the record for that book.)

June 10, 2010

Linked Data and Citation Indexes

Filed under: Linked Data,Subject Identifiers — Patrick Durusau @ 5:46 am

Citation indexes offer a concrete example of why blindly following the linked data mantra of creating “URIs as names for things” (Linked Data) is a bad idea.

Science Citation Index Expanded ™ by Thomson Reuters offers coverage using citations to identify articles back to 1900. That works because the articles use citations as identifiers to reference previous articles.

There are articles available in digital form, from arXiv.org, CiteSeerX or some other digital repository. That means that they have an identifier in addition to the more traditional citation reference/identifier.

Where multiple identifiers identify the same subject, we need equivalence operators.

Where identifiers already identify subjects, we need operators that re-use those identifiers.
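One way to picture an equivalence operator is as a union-find structure over identifiers that already exist: nothing is renamed, we only record which identifiers point at the same subject. This is a rough Python sketch, not the API of any linked data or topic map tool. The class name `IdentifierEquivalence` and the three sample identifiers are made up for illustration.

```python
class IdentifierEquivalence:
    """Tracks which existing identifiers identify the same subject."""

    def __init__(self):
        self._parent = {}  # identifier -> parent identifier in its group

    def _find(self, ident):
        # Unknown identifiers start out in a group of their own.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def declare_same(self, a, b):
        """Record that identifiers a and b identify the same subject."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_subject(self, a, b):
        return self._find(a) == self._find(b)


# Hypothetical identifiers for one article: a traditional citation,
# a repository identifier, and a second repository identifier.
citation = "Doe, J. (1999). An Example. J. Examples 1:1-10"
repo_one = "arxiv:0000.00000"
repo_two = "citeseerx:example-doc"

eq = IdentifierEquivalence()
eq.declare_same(citation, repo_one)
eq.declare_same(repo_one, repo_two)

print(eq.same_subject(citation, repo_two))         # True
print(eq.same_subject(citation, "arxiv:other"))    # False
```

The citation stays a citation and the repository identifiers stay as they are. The only new thing is the statement that they are equivalent.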

Ask yourself, “What good is a new set of identifiers that partially duplicates existing identifiers?”

If you think you have a good answer, please email me or reply to this post. Thanks!

May 18, 2010

The Story of Blow

Filed under: Humor,Subject Identifiers — Patrick Durusau @ 9:40 am

Thomas Neidhart’s comments made me realize I had been too brief on the issue of subject identifiers. I want to correct that by telling “The Story of Blow.”

If you are reading this post you are likely online so please open up another browser window to: Merriam-Webster and type in the search box the word “blow.”

Working from my post What Makes Subject Identifiers Different?, let’s go down my four points for “blow.”

1) Quite clearly “blow” identifies a lot of different subjects. So it is a “subject identifier” in the non-topic map sense.

2) And just as clearly, “blow” has the capacity to lead us to additional information. That is, it can be resolved.

Doesn’t mean it will be resolved, only that resolution is possible.

3) The additional information point is illustrated by the Merriam-Webster entry. As a transitive verb, it lists some 14 separate meanings. All of which involve additional information to know which one is meant.

But the dictionary is just a common example.

Another is the information that speakers of English carry around about the meanings of “blow.”

Which means our resolutions of “blow” can differ from that of others. (The “vocabulary problem.”)

4) The additional information in a dictionary is explicit. That is you and I can both examine the same information.

That is in contrast to each of us hearing the term “blow” in conversation or over the radio/TV and deciding privately what was meant. We go through the first three steps but not the fourth.

I could say: “That was good blow.” and leave you wondering what possible meaning I have assigned to the term “blow.” I’m surprised the dictionary omits this one; in another lifetime I would have understood it to be a reference to cocaine. So if I wanted that usage to be understood by others, I had better mark it with a Subject Identifier so as to make that meaning explicit.

I can think of several other missing definitions for “blow.” Can you?

PS: I was amused at the example given for the sense of “blow” as to spend extravagantly, “I will blow you to a steak.” Since Google reports no “hits” on that string I suspect it was inserted to catch anyone copying their definitions.

May 16, 2010

What Makes Subject Identifiers Different?

Filed under: Subject Identifiers — Patrick Durusau @ 5:57 pm

What makes Subject Identifiers (topic maps sense) different from subject identifiers (non-topic maps sense)?

Summary of the argument/answer for the impatient:

Property                        subject identifier   Subject Identifier
Identifies Subject              Yes                  Yes
Resolvable                      Yes                  Yes
Resolution = More Information   Yes                  Yes
Explicit Information            No                   Yes

Identifies Subject

All “subject identifiers” and “Subject Identifiers” identify subjects.

Words, for example, as “subject identifiers,” identify subjects.

Resolvable

All “subject identifiers” and “Subject Identifiers” are resolvable. That is they can lead to more information.

Resolution = More Information

The resolution of a “subject identifier” or “Subject Identifier” leads to information that identifies the subject it represents.

Explicit Information

Resolving a “subject identifier” does not lead to explicit information. Known only to the listener.

Resolving a “Subject Identifier” does lead to explicit information. Known to anyone who looks.

Conclusion: Resolution of “Subject Identifiers” leads to explicit information others can use to understand what subject it represents.

May 12, 2010

Time, Tide and Identifiers Wait for No One

Filed under: Subject Identifiers,Subject Identity — Patrick Durusau @ 10:38 am

The earliest record of “time and tide wait for no man” dates from 1225 and reads in modern English:

the tide abides for, tarrieth for no man, stays no man, tide nor time tarrieth no man

Meaning no one can command time. The same is true for identifiers.

What do you think “tide” means in the title? Ocean tide perhaps?

In the original phrase, “tide” meant a period of time. The identifier persisted, but its meaning changed.

Identifiers for subjects and their meanings change.

Topic maps can follow those changes. Can you?

April 21, 2010

Complex Merging Conditions In XTM

Filed under: Merging,Subject Identifiers,TMDM,Topic Maps — Patrick Durusau @ 6:09 pm

We need a way to merge topics for reasons that are not specified by the TMDM.

For example, I want to merge topics that have equivalent occurrences of type ISBN. Library catalogs in different languages may only share the ISBN of an item as a common characteristic. A topic map generated from each of them could have the ISBN as an occurrence on each topic.

I am assuming each topic map relies upon library identifiers for “standard” merging because that is typically how library systems bind the information for a particular item together.

So, how to make merging occur when there are equivalent occurrences of type ISBN?

Solution: As part of the process of creating the topics, add a subject identifier based on the occurrences of type ISBN that results in equivalent subject identifiers when the ISBN numbers are equivalent. That results in topics that share equivalent occurrences of type ISBN merging.

While the illustration is with one occurrence, there is no limit as to the number of properties of a topic that can be considered in the creation of a subject identifier that will result in merging. Such subject identifiers, when resolved, should document the basis for their assignment to a topic.
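A rough Python sketch of the technique follows. It does not use any real topic map engine: topics are plain dictionaries, the function names are hypothetical, and the base IRI `http://example.org/isbn/` is a placeholder. It also assumes that “equivalent” ISBNs differ only in hyphens, spaces or letter case. Matching an ISBN-10 to its ISBN-13 form would need extra work.

```python
def normalize_isbn(raw):
    """Drop hyphens and spaces; upper-case a trailing 'x' check character."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def isbn_subject_identifier(raw, base="http://example.org/isbn/"):
    """Derive a subject identifier so that equivalent ISBNs give equal IRIs."""
    return base + normalize_isbn(raw)


def add_isbn_identifiers(topic):
    """While creating a topic, add one subject identifier per ISBN occurrence."""
    for occurrence_type, value in topic["occurrences"]:
        if occurrence_type == "ISBN":
            topic["subject_identifiers"].add(isbn_subject_identifier(value))
    return topic


def merge_topics(topics):
    """Merge topics that share a subject identifier (the standard merging rule)."""
    merged = []
    for topic in topics:
        current = {
            "subject_identifiers": set(topic["subject_identifiers"]),
            "occurrences": list(topic["occurrences"]),
        }
        remaining = []
        for existing in merged:
            if existing["subject_identifiers"] & current["subject_identifiers"]:
                current["subject_identifiers"] |= existing["subject_identifiers"]
                extra = [o for o in current["occurrences"]
                         if o not in existing["occurrences"]]
                current["occurrences"] = existing["occurrences"] + extra
            else:
                remaining.append(existing)
        merged = remaining + [current]
    return merged


# Two catalogs that share nothing but the ISBN, spelled differently.
topic_a = add_isbn_identifiers({
    "subject_identifiers": {"http://example.org/catalog-a/item/1"},
    "occurrences": [("ISBN", "954-430-740-0"), ("language", "eng")],
})
topic_b = add_isbn_identifiers({
    "subject_identifiers": {"http://example.org/catalog-b/item/77"},
    "occurrences": [("ISBN", "9544307400"), ("language", "bul")],
})

result = merge_topics([topic_a, topic_b])
print(len(result))  # 1
print(sorted(result[0]["subject_identifiers"]))
```

The merge step itself knows nothing about ISBNs. All of the “complex” condition lives in how the subject identifier was minted when the topic was created.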

BTW, assuming a future TMQL that enables such merging, note this technique will work with XTM 1.0 topic map engines.

Caution: This solution does not work for properties that can be determined only after the topic map has been constructed. Such as participation in particular associations or the playing of particular roles.

PS: There is a modification of this technique to deal with participation in associations or the playing of particular roles. More on that in another post.

April 15, 2010

Topic Maps Gospel

Filed under: Subject Identifiers,Subject Locators,Topic Maps — Patrick Durusau @ 10:08 am

We are all familiar with the topic maps gospel that emphasizes that subjects can have multiple identifications. And unlike other semantic technologies, we can distinguish between identifiers and locators.

There is no shortage of data integration and other IT projects that would benefit from hearing the topic maps gospel.

So, why hasn’t the gospel of topic maps spread? I suspect it is because semantic integration is only one need among many.

For example, enabling federated, global debate is ok but I need relevant documents for an IRS auditor. Who is waiting for an answer. Can we do that first?

Meeting user needs as the users understand them may explain the success of NetworkedPlanet. They have used topic maps to enhance SharePoint, something users see a need for.

We need to preserve the semantic integration that defines topic maps but let’s express it in terms of meeting the needs others have articulated. In the context of their projects.

My first target? (First question you should ask when anyone has a call to action.) Next generation library catalog projects. I am creating a list of them now. Will lurk for a while to learn their culture but will be spreading the topic maps gospel.

The conversation will naturally develop to include the treatment of relationships (associations in our speak), roles, and in some cases, interchange of the resulting information (when interchange syntax questions arise).

That sounds like a good way to spread the good news of topic maps to me.

April 9, 2010

TFM (To Find Me) Scoring

Filed under: LCSH,Subject Headings,Subject Identifiers,Subject Identity — Patrick Durusau @ 8:34 pm

The TFM (To Find Me) score for a topic map or other information resource depends upon the subject being identified.

Here is a portion of a record from the Library of Congress:

LC Control No.: 2001376890
Type of Material: Book (Print, Microform, Electronic, etc.)
Main Title: Medieval Slavic manuscripts and SGML : problems and
perspectives = Srednovekovni slavi·a·nski rukopisi i
SGML / [Anisava Miltenova, David Birnbaum, editors].
Parallel Title: Srednovekovni slavi·a·nski rukopisi i SGML
Published/Created: Sofii·a· : A.I. “Prof. Marin Drinov”, 2000.
Related Names: Miltenova, Anisava
Birnbaum, David J.
Description: 371 p. : ill. ; 24 cm.
ISBN: 9544307400
Subjects: ***omitted, will cover in another post***
LC Classification: Z115.5.C57 M43 2000
Language Code: eng bul
Other System No.: (OCoLC)ocm45819499
CALL NUMBER: Z115.5.C57 M43 2000

How many ways can you find this book?

  1. Main title: Medieval Slavic manuscripts and SGML : problems and perspectives
  2. Parallel Title: Srednovekovni slavi·a·nski rukopisi i SGML
  3. ISBN: 9544307400
  4. Other System No.: (OCoLC)ocm45819499

TFM score of 4. Four ways to find this book.

But why weren’t the following included?

  1. LC Control No.: 2001376890
  2. CALL NUMBER: Z115.5.C57 M43 2000

Which would have made the TFM score 6.

Depends on what subject you think is being identified.

If the subject is this book, as a publication, the TFM score remains at 4.

If the subject is a particular copy of this book, held by the Library of Congress, the TFM score goes to 6.
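The counting can be written down as a short Python sketch. The field values come from the record above. The function name `tfm_score` and the two subject labels are only illustrative, and which fields count for which subject is exactly the judgment call described in this post.

```python
# Fields from the Library of Congress record quoted above.
RECORD = {
    "Main Title": "Medieval Slavic manuscripts and SGML : problems and perspectives",
    "Parallel Title": "Srednovekovni slavi·a·nski rukopisi i SGML",
    "ISBN": "9544307400",
    "Other System No.": "(OCoLC)ocm45819499",
    "LC Control No.": "2001376890",
    "CALL NUMBER": "Z115.5.C57 M43 2000",
}

# Which fields identify which subject (labels are hypothetical).
PUBLICATION_FIELDS = ["Main Title", "Parallel Title", "ISBN", "Other System No."]
IDENTIFYING_FIELDS = {
    "publication": PUBLICATION_FIELDS,
    # A copy held by the Library of Congress is also found by LC's own numbers.
    "lc_copy": PUBLICATION_FIELDS + ["LC Control No.", "CALL NUMBER"],
}


def tfm_score(record, subject):
    """Count the fields present in the record that identify the given subject."""
    return sum(1 for name in IDENTIFYING_FIELDS[subject] if record.get(name))


print(tfm_score(RECORD, "publication"))  # 4
print(tfm_score(RECORD, "lc_copy"))      # 6
```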
