Archive for April, 2010

Topic Maps Axioms

Friday, April 30th, 2010

While working on a “non-standard” explanation of topic maps (read non-formal notation), I thought about topic maps axioms that I needed to write down.

  1. All territories can have multiple maps.
  2. All maps can have multiple legends.
  3. All maps and legends are territories.

Here are my proposed illustrations for the axioms and perhaps readers can suggest alternatives or improvements.

All territories can have multiple maps. I will capture the outline of a geographic area, the borders, and use public domain maps that fade over each other to illustrate a single territory having multiple maps. That’s a geographic example, but I need an intellectual territory to illustrate the same principle. Suggestions?

All maps can have multiple legends. Start with a single map of a geographic area and apply different legends to it. For example, take a Google map and ask for all hotels, schools, stores, etc. Different legends show different things (dare I say subjects?) on the same map.

All maps and legends are territories. Use a Google map showing, say, schools, stores, and gas stations, and then add foreign language terms for each item. Treating those parts of a map as subjects adds additional identifications.

Second Class Citizens/Subjects

Thursday, April 29th, 2010

One of the difficulties that topic maps solve is the question of second class citizens (or subjects) in information systems.

The difficulty is one that Marijane raises when she quotes Michael Sperberg-McQueen wondering how topic maps differ from SQL databases, Prolog, or colloquial XML.

One doesn’t have to read far to find that SQL databases, colloquial XML (and other information technologies) talk about real world subjects.*

The real world view leaves the subjects that comprise information systems out of the picture.

That creates an underclass of subjects that appear in information systems, but can never be identified or be declared to have more than one identification.

Mapping strategies, like topic maps, enable users to identify any subject. Any subject can have multiple identifiers. Users can declare what properties must be present to identify a subject, including the subjects that make up information systems.

*Note my omission of Prolog. Some programming languages may be more map friendly than others but I am unaware of any that cannot attribute properties to parts of a data structure (or its contents) for the purposes of mapping and declaring a mapping.

URIs As Shorthand

Wednesday, April 28th, 2010

Inge Henriksen made me realize that URIs are being used as a shorthand for the {set of properties} that identify subjects.

A user recognizes a subject by observing/recognizing some {set of properties}.

They then choose a URI as the shorthand for the {set of properties} they recognized.

To interchange a URI with others, the other users need to know what {set of properties} map to the URI.

Corollary: If no {set of properties} maps to a URI, there is no interchange.

Well, no reliable interchange.
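The corollary can be put in code. A minimal sketch, assuming a toy registry that maps URIs to {sets of properties} (the URIs and property values below are invented for illustration):

```python
# A URI interchanges reliably only if the parties can resolve it
# to some {set of properties}.
registry = {
    "http://psi.example.org/subject-1": {"type": "person",
                                         "name": "Inge Henriksen"},
}

def reliable_interchange(uri, registry):
    """No {set of properties} mapped to the URI -> no reliable interchange."""
    return bool(registry.get(uri))

# Mapped URI: interchange works. Unmapped URI: it does not.
ok = reliable_interchange("http://psi.example.org/subject-1", registry)
bad = reliable_interchange("http://psi.example.org/subject-2", registry)
```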

Inge could use a URI to identify himself. I could use the same URI to identify Gjetost cheese. If the URI did not map to a set of properties, how would you choose between them?

(Detail: Less than all the properties in the {set of properties} may identify a subject. I’ll talk about that at a later point.)

Use My Model/Language Mister!

Tuesday, April 27th, 2010

“Use My Model/Language Mister!” is the cry of markup, modeling and semantics projects.

They are all equally sincere, and if you don’t like any of them, wait another six months or so for additional choices.

I don’t remember if it was after the 75th or 100th or somewhere past the 100th “true” model that I began to suspect something was amiss.

Models and languages change over time and can be barriers to discussion and discovery of badly needed information.

Rather than arguing for this or that model, as though it were some final answer, why not ask which model suits our present purposes?

With topic maps, once the subjects under discussion are identified, how they are represented for some purpose is a detail. A very important detail, but a detail nonetheless.

If, or rather when, our requirements change, the same subject can be represented in a different way. The subjects can be identified, again, to create a new representation, or, if identified using topic maps, our job of moving to another model just got a whole lot easier.

Are Topic Maps News?

Monday, April 26th, 2010

Jack Park, co-editor of XML Topic Maps, likes to tell me: “topic maps are not news.” I respond with a variety of explanations/defenses.

Today I wrote the following:

Topic maps are a representation of what people have been doing for as long as they have been able to communicate and have had different ways to identify the things they wanted to talk about.

Some people were able to recognize the same subjects were being identified differently, so they created a mental mapping of the different identifiers. When we reached the age of recorded information, that mental mapping enabled them to find information recorded under different identifications for the same subject.

Topic maps, like thesauri and indexes before them, enable people to write down their mappings. And say on what basis those mappings were done. The first act enables people to use mappings done by others, like thesauri and indexes. The second act, recording the reason for the mapping (subject identity), enables the re-use of a mapping.

So, no news. Saving time, money, resources, enabling auditability/transparency, preserving institutional memory, re-use of mappings (reliably), making more information available to more people, but alas, no news.

Interface 2010: Humanities and Technology

Monday, April 26th, 2010

Warwick, UK, 15-16 July, 2010. A conference fee of £40 and a chance to network with both humanities projects and technologists.

Presentations about topic maps occur at markup and topic maps conferences, but are there presentations outside those venues? Topic maps presentations at general IT or other conferences? Anyone planning topic maps presentations at general IT or other conferences?

Promoting topic maps to each other isn’t going to find new customers/users for topic maps.

PS: There are 14 days left for paper submissions to this conference.

A “Terrier” For Your Tool Box?

Sunday, April 25th, 2010

Terrier IR project description:

Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications.

Terrier is open source, and is a comprehensive, flexible and transparent platform for research and experimentation in text retrieval. Research can easily be carried out on standard TREC and CLEF test collections.

Become comfortable with the TREC or CLEF test collections.

A topic map on any part of either collection would attract IR researchers to topic maps.

Topic Maps Roots?

Sunday, April 25th, 2010

Have you read: Hypermedia exploration with interactive dynamic maps by Mountaz Zizi and Michel Beaudouin-Lafon?

They define “interactive dynamic maps (IDMs),” which consist of:

topic maps, which provide visual abstractions of the semantic content of a web of documents and document maps, which provide visual abstractions of subsets of documents. (emphasis in original)

The authors speak of creating a thesaurus, user control over query expansion, using queries to create new maps, treating maps as documents, sharing maps among users, etc. Plus screen shots of working software, SHADOCS.

The authors do not cite ISO/IEC 13250. The year? 1995. ISO/IEC 13250 became an ISO standard in 1999.

They don’t have roles and role players, etc., nor an explicit notion of subject identity, but where are we with regard to user control over query expansion for example? Or creating new maps with queries? (Was that possible with Robert Barta’s last draft for TMQL?)

Those who do not learn from history are doomed to re-invent it, maybe. (apologies to Santayana)

Usability at TMRA 2010?

Saturday, April 24th, 2010

The success of topic maps depends upon having interfaces people will want to use.

Let’s request a one-day workshop on usability prior to TMRA 2010.

An overview of usability studies, techniques, and literature might be a push in the right direction.

Perhaps a usability (HCI – human-computer interaction) track for TMRA 2011?

With case studies from topic map projects and usability researchers.

Impatient? See: HCI Bibliography : Human-Computer Interaction Resources, a collection of over 57,000 documents, plus recommended readings, link collections, etc.

Explicit Semantic Analysis

Saturday, April 24th, 2010

Explicit Semantic Analysis looks like another tool for the topic maps toolkit.

Not 100% accurate but close enough to give a topic map project involving a serious amount of text a running start.

Start with Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis by Evgeniy Gabrilovich and Shaul Markovitch.

There are 55 citations of this work (as of 2010-04-24), ranging from Geographic Information Retrieval and Beyond the Stars: Exploiting Free-Text User Reviews for Improving the Accuracy of Movie Recommendations (2009) to Explicit Versus Latent Concept Models for Cross-Language Information Retrieval.

I encountered this line of work while reading Combining Concept Based and Text Based Indexes for CLIR by Philipp Sorg and Philipp Cimiano (slides) from the 2009 Cross Language Evaluation Forum. (For any search engines, CLIR = Cross-Language Information Retrieval.) The Cross Language Evaluation Forum link is a general one because the site does not expose direct links to resources.


Evgeniy Gabrilovich and Shaul Markovitch say that:

We represent texts as a weighted mixture of a predetermined set of natural concepts, which are defined by humans themselves and can be easily explained. To achieve this aim, we use concepts defined by Wikipedia articles, e.g., COMPUTER SCIENCE, INDIA, or LANGUAGE.


The choice of encyclopedia articles as concepts is quite natural, as each article is focused on a single issue, which it discusses in detail.
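The representation they describe can be sketched in miniature. This is a toy version only: short keyword lists stand in for Wikipedia articles, and all concept names and data are invented for illustration.

```python
import math
from collections import Counter

# "Concepts" as labeled keyword sets (stand-ins for Wikipedia articles).
concepts = {
    "COMPUTER SCIENCE": {"algorithm", "computer", "program"},
    "INDIA": {"india", "delhi", "hindi"},
    "LANGUAGE": {"language", "grammar", "word"},
}

def esa_vector(text):
    """Represent a text as a weighted mixture over the concept set:
    the weight for a concept is the count of the text's words that
    appear in that concept's keyword set."""
    words = Counter(text.lower().split())
    return {c: sum(words[w] for w in kws) for c, kws in concepts.items()}

def relatedness(a, b):
    """Cosine similarity of two ESA vectors."""
    u, v = esa_vector(a), esa_vector(b)
    num = sum(u[c] * v[c] for c in concepts)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0
```

Texts about programs and algorithms come out related to each other and unrelated to texts about Delhi, which is the whole trick, scaled down.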

Their use of “natural,” which I equate in academic writing to “…a miracle occurs…,” drew my attention. There are things we choose to treat as concepts or even subject representatives, but that hardly makes them “natural.” Most academic articles would claim (whether true or not) to be “…focused on a single issue, which it discusses in detail.”

Rather than “natural concepts,” call them what they are: the headers of Wikipedia texts. That is more accurate, and it sets the groundwork for investigation into the nature and length of headers and their impact on semantic mapping and information retrieval.

Are Data Mediators Topic Maps?

Friday, April 23rd, 2010

Data mediators are similar to topic maps. They take heterogeneous data and present a common interface to it.

Does that make a data mediator mapping a topic map?


Here is a test to see if a data mediator mapping qualifies as a topic map:

  1. Create two mappings of the same data using mediators that use different vocabularies.
  2. State the basis on which you would combine components from the two mappings. (Or even the basis for mapping from the data source to the target, but I digress.)

What a data mediator creates is a useful but blind mapping. That is, the reason for the mapping is not apparent from the map.

That prevents re-use of the mapping by others. Such as combining it with other mappings.
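A blind mapping and a disclosed one can be contrasted in miniature. The field names below are invented for illustration:

```python
# A mediator mapping: source field -> target field. It works, but
# the reason for each correspondence is nowhere in the map.
blind_map = {"cust_nm": "customerName", "cname": "customerName"}

# The same mapping with its basis disclosed, topic-map style:
# another user (or a program) can inspect *why* the fields map
# and decide whether to combine this mapping with others.
disclosed_map = {
    "cust_nm": {"target": "customerName",
                "basis": "holds the customer's full legal name"},
    "cname": {"target": "customerName",
              "basis": "holds the customer's full legal name"},
}

def combinable(a, b):
    """Re-use test: combine only mappings whose disclosed bases agree."""
    return a["basis"] == b["basis"]
```

With `blind_map`, the combinability question cannot even be asked; with `disclosed_map`, it is a lookup.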

The reason for the mapping, subject identification in topic map terms, remains in the head of the person who created the mapping.

What happens when they move up, move on, or simply retire?

The business case question is:

How many times have you paid to have subjects in your information system identified?

Or even better:

How many more times are you going to pay to have subjects in your information system identified?

Is this time going to be the last time? Could be if you were using topic maps.

A Blogging Lesson For Topic Maps?

Thursday, April 22nd, 2010

As I was posting a blog entry for today, I thought about how blogging swept around the web. Unlike RDF and topic maps.

One difference between blogging and topic maps is that I can type a blog entry and post it.

I have an immediate feeling of accomplishment (whether I have accomplished anything or not).

And, what I have authored is immediately available for me and others to use.

Contrast that with the “hold your left foot in your right hand behind your back with your left eye closed, squinting through an inverted coke bottle while humming Zarathustra” theoretical discussions. (I am a co-author of the reference model so I think I am entitled.)

Or the “developers know best” cult that has shaped discussions to match the oddities and priorities of a developer view of the world.

An emphasis on giving users an immediate sense of accomplishment, with results they can use immediately could lead to a different adoption curve for topic maps.

Neither the theoretical nor developer perspectives on topic maps have had that emphasis.

A Missing Step?

Thursday, April 22nd, 2010

I happened across a guide to study and writing research papers that I had as an undergraduate. Looking back over it, I noticed there is a step in the research process that is missing from search engines. Perhaps by design, perhaps not.

After choosing a topic, you did research in a variety of print resources to gather material for the paper. As you gathered it, you wrote each piece of information on a note card along with the full bibliographic information for the source.

When you were writing a paper, you did not consult the original sources but rather your sub-set of those sources that were on your note cards.

In group research projects, we exchanged note cards so that everyone had access to the same sub-set of materials that we had found.

Bibliographic software mimics the note-card process, but my question is: why is that capacity missing from search interfaces?

That seems to be a missing step.  I don’t know if it is missing by design, i.e., it is cheaper to let everyone look for the same information over and over, or if it is missing in anticipation of bibliographic software filling the gap.

Search interfaces need to offer ways for us to preserve and share our research results with others.

Topic maps would be a good way to offer that sort of capability.

Complex Merging Conditions In XTM

Wednesday, April 21st, 2010

We need a way to merge topics for reasons that are not specified by the TMDM.

For example, I want to merge topics that have equivalent occurrences of type ISBN. Library catalogs in different languages may share only the ISBN of an item as a common characteristic. A topic map generated from each of them could have the ISBN as an occurrence on each topic.

I am assuming each topic map relies upon library identifiers for “standard” merging because that is typically how library systems bind the information for a particular item together.

So, how to make merging occur when there are equivalent occurrences of type ISBN?

Solution: As part of the process of creating the topics, add a subject identifier based on the occurrences of type ISBN that results in equivalent subject identifiers when the ISBN numbers are equivalent. That results in topics that share equivalent occurrences of type ISBN merging.

While the illustration is with one occurrence, there is no limit as to the number of properties of a topic that can be considered in the creation of a subject identifier that will result in merging. Such subject identifiers, when resolved, should document the basis for their assignment to a topic.
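A minimal sketch of the identifier-construction step. The URI base is an assumption; any stable base that documents the basis for assignment would do:

```python
def isbn_subject_identifier(isbn, base="http://psi.example.org/isbn/"):
    """Derive a subject identifier from an occurrence of type ISBN.
    Equivalent ISBNs yield equal identifiers, so topics carrying
    them merge under the standard subject-identifier rule."""
    normalized = isbn.replace("-", "").replace(" ", "").upper()
    return base + normalized

# Catalog records in different languages, same item:
sid_en = isbn_subject_identifier("978-0-13-468599-1")
sid_de = isbn_subject_identifier("9780134685991")
# sid_en == sid_de, so the two topics merge
```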

BTW, assuming a future TMQL that enables such merging, note this technique will work with XTM 1.0 topic map engines.

Caution: This solution does not work for properties that can be determined only after the topic map has been constructed. Such as participation in particular associations or the playing of particular roles.

PS: There is a modification of this technique to deal with participation in associations or the playing of particular roles. More on that in another post.

Interfaces and Topic Maps

Wednesday, April 21st, 2010

When I posted the note about Marti Hearst’s new book, Search User Interfaces, in Interfaces and Topic Maps I was thinking about it being relevant for software interfaces to topic maps.

After stewing on it for several days and a close read of Chapter 1, I think it has broader application for topic maps.

Topic maps present information about subjects using a single representative for each subject. And those representatives can record properties and associations entered using different identifications.

That sounds like an interface to me. It presents all the considerations of any “interface” in the usual sense of the word. Does it match the intended user’s understanding of the domain? Is the information of interest to the user? Does it help/hinder the user making use of the information?

The Hearst volume is relevant to topic mappers for two reasons:

First, in the conventional sense of the “user interface” to software.

Second, as a guide to exploring how users understand their worlds.

Both are important to keep in mind when constructing topic map software as well as topic maps themselves.

Data Virtualization

Tuesday, April 20th, 2010

I ran across a depressing quote today on data virtualization:

But data is distributed, heterogeneous, and often full of errors. Simply federating it is insufficient. IT organizations must build a single, accurate, and consistent view of data, and deliver it precisely when it’s needed. Data virtualization needs to take this complexity into account.*

It is very important to have a single view of data for some purposes, but what happens when circumstances change and we need a different view than the one before?

Without explicit identification of subjects, all the IT effort that went into the first data integration project gets repeated in the next data integration project.

You would think that after sixty years of data migration, largely repeating the efforts of prior migrations, even business types would have caught on by this point.

Without explicit identification of subjects, there isn’t any way to “know” what subjects were being identified. Or to create reliable new mappings. So the cycle of data migrations goes on and on.

Break the cycle of data migrations, choose topic maps!

*Look under webinars at the source site; there wasn’t a direct link that I could post to lead you to the quote.

Lossy Mapping/Modeling

Tuesday, April 20th, 2010

As I mentioned in Maps and Territories, relational database theory excludes SQL schemas from the items that can be modeled/mapped by a relational database.

All maps are lossy, but I think we can distinguish between types of loss.

Some losses are voluntary, in the sense that we choose, due to lack of interest, funding, fitness for use, or other reason to exclude some things from a map.

We could, in a library catalog, which is a map of the library’s holdings, add the number of words on each page of each item to that map. Or not. But that would be a voluntary choice on our part.

The exclusion of SQL schemas from the mappings possible within the relational database paradigm strikes me as a different type of loss. That is an involuntary loss, mandated by the paradigm.

It simply isn’t possible to model an SQL schema in the relational paradigm. Those subjects, the subjects of the schema, are simply off limits to everyone writing an SQL schema.

I mention that because with topic maps, all the losses are voluntary. At least in the sense that the paradigm does not mandate the exclusion of any subjects, although particular legends may.

I think it would be helpful to have a table listing model/mapping systems and what, if anything, they exclude from modeling/mapping.


Why Semantic Technologies Remain Orphans (Lack of Adoption)

Monday, April 19th, 2010

In the debate over Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1, Kingsley Idehen has noted the lack of widespread adoption of semantic technologies.

Everyone prefers their own world view. We see some bright, shiny future if everyone else, at their expense, would adopt our view of the world. That hasn’t been persuasive.

And why should it be? What motivation do I have to change how I process/encode my data, in the hopes that if everyone else in my field does the same thing, then at some unknown future point, I will have some unquantifiable advantage over how I process data now?

I am not advocating that everyone adopt XTM syntax or the TMDM as a data model. Just as there are an infinite number of semantics there are an infinite number of ways to map and combine those semantics. I am advocating a disclosed mapping strategy that enables others to make meaningful use of the resulting maps.

Let’s take a concrete case.

The Christmas Day “attack” by a terrorist who set his pants on fire (Christmas Day Attack Highlights US Intelligence Failures) illustrates a failure to share intelligence data.

One strategy, the one most likely to fail, is the development of a common data model for sharing intelligence data. The Guide to Sources of Information for Intelligence Officers, Analysts, and Investigators, Updated gives you a feel for the scope of such a project. (100+ pages listing sources of intelligence data)

A disclosed mapping strategy for the integration of intelligence data would enable agencies to keep their present systems, data structures, interfaces, etc.

Disclosing the basis for mapping, whatever the target (such as RDF), will mean that users can combine the resulting map with other data. Or not. But it will be a meaningful choice. A far saner (and more cost effective) strategy than a common data model.

Semantic diversity is our strength. So why not play to our strength, rather than against it?

Zero-Sum Games and Semantic Technologies

Monday, April 19th, 2010

Kingsley Idehen asked why debates over semantic technologies are always zero-sum games.

I understood him to be asking about RDF vs. Topic Maps but the question could be applied to any two semantic technologies, including RDF vs. his Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1.

This isn’t a new problem but in fact is a very old one.

To take Kingsley’s OR seriously means a user may choose a semantic technology other than mine, which means it may not work as well, or at all, with my software (vendor interest). More importantly, given the lack of commercial interest in semantic technologies, it is a different way of viewing the world. That is, it is different from my way of viewing the world.

That is the linchpin that explains the zero-sum nature of the debates, from upper ontologies to the actual application of semantic technologies.

We prefer our view of the world to that of others.

Note that I said we. Not some of us, not part of the time, not some particular group or class, or any other possible distinction. Everyone, all the time.

That fact, everyone’s preference for their view of the world, underlies the semantic, cultural, linguistic diversity that we encounter day to day. It is a diversity that has persisted, as far as is known, throughout recorded history. There are no known periods without that diversity.

To advocate that anyone adopt another view of the world, a view other than their own, even only Kingsley’s OR, means they have a different view than before. That is, by definition, a zero-sum game. Either the previous view prevails, or it doesn’t.

I prefer mapping strategies (note I did not say a particular mapping strategy) because they enable diverse views to continue as is and put the burden of mapping on those who wish to have additional views.

Maps and Territories

Sunday, April 18th, 2010

All maps are territories.

The question for comparing SQL (or any other system) to topic maps is:

Can SQL (or other system) recognize one of its own mappings/models as a territory for mapping? If so, how?

I reviewed Chapter 14, “Semantic Modeling,” of C. J. Date’s An Introduction To Database Systems and modeling/mapping there refers to objects in “the real world.”

I take it that Date would exclude SQL schemas as the objects of modeling or mapping with a relational database.

Does anyone have a different impression?

An SQL Example for Michael

Sunday, April 18th, 2010

Marijane White pointed out the following comment from Michael Sperberg-McQueen asking how topic maps differ from SQL:

The biggest set of open questions remains: how does modeling a collection of information with Topic Maps differ from modeling it using some other approach? Are there things we can do with Topic Maps that we can’t do, or cannot do as easily, with a SQL database? With a fact base in Prolog? With colloquial XML? It might be enlightening to see what the Italian Opera topic map might look like, if we designed a bespoke XML vocabulary for it, or if we poured it into SQL. (I have friends who tell me that SQL is really not suited for the kinds of things you can do with Topic Maps, but so far I haven’t understood what they mean; perhaps a concrete example will make it easier to compare the two.)


An SQL example:

firstName | lastName
Patrick   | Durusau

And elsewhere:

givenName | surName
Patrick   | Durusau

An interface could issue separate queries and return a consolidated result.

Does that equal a topic map? My answer is NO!
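The consolidation is easy enough to demonstrate. A sketch using an in-memory SQLite database, with table names invented to match the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors_a (firstName TEXT, lastName TEXT);
    CREATE TABLE authors_b (givenName TEXT, surName TEXT);
    INSERT INTO authors_a VALUES ('Patrick', 'Durusau');
    INSERT INTO authors_b VALUES ('Patrick', 'Durusau');
""")

# The consolidation works, but the equivalence of firstName and
# givenName lives only in this query text -- a blind mapping, with
# no declared properties and no declared rule.
rows = conn.execute("""
    SELECT firstName, lastName FROM authors_a
    UNION
    SELECT givenName, surName FROM authors_b
""").fetchall()
```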

The questions that SQL doesn’t answer (topic maps do):

  • On what basis do we map? There are no explicit properties of those subjects on which to make a mapping.
  • What rules should we follow? There are no explicit rules, even assuming there were properties for these subjects.

Contrast that with two topics in CTM syntax (the subject identifier URI here is a placeholder I have supplied; the point is only that the two topics share one):

http://psi.example.org/name
- "firstName" .

http://psi.example.org/name
- "givenName" .

The Topic Maps Data Model (TMDM) defines the subject identifier property (a URI) and specifies that when subject identifier properties are equal, the topics merge.

Different situation from the SQL example.
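The TMDM rule can be sketched in code. A simplified model, with topics as dicts of subject identifiers and names, ignoring chained merges:

```python
def merge(topics):
    """Merge topics that share a subject identifier (TMDM rule:
    equal subject identifiers mean the same subject). Simplified:
    assumes a single pass suffices (no transitive merge chains)."""
    merged = []
    for t in topics:
        hit = next((m for m in merged if m["sids"] & t["sids"]), None)
        if hit:
            hit["sids"] |= t["sids"]
            hit["names"] |= t["names"]
        else:
            merged.append({"sids": set(t["sids"]), "names": set(t["names"])})
    return merged

topics = [
    {"sids": {"http://psi.example.org/name"}, "names": {"firstName"}},
    {"sids": {"http://psi.example.org/name"}, "names": {"givenName"}},
]
result = merge(topics)
# one topic survives, carrying both names
```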

First, we have a defined property that anyone can look at to judge both the merging (are these really the same two subjects?) as well as to decide if they want to merge their subject representatives with these.

Second, we have a rule by which the mapping/merging occurs. We are no longer relying on a blind mapping between the two subject representatives.

Topic maps are a threefold trick: 1) no second-class subjects, 2) explicit properties for identification, 3) explicit rules for when subject representatives are considered to represent the same subject.

Apologies for the length of this post! But, Michael wanted an example.


(I will answer Michael’s questions about XML and Prolog separately.)

Data 3.0 Manifesto (Reinventing Topic Maps, Almost)

Saturday, April 17th, 2010

I happened across Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1.

Kingsley Idehen says:

  • An “Entity” is the “Referent” of an “Identifier.”
  • An Identifier SHOULD provide an unambiguous and unchanging (though it MAY be opaque!) “Name” for its Referent.
  • A Referent MAY have many Identifiers (Names), but each Identifier MUST have only one Referent. (A Referent MAY be a collective Entity, i.e., a Group or Class.)

Sounds like:

  • A proxy represents a subject
  • A proxy can have one or more identifiers for a subject
  • The identifiers in a proxy have only one referent, the subject the proxy represents

Not quite a re-invention of topic maps, as Kingsley’s proposal misses treating entity representatives, and their components, as potential entities themselves, which can have identifiers, rules for mapping, etc.

“When you can do that, grasshopper, then you will be a topic map.”

Thesis – Sharding the Neo4J Graph DB

Friday, April 16th, 2010

Sharding the Neo4J Graph DB thesis bears watching.

As the size of topic maps increases, so will the performance demands made upon them.

Topincs 4

Friday, April 16th, 2010

Robert Cerny has released a new version of Topincs!

Supports domain modeling without programming!

Topincs – General information

Topincs – Video demonstration of new look/functions

Topincs – Modeling with TMCL

Definitely worth a close look!

Big Data and Subject Identity

Friday, April 16th, 2010

Yesterday I posted on spreading the gospel of topic maps. Then a note about Ken North’s Movement on the Big Data Front shows up in my inbox.

Coincidence? I don’t think so!

It isn’t difficult to imagine linked data with a useful notion of subject identity (as opposed to 303 overhead) and subjects with multiple aliases.

For topic mappers concerned with processing issues, that post led me to:

Sets, Data Models and Data Independence (Part 1)

Laying the Foundation (Part 2)

Information Density, Mathematical Identity, Set Stores and Big Data (Part 3)

Small data sets have subject identity issues. Big Data sets do too! Big Data needs Big Topic Maps!

I saw the Big Data item in XML Daily Newslink, Wednesday, 14 April 2010, by Robin Cover. Definitely worth subscribing.

What Is Your TFM (To Find Me) Score?

Thursday, April 15th, 2010

I have talked about TFM (To Find Me) scores before. Take a look at How Can I Find Thee? Let me count the ways… for example.

So, you have looked at your OPAC, database, RDF datastore, topic map. What is your average TFM score?

What do you think it needs to be for 60 to 80% retrieval?

The Furnas article from 1983 is the key to this series of posts. See the full citation in Are You Designing a 10% Solution?.

Would you believe 15 ways to identify a subject? Or 15 aliases, to use the common terminology.

Say it slowly, 15 ways to identify a subject gets on average 60 to 80% retrieval. If you are in the range of 3 – 5 ways to identify a subject on your ecommerce site, you are leaving money on the table. Lots of money on the table.
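A toy model of why alias count matters so much. The per-alias agreement figure is an assumption in the spirit of Furnas’s findings, not his exact number, and the independence assumption is a simplification:

```python
def retrieval_rate(p_single, n_aliases):
    """Chance a user's chosen term matches at least one recorded
    alias, assuming each alias matches independently with
    probability p_single."""
    return 1 - (1 - p_single) ** n_aliases

# With ~10% per-alias agreement:
low = retrieval_rate(0.10, 4)    # 3-5 aliases: money on the table
high = retrieval_rate(0.10, 15)  # 15 aliases: the 60-80% range
```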

Want to leave less money on the table? Use topic maps and try for 15 aliases for a subject or more.

Topic Maps Gospel

Thursday, April 15th, 2010

We are all familiar with the topic maps gospel that emphasizes that subjects can have multiple identifications. And unlike other semantic technologies, we can distinguish between identifiers and locators.

There is no shortage of data integration and other IT projects that would benefit from hearing the topic maps gospel.

So, why hasn’t the gospel of topic maps spread? I suspect it is because semantic integration is only one need among many.

For example, enabling federated, global debate is fine, but I need relevant documents for an IRS auditor who is waiting for an answer. Can we do that first?

Meeting user needs as the users understand them may explain the success of NetworkedPlanet. They have used topic maps to enhance Sharepoint, something users see a need for.

We need to preserve the semantic integration that defines topic maps but let’s express it in terms of meeting the needs others have articulated. In the context of their projects.

My first target? (First question you should ask when anyone has a call to action.) Next generation library catalog projects. I am creating a list of them now. Will lurk for a while to learn their culture but will be spreading the topic maps gospel.

The conversation will naturally develop to include the treatment of relationships (associations in our speak), roles, and in some cases, interchange of the resulting information (when interchange syntax questions arise).

That sounds like a good way to spread the good news of topic maps to me.

Degrees of Separation and Scope

Wednesday, April 14th, 2010

Most people have heard of the Six degrees of separation.

I am wondering how to adapt that to scope.

Reasoning that when I find that a particular author uses the term “list washing” as an alternative way to identify “record linkage,” I should scope that term by that author.

Assuming that author has co-authors, those authors should be used as scopes on that term as well.

That seems straightforward enough, but then it occurred to me that anyone who either cites that article or one of those authors is probably using the same term to identify the same subject. So I need to extend the scope to include those authors as well.

You can see where this is going.
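The expanding scope can be sketched as a frontier expansion over co-authorship and citation links (the author names and graph data below are invented):

```python
def term_scope(seed_authors, coauthors, citers, hops=1):
    """Authors presumed to use a term in the same sense: start from
    the observed author(s), then add their co-authors and citing
    authors, out to `hops` steps."""
    scope = set(seed_authors)
    frontier = set(seed_authors)
    for _ in range(hops):
        nxt = set()
        for a in frontier:
            nxt |= coauthors.get(a, set())
            nxt |= citers.get(a, set())
        frontier = nxt - scope
        scope |= frontier
    return scope

coauthors = {"smith": {"jones"}}
citers = {"smith": {"lee"}}
scope = term_scope({"smith"}, coauthors, citers, hops=1)
# scope now holds smith, jones, and lee
```

Each extra hop widens the scope, which is exactly the runaway growth the post is pointing at.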

But unlike the usual citation network, this traces the identification of subjects at a finer grain, which isn’t necessarily co-extensive with citation of an article.

If a legal analogy would help, courts cite prior decisions for all sorts of reasons and being able to identify the ones that are important to your case would save enormous amounts of time. Remembering that even in hard times top firms charge anywhere from $300 to $750/hour, saving time can be important.*

Thinking about it visually, imagine a citation network, those are common enough, but where you can lift out a set of connections based on the usage of a particular term to identify a subject.

Add merging of the different identifications and it starts to sound like a game, with scores, etc., to tease apart citation networks into references to particular subjects, even though the authors use different terminology.


*Public access to legal material projects should note that court opinions exhibit the same behavior. If a court in Case1 cites Case2 for a jurisdictional issue, it is likely that any other case citing Case1 and Case2, is also citing Case2 for a jurisdictional issue. Old law clerk trick. Not always true but true often enough to get a lot of mileage out of one identification of why a case was cited.

Linked Data Patterns (Book)

Tuesday, April 13th, 2010

Leigh Dodds and Ian Davis have published an early draft of Linked Data Patterns.

I haven’t had time to look at the content but will be commenting on it in later posts.

Update: Other formats:

Linked Data Patterns (PDF)

Linked Data Patterns (EPUB)

Federated Search Blog

Tuesday, April 13th, 2010

Topic mappers need to read the Federated Search Blog on a regular basis.

First, “federated search” is how a significant part of the web community talks about gathering up diverse information resources.

Think of it as learning to say basic phrases in a foreign language. It may not be easy but your host will be impressed that you made the effort. Same lesson here.

Second, it has a high percentage of extremely useful resources. Two examples that I found while looking at the site this morning:

Third, we need to avoid being too narrowly focused. Semantic integration needs vary from navigation of known information resources to federation of information resources to integration based on probes of document sets too large for verification (those exist, to be covered in a future post).

Topic maps have something unique to offer those efforts but only if we understand the needs of others in their own terms.