Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 9, 2010

Motivations For Data Integration

Filed under: Data Integration,Marketing — Patrick Durusau @ 8:40 am

Talend Reference Library offers collections of case studies and white papers to make the case for data integration.

I can’t say that I care for some of the solutions that are proffered but I am aware that having a hammer (topic maps) doesn’t mean everything I see is a nail. 😉

You do have to submit contact information to download the papers.

The papers are useful as guides on making the case for data integration (read topic maps) to management level personnel. Not too much on the technical side, and always keeping a focus on issues of concern to them: costs, customer satisfaction, missed opportunities, etc.

Save the “cool” stuff for when you meet with the geeks in the IT department, after you have the contract.

June 8, 2010

Semantic Overlay Networks

GridVine: Building Internet-Scale Semantic Overlay Networks sounds like they are dealing with topic map-like issues to me. You be the judge:

This paper addresses the problem of building scalable semantic overlay networks. Our approach follows the principle of data independence by separating a logical layer, the semantic overlay for managing and mapping data and metadata schemas, from a physical layer consisting of a structured peer-to-peer overlay network for efficient routing of messages. The physical layer is used to implement various functions at the logical layer, including attribute-based search, schema management and schema mapping management. The separation of a physical from a logical layer allows us to process logical operations in the semantic overlay using different physical execution strategies. In particular we identify iterative and recursive strategies for the traversal of semantic overlay networks as two important alternatives. At the logical layer we support semantic interoperability through schema inheritance and semantic gossiping. Thus our system provides a complete solution to the implementation of semantic overlay networks supporting both scalability and interoperability.

The concept of “semantic gossiping” enables semantic similarity to be established through the combination of local mappings, that is, by adding the mappings together. (Similar to the set behavior of subject identifiers/locators in the TMDM. That is to say, if you merge two topic maps, any additional subject identifiers, previously unknown to the first topic map, will enable those topics to merge with topics in later merges where previously they may not have.)
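The transitive effect described above can be sketched in a few lines. This is a toy illustration, not TMDM-conformant code, and the identifiers are invented:

```python
# Toy sketch: topics merge when their subject-identifier sets intersect,
# and each merge unions the sets -- so an identifier learned in one merge
# can trigger a further merge later.

class Topic:
    def __init__(self, *identifiers):
        self.identifiers = set(identifiers)

def merge_maps(topics_a, topics_b):
    """Merge two topic lists: topics sharing any identifier collapse into one."""
    merged = [Topic(*t.identifiers) for t in topics_a]
    for t in topics_b:
        for m in merged:
            if m.identifiers & t.identifiers:   # shared identifier -> same subject
                m.identifiers |= t.identifiers  # union enables future merges
                break
        else:
            merged.append(Topic(*t.identifiers))
    return merged

# Map 1 knows the subject only as "urn:a"; map 2 adds "urn:b" to it.
m = merge_maps([Topic("urn:a")], [Topic("urn:a", "urn:b")])
# A later map that only knows "urn:b" now merges as well.
m = merge_maps(m, [Topic("urn:b", "urn:c")])
assert m[0].identifiers == {"urn:a", "urn:b", "urn:c"}
```

Before the second merge, the first map had never seen "urn:b"; it is the identifier acquired in the first merge that makes the later merge possible.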

Open Question: If everyone concedes that:

  • we live in a heterogeneous world
  • we have stored vast amounts of heterogeneous data
  • we are going to continue to create/store even vaster amounts of heterogeneous data
  • we keep maintaining and creating more heterogeneous data structures to store our heterogeneous data

If every starting point is heterogeneous, shouldn’t heterogeneous solutions be the goal?

Such as supporting heterogeneous mapping technologies? (Granting there will also be a limit to those supported at any one time but it should be possible to extend to embrace others.)

Author Bibliographies:

Karl Aberer

Philippe Cudré-Mauroux

Manfred Hauswirth

Tim Van Pelt

June 7, 2010

Datasets Galore! (Data.gov)

Filed under: Data Integration,Data Source,Linked Data,Subject Identity,Topic Maps — Patrick Durusau @ 9:56 am

Data.gov hosts 272,677 datasets.

LinkingOpenData will point you to a subset of 400 datasets that is available as “Linked Data.”

I guess that means that the other 272,277 datasets are not “Linked Data.”

Fertile ground for topic maps.

Topic Maps don’t limit users to “[u]se URIs as names for things.” (Linked Data)

A topic map can use the identifiers that are in place in one or more of the 272,277 datasets and create mappings to one or more of the 400 datasets in “Linked Data.”

Without creating “Linked Data” or the overhead of the “303 Cloud.”
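As a toy sketch of that kind of mapping (all identifiers below are invented, not actual Data.gov or Linked Data identifiers):

```python
# Hypothetical sketch: recording that a dataset's native identifier and a
# Linked Data URI name the same subject lets the two be used interchangeably,
# without minting new URIs or routing through 303 redirects.

equivalences = [
    # (native identifier in a Data.gov dataset, Linked Data URI)
    ("epa:facility/12345", "http://example.org/ld/facility/12345"),
]

# Build a lookup that answers in both directions.
lookup = {}
for native, uri in equivalences:
    lookup.setdefault(native, set()).add(uri)
    lookup.setdefault(uri, set()).add(native)

assert lookup["epa:facility/12345"] == {"http://example.org/ld/facility/12345"}
```

The existing identifiers stay in place; the mapping is a layer on top of them.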

Which datasets look the most promising to you?

The Value of Indexing

Filed under: Citation Indexing,Indexing,Subject Identity — Patrick Durusau @ 8:46 am

The Value of Indexing (2001) by Jan Sykes is a promotion piece for Factiva, a Dow Jones and Reuters Company, but is also a good overview of the value of indexing.

I find it interesting in its description of the use of a taxonomy for indexing purposes. You may remember from reading a print index the use of the term “see also.” This paper appears to argue that the indexing process consists of mapping one or more terms to a single term in the controlled vocabulary.

A single entry from the controlled vocabulary represents a particular concept no matter how it was referred to in the original article. (page 5)

I assume the mapping between the terms in the article and the term in the controlled vocabulary is documented. That mapping may be of more interest to the professionals who create the indexes and to power users than to the typical user.
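A minimal sketch of the mapping described above, assuming the simplest possible form of a controlled vocabulary (the medical terms are invented for illustration):

```python
# Sketch: variant terms found in articles map to a single preferred entry
# in the controlled vocabulary, so one entry represents the concept however
# the original article referred to it.

controlled_vocabulary = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "cardiac infarction": "myocardial infarction",
}

def index_term(term_in_article):
    # Fall back to the article's own term when the vocabulary has no entry.
    return controlled_vocabulary.get(term_in_article.lower(), term_in_article)

assert index_term("Heart attack") == "myocardial infarction"
```

Note that the dictionary itself is the documentation of the mapping; whether to surface it to end users is the interface question raised above.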

Perhaps that is a lesson in terms of what is presented to users of topic maps.

Delivery of the information a user wants/needs in their context is more important than demonstrating our cleverness.

That was one of the mistakes in promoting markup: too much emphasis on the cool, new, and paradigm-shifting, too little on the benefit to users. Now that office products use markup in a way that is invisible to the average user, markup usage has spread rapidly around the world.

Suggestions on how to make that happen for topic maps?

PS: Obviously this is an old piece so in fairness I am contacting Factiva to advise them of this post and to ask if they have an updated paper, etc. that they might want me to post. I will take the opportunity to plug topic maps as well. 😉

June 6, 2010

Citation Indexing – Semantic Diversity – Exercise

Filed under: Citation Indexing,Exercises,Indexing,Semantic Diversity — Patrick Durusau @ 10:48 am

In A Conceptual View of Citation Indexing, which is chapter 1 of Citation Indexing — Its Theory and Application in Science, Technology, and Humanities (1979), Garfield says of the problem of changing terminology and semantics:

Citations, used as indexing statements, provide these lost measures of search simplicity, productivity, and efficiency by avoiding the semantics problems. For example, suppose you want information on the physics of simple fluids. The simple citation “Fisher, M.E., Math. Phys., 5,944, 1964” would lead the searcher directly to a list of papers that have cited this important paper on the subject. Experience has shown that a significant percentage of the citing papers are likely to be relevant. There is no need for the searcher to decide which subject terms an indexer would be most likely to use to describe the relevant papers. The language habits of the searcher would not affect the search results, nor would any changes in scientific terminology that took place since the Fisher paper was published.

In other words, the citation is a precise, unambiguous representation of a subject that requires no interpretation and is immune to changes in terminology. In addition, the citation will retain its precision over time. It also can be used in documents written in different languages. The importance of this semantic stability and precision to the search process is best demonstrated by a series of examples.
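Garfield's point can be sketched as a plain lookup table keyed by the citation string itself (the citing papers below are invented for illustration):

```python
# Sketch of a citation index: the citation is the key, so retrieval is
# unaffected by the searcher's language habits or by later changes in
# subject terminology.

citation_index = {}

def record_citation(cited, citing):
    citation_index.setdefault(cited, []).append(citing)

record_citation("Fisher, M.E., Math. Phys., 5,944, 1964", "Paper A (1968)")
record_citation("Fisher, M.E., Math. Phys., 5,944, 1964", "Paper B (1975)")

# One precise key retrieves every citing paper, whatever vocabulary each uses.
assert citation_index["Fisher, M.E., Math. Phys., 5,944, 1964"] == [
    "Paper A (1968)", "Paper B (1975)"]
```

The exercises below probe what that key actually represents: the same citation string may be cited for quite different ideas.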

Question: What subject does a citation represent?

Question: What “precision” does the citation retain over time?

Exercise: Select any article that interests you with more than twenty (20) non-self citations. Identify ten (10) ideas in the article and examine at least twenty (20) citing articles. Why was your article cited? Was your article cited for an idea you identified? Was your article cited for an idea you did not identify? (Either one is correct. This is not a test of guessing why an article will be cited. It is exploration of a problem space. Your fact finding is important.)

Extra credit: Did you notice any evidence to support or contradict the notion that citation indexing avoids the issue of semantic diversity? If your article has been cited for more than ten (10) years, try one or two citations per year for every year it is cited. Again, your factual observations are important.

Citation Indexing

Eugene Garfield’s homepage may not be familiar to topic map fans but it should be.

Garfield invented citation indexing in the late 1950s/early 1960s.

Among the treasures you will find here:

June 5, 2010

@patrickDurusau

Filed under: Topic Maps — Patrick Durusau @ 8:01 am

I have taken the plunge and now have a Twitter account, @patrickDurusau

I persist in not having a cellphone so no topic map “tweets” as I go “Krogering” (I am informed by my daughter that means shopping at the local grocery store).

Mostly citations of articles, books, websites, posts relevant to topic maps.

PS: I accept books for review! patrick@durusau.net

The Future of the Journal

Filed under: Uncategorized — Patrick Durusau @ 6:21 am

The Future of the Journal is another slide deck by Anita de Waard that reads like a promotional piece for topic maps, sans any mention of topic maps.

While Anita makes a strong case for annotation of data in science publishing, the same is true for government, legal, environmental, business, finance, etc., publications. All publications are as complex as depicted on these slides. It isn’t as obvious in the humanities because that “data” has been locked away so long that we have forgotten it is there.

The more complex the information we record, via “annotations” or some other mechanism, the greater the need for librarians to organize it and help us find it. Self-help in research is like the guy about to do a self-appendectomy with his doctor’s advice over the phone. Doable, maybe, but the results are pretty much what you would expect.

Rather than future of the journal, I would say: Future of Information.

June 4, 2010

representing scientific discourse, or: why triples are not enough

Filed under: Classification,Indexing,Information Retrieval,Ontology,RDF,Semantic Web — Patrick Durusau @ 4:15 pm

representing scientific discourse, or: why triples are not enough by Anita de Waard, Disruptive Technologies Director (how is that for a cool title?), Elsevier Labs, merits a long look.

I won’t spoil the effect by trying to summarize the presentation.  It is only 23 slides long.

Read those slides carefully and then get yourself to: Rhetorical Document Structure Group HCLS IG W3C. Read, discuss, contribute.

PS: Based on this slide pack I am seriously thinking of getting a Twitter account so I can follow Anita. Not saying I will but am as tempted as I have ever been. This looks very interesting. Fertile ground for discussion of topic maps.

Tinkerpop

Filed under: Graphs,NoSQL,Semantic Web,Software — Patrick Durusau @ 3:58 pm

Tinkerpop is worth a visit, whether you are into graph software (its focus) or not.

Home for:

Pipes: A Data Flow Framework Using Process Graphs

reXster: A Graph Based Ranking Engine

Blueprints (…collection of interfaces and implementations to common, complex data structures.)

Project Gargamel: Distributed Graph Computing

Gremlin: A Graph Based Programming Language

Twitlogic: Real Time #SemanticWeb in <= 140 Chars

Ripple: Semantic Web Scripting Language

LoPSideD: Implementing The Linked Process Protocol

Hadoop-HBase-Lucene-Mahout-Nutch-Solr Digests

Filed under: Indexing,MapReduce,Search Engines,Software — Patrick Durusau @ 5:40 am

More interests than time?

Digests of developments in May 2010:

Hadoop

HBase

Lucene

Mahout

Nutch

Solr

Suggestions of other digest type sources and/or comments on such sources deeply appreciated.

I do not think it means what you think it means

Filed under: Ontology,OWL,RDF,Semantic Web,Software — Patrick Durusau @ 4:30 am

I do not think it means what you think it means by Taylor Cowan is a deeply amusing take on Pellet, an OWL 2 Reasoner for Java.

I particularly liked the line:

I believe the semantic web community is falling into the same trap that the AI community fell into, which is to grossly underestimate the meaning of “reason”. As Inigo Montoya says in the Princess Bride, “You keep using that word. I do not think it means what you think it means.”

(For an extra 5 points, what is the word?)

Taylor’s point that Pellet will underscore unstated assumptions in an ontology and make sure that your ontology is consistent is a good one. If you are writing an ontology to support inferences that is a good thing.

Topic maps can support “consistent” ontologies but I find encouragement in their support for how people actually view the world as well. That some people “logically” infer from Boeing 767 -> “means of transportation” should not prevent me from capturing that some people “logically” infer -> “air-to-ground weapon.”

A formal reasoning system could be extended to include that case, but can that be done as soon as an analyst has that insight or must it be carefully crafted and tested to fit into a reasoning system when “the lights are blinking red?”

June 3, 2010

Connecting The Dots

Filed under: Subject Identity,Topic Maps — Patrick Durusau @ 2:20 pm

I have listened to and tried to help refine marketing for topic maps. One possible slogan is that topic maps make vendor X’s software suck less. Hardly a ringing endorsement of topic maps. 😉

There is the venerable “connecting the dots” theme, but I can connect dots with a pen and one of those puzzle books they sell at the airports. I don’t need a topic map to connect dots. Besides, I am the one who does the connecting of the dots, I just use a topic map to write my connecting of the dots down.

Maybe that is part of the answer.

Topic maps give us a way to write down our connecting of the dots. I can’t think of any search engine that allows you to store your connecting of any dots you find. True enough, applications like Talend help you write down your mapping of dots from one data source to another. But with one important difference from topic maps.

You can’t share your dots or their connections with others. Not and expect them to make sense to anyone else. It is the original topic map dilemma. No one knows what dots you have identified or connected and you don’t have any way to tell them.

With topic maps you can identify your dots, say how they are connected, and share them with others.

That sounds pretty close to being an elevator speech to me. Suggestions?

PS: I like the idea of connecting dots that can later be extended by others. Remember the original European mapping expeditions in Africa or South America? They were all partial and all later extended by others. If that were to happen today, the argument would be over how best to map the entire territory all at once. Which is doable, but only by omitting a lot of detail, such as meeting the actual residents.

Think of “exploring” one of the document archives that Jason Baron maintains at the U.S. National Archives and Records Administration and connecting a set of dots, that are later extended or perhaps merged with dots identified and connected by others. Eventually, with enough people connecting the dots, the “dark” areas become fewer and fewer. Not unlike what news reporters, lawyers and researchers do now, with the exception that the connected dots become useful to others. Collaborative discovery anyone?

June 2, 2010

Restful Interface to Topic Maps

Filed under: Software,Topic Map Software,Topic Maps — Patrick Durusau @ 7:20 pm

ULISSE (USOCs KnowLedge Integration and dissemination for Space Science and Exploration) is a research project to…

describe space experiments and their results using Topic Maps. This allows us to create a knowledge base with innovative navigation, filtering and querying capabilities for the project. We chose Ontopia as the topic maps engine to power this knowledge base.

While designing the overall ULISSE system, we identified the need for a RESTful web interface to Ontopia. Currently, we have been designing this interface internally and plan to start implementing it during the summer at which point we will also make it available as open source under the same license as Ontopia and hopefully/ideally as a part of Ontopia.

We approached the design of this REST interface as a generic interface for accessing a topic maps engine and it is not Ontopia- or ULISSE-specific. It could conceivably be implemented over any Topic Maps engine. (David Damen, 2 June 2010, Time to put Topic Maps to REST?)

Further details are available as a Google doc, http://bit.ly/9NEP2x.
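To make the idea of a generic, engine-independent interface concrete, here is a purely hypothetical sketch of the kind of resource layout such a REST interface might expose. This is my guess, not the ULISSE/Ontopia design; see the Google doc for the actual proposal:

```python
# Hypothetical resource layout for a generic topic maps REST interface.
# Route patterns and descriptions are invented for illustration only.

routes = {
    "GET /topicmaps":                  "list the available topic maps",
    "GET /topicmaps/{id}":             "retrieve one topic map",
    "GET /topicmaps/{id}/topics":      "list topics in a map",
    "GET /topicmaps/{id}/topics/{t}":  "retrieve one topic",
    "POST /topicmaps/{id}/topics":     "create a topic in a map",
}

assert "GET /topicmaps" in routes
```

The point of such a layout is exactly the one made in the quote: nothing in it is specific to any one engine, so it could sit over Ontopia or any other Topic Maps implementation.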

You might also want to consider subscribing to TopicMapMail to follow this and other topic map related discussions.

******
Update: 3 June 2010

I was reminded of Robert Barta’s 2005 presentation at Extreme Markup, TMIP, A RESTful Topic Maps Interaction Protocol, Extreme Conference archive copy. Includes performance analysis.

And Robert points to the specification TMIP, Topic Map Interaction Protocol 0.3, Specification.

Visualizing Topic Maps

Filed under: Interface Research/Design,Topic Map Software,Topic Maps — Patrick Durusau @ 3:57 pm

Robert Barta forwarded a link to Protovis: A Graphical Approach to Visualization.

Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction.

Protovis (Javascript + SVG) + topicmap backend + your imagination = The Next Big Thing In Topic Maps?

Examples include maps and a number of other starting points for visualization of data sets.

Topic Map: An Ontology Framework for Information Retrieval

Filed under: Examples,TMDM,Topic Maps — Patrick Durusau @ 6:06 am

Topic Maps Lab reports Topic Map: An Ontology Framework for Information Retrieval, a presentation by Rajkumar Kannan at the National Conference on Advances in Knowledge Management (NCAKM’10), pp. 195-198, March 2010, India.

Nothing novel for long time topic map advocates but a place for others to start learning about topic maps.

Which reminds me, I need to return to the non-standard/technical introduction to topic maps. Will try to post the first installment, without illustrations (still looking for an illustrator) later this week.

June 1, 2010

Topincs 4.3.0 Released

Filed under: Topic Map Software — Patrick Durusau @ 9:11 am

Topic Maps Snippets reports that Robert Cerny’s Topincs 4.3.0 has been released!

Enhancing navigation in biomedical databases by community voting and database-driven text classification

Enhancing navigation in biomedical databases by community voting and database-driven text classification demonstrates improvement of automatic classification of literature by harnessing community knowledge.

From the authors:

Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases.

The system can be seen at: PepBank.

You need to read the article in full to appreciate what the authors have done but a couple of quick points to notice:

1) The use of heat maps to assist users in determining the relevance of a given abstract. (Domain specific facts.)

2) The user interface allows yes/no voting on the same facts as appear in the heat map.

Voting results in reclassification of the entries.

Equally important is a user interface that enables immediate evaluation of relevance and quick user feedback.

The user is not asked a series of questions or given complex rating choices; it is yes or no. That may seem coarse, but the project demonstrates that, with proper design, it can be very useful.
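The yes/no voting loop might be sketched like this (thresholds and names are invented for illustration; the paper describes the actual mechanism):

```python
# Sketch: the community votes yes/no on a fact for an entry, and once
# enough votes accumulate the entry is reclassified from the tally.
from collections import defaultdict

votes = defaultdict(lambda: {"yes": 0, "no": 0})

def vote(entry_id, fact, answer):
    votes[(entry_id, fact)]["yes" if answer else "no"] += 1

def classification(entry_id, fact, min_votes=3):
    tally = votes[(entry_id, fact)]
    total = tally["yes"] + tally["no"]
    if total < min_votes:
        return "unclassified"          # not enough community input yet
    return "relevant" if tally["yes"] > tally["no"] else "not relevant"

vote("pep-001", "antimicrobial", True)
vote("pep-001", "antimicrobial", True)
vote("pep-001", "antimicrobial", False)
assert classification("pep-001", "antimicrobial") == "relevant"
```

The single binary choice keeps the cost of contributing near zero, which is what makes gathering training data from the community workable.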
