Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

October 4, 2010

Understanding web documents using semantic overlays

Filed under: Interface Research/Design,Mapping,Semantic Web — Patrick Durusau @ 5:08 am

Understanding web documents using semantic overlays
Authors: Grégoire Burel, Amparo Elizabeth Cano
Keywords: semantic overlays, semantic web, web augmentation

Abstract:

The Ozone Browser is a platform independent tool that enables users to visually augment the knowledge presented in a web document in an unobtrusive way. This tool supports the user comprehension of Web documents through the use of Semantic Overlays. This tool uses linked data and lightweight semantics for getting relevant information within a document. The current implementation uses a JavaScript bookmarklet.

The “overlay” nature of this interface attracted my attention. I suspect it would work with “other” sources of page annotation, such as topic maps.

A suspicion only, since the project page, http://oak.dcs.shef.ac.uk/sparks/, is a dead link as of 4 October 2010. I have written to the project and will update its status here.


Update:

Apologies for the long delay in following up on this entry!

The correct URL, not the one reported in the article, is: http://nebula.dcs.shef.ac.uk/sparks/ozone.

Now I will have to find the time to try the bookmarklet. Comments if you have already?

September 23, 2010

KP-Lab Knowledge Practices Lab

Filed under: Interface Research/Design,RDF,Semantic Web,Software — Patrick Durusau @ 7:06 am

KP-Lab Knowledge Practices Lab.

The KP-Lab project designs and implements a modular, flexible, and extensible ICT system that supports pedagogical methods to foster knowledge creation in educational and workplace settings. The system provides tools for collaborative work around shared objects, and for knowledge practices in the various settings addressed by the project.

It offers the following tools:

  • Knowledge Practices Environment (KPE)
  • The Visual Modeling (Language) Editor
  • Activity System Design Tools (ASDT)
  • Semantic Multimedia Annotation tool (SMAT)
  • Map-It and M2T (meeting practices)
  • The CASS-Query tool
  • The CASS-Memo tool
  • Awareness Services
  • RDF Suite
  • KMS-Persistence API
  • Text Mining Services

Pick any one of these tools and name five (5) things you like about it and five (5) things you dislike about it. How would you change the things you dislike? (General prose description is sufficient.)

September 2, 2010

The Matching Web (semantics supplied by users)?

Filed under: Searching,Semantic Diversity,Semantic Web,Semantics — Patrick Durusau @ 10:15 am

Why do we call it “The Semantic Web”? The web is nothing but a collection of electronic files. Where is the “semantic” in those files? (Even with linking, same question.)

Where was the “semantic” in Egyptian hieroglyphic texts? They had one semantic in the view of Horapollo, Athanasius Kircher, and others. They have a different semantic in the view of later researchers, such as Jean-François Champollion; see the Wikipedia article on Egyptian hieroglyphs.

Same text, different semantics. With the later ones viewed as being “correct.” Yet it would be essential to record the hieroglyphic semantics of Kircher to understand the discussions of his contemporaries and those who relied on his work. One text, multiple semantics.

All our search, reasoning, etc., engines can do is to mechanically apply patterns and return content to us. The returned content has no known “semantic” until we assign it one.  Different users may supply different semantics to the same content.

Perhaps a better name would be “The Matching Web (semantics supplied by users)”.*

******

*Then we could focus on managing the semantics supplied by users. A different task than the one underway in the “Semantic Web” at present.

August 3, 2010

Freebase?

Filed under: Semantic Web,Software — Patrick Durusau @ 6:53 pm

Freebase looks “lite” enough to actually be useful.

The problem of matching up the Freebase “unique” identifiers with the identifiers actually used by people in real communications remains.

Another problem is how to enable people to work in a highly distributed fashion when authoring matches between identifiers.

Are you using Freebase?

Comments?

July 8, 2010

Keeping Up With The “Competition”

Filed under: RDF,Semantic Web — Patrick Durusau @ 8:29 pm

New opportunities for linked data nose-following is a blog post from the W3C about three (3) new IETF RFCs.

Well, or at least two of them. As of my 8:55 PM local, 2010-07-08, “Defining Well-Known URIs” has the following URI, http://www.ietf.org/html/draft-nottingham-site-meta-05. Err, that doesn’t look right.

When it didn’t resolve I thought perhaps it was a redirect.

Nothing that complicated, just a bad URI. I got the IETF “404: Page Not Found” page.

Oh, the correct URI? Defining Well-Known URIs, http://www.rfc-editor.org/rfc/rfc5785.txt.

So, what is a well-known URI?

A well-known URI is a URI [RFC3986] whose path component begins with
the characters “/.well-known/”, and whose scheme is “HTTP”, “HTTPS”,
or another scheme that has explicitly been specified to use well-
known URIs.

Applications that wish to mint new well-known URIs MUST register
them, following the procedures in Section 5.1.

Wait for it….

5.1. The Well-Known URI Registry

This document establishes the well-known URI registry.

Well-known URIs are registered on the advice of one or more
Designated Experts (appointed by the IESG or their delegate), with a
Specification Required (using terminology from [RFC5226]). However,
to allow for the allocation of values prior to publication, the
Designated Expert(s) may approve registration once they are satisfied
that such a specification will be published.

Well, that’s a relief! We are going to have Designated Expert(s) sitting in judgment over “well-known” URIs.

We just narrowly escaped being able to judge for ourselves what are URIs worth treating as “well-known” or not.

Good thing we have TBL, the W3C and Designated Experts to keep us safe.
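Snark aside, the rule quoted from the RFC is mechanical enough to sketch in a few lines of Python. The function name and example URIs below are mine, not anything from the RFC; this is just the path-prefix-and-scheme test it describes:

```python
from urllib.parse import urlsplit

def is_well_known(uri, extra_schemes=()):
    """Rough check per RFC 5785: the scheme is http/https (or one
    explicitly specified to use well-known URIs) and the path
    component begins with "/.well-known/"."""
    parts = urlsplit(uri)
    schemes = {"http", "https", *extra_schemes}
    return parts.scheme in schemes and parts.path.startswith("/.well-known/")

print(is_well_known("https://example.com/.well-known/host-meta"))  # True
print(is_well_known("https://example.com/robots.txt"))             # False
```

Whether a given path under /.well-known/ is legitimate is, of course, exactly what the Designated Expert(s) get to decide.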

*******
Update: 2010-07-09

I was worried that since the “Defining Well-Known URIs” RFC was dated in April, this was some complicated spoof or joke. I even checked the cross-linking in the RFC but finally came down on the side of saying it was real.

That judgment was confirmed this morning: the page “went dark” briefly last night, and when I checked it again, the incorrect URL I reported above had been silently corrected.

W3C blog, goes dark, comes back with correct information, all signs that this must be genuine. Or at least it is being reported as such.

July 7, 2010

Second Verse, Same As The First

Filed under: Marketing,RDF,Semantic Diversity,Semantic Web,Semantics — Patrick Durusau @ 2:44 pm

Unraveling Algol: US, Europe, and the Creation of a Programming Language by David Nofre, University of Amsterdam, is an interesting account of the early history of Algol.

The conventional wisdom that the story was simply Algol vs. Fortran is deeply questionable.

The underlying difficulty, a familiar one in semantic integration circles, was a universal programming language versus a diversity of programming languages.

Can you guess who won?

Can you guess where I would put my money in a repeat of a universal solution vs. diverse solutions?

Where is your money riding?

July 5, 2010

Closed World vs. Open World: the First Semantic Web Battle – From Stefano’s Linotype

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 7:20 pm

Closed World vs. Open World: the First Semantic Web Battle from Stefano’s Linotype is well worth your time.

See also Stack or Two Towers. It seems one universal world view may not be as robust as previously thought.

Interesting that the non-universal treatment of “doubt” may split the Semantic Web into incompatible parts. Can you say fragile?

June 4, 2010

representing scientific discourse, or: why triples are not enough

Filed under: Classification,Indexing,Information Retrieval,Ontology,RDF,Semantic Web — Patrick Durusau @ 4:15 pm

representing scientific discourse, or: why triples are not enough by Anita de Waard, Disruptive Technologies Director (how is that for a cool title?), Elsevier Labs, merits a long look.

I won’t spoil the effect by trying to summarize the presentation.  It is only 23 slides long.

Read those slides carefully and then get yourself to: Rhetorical Document Structure Group HCLS IG W3C. Read, discuss, contribute.

PS: Based on this slide pack I am seriously thinking of getting a Twitter account so I can follow Anita. Not saying I will but am as tempted as I have ever been. This looks very interesting. Fertile ground for discussion of topic maps.

Tinkerpop

Filed under: Graphs,NoSQL,Semantic Web,Software — Patrick Durusau @ 3:58 pm

Tinkerpop is worth a visit, whether you are into graph software (its focus) or not.

Home for:

  • Pipes: A Data Flow Framework Using Process Graphs
  • reXster: A Graph Based Ranking Engine
  • Blueprints (…collection of interfaces and implementations to common, complex data structures.)
  • Project Gargamel: Distributed Graph Computing
  • Gremlin: A Graph Based Programming Language
  • Twitlogic: Real Time #SemanticWeb in <= 140 Chars
  • Ripple: Semantic Web Scripting Language
  • LoPSideD: Implementing The Linked Process Protocol

I do not think it means what you think it means

Filed under: Ontology,OWL,RDF,Semantic Web,Software — Patrick Durusau @ 4:30 am

I do not think it means what you think it means by Taylor Cowan is a deeply amusing take on Pellet, an OWL 2 Reasoner for Java.

I particularly liked the line:

I believe the semantic web community is falling into the same trap that the AI community fell into, which is to grossly underestimate the meaning of “reason”. As Inigo Montoya says in the Princess Bride, “You keep using that word. I do not think it means what you think it means.”

(For an extra 5 points, what is the word?)

Taylor’s point that Pellet will underscore unstated assumptions in an ontology and make sure that your ontology is consistent is a good one. If you are writing an ontology to support inferences that is a good thing.

Topic maps can support “consistent” ontologies but I find encouragement in their support for how people actually view the world as well. That some people “logically” infer from Boeing 767 -> “means of transportation” should not prevent me from capturing that some people “logically” infer -> “air-to-ground weapon.”

A formal reasoning system could be extended to include that case, but can that be done as soon as an analyst has that insight or must it be carefully crafted and tested to fit into a reasoning system when “the lights are blinking red?”

May 31, 2010

Authoritative Identifications?

Filed under: Semantic Web,Subject Identity — Patrick Durusau @ 3:10 pm

Sam Hunting reminded me that if a method of identification becomes authoritative, that can lead to massive loss of data (prior methods of identification). We were discussing the Semantic Web Challenge. That assumes systems that do not support multiple “authoritative” and alternative identifications.

While I can understand the concern, I think it is largely unwarranted.

Natural language and consequently identification have been taking care of themselves in the face of “planned” language proposals for centuries. According to Klaus Schubert in the introduction to: Interlinguistics: Aspects of the Science of Planned Languages, Berlin: Mouton de Gruyter, 1989, there are almost 1,000 such projects, most since the second half of the 19th century. I suspect the count was too low by the time it was published.

The welter of identifications has continued merrily along for more than the last 20 years so I don’t feel like we are in any imminent danger of uniformity.

And, as a practical matter, more than a billion speakers of Chinese, Japanese and Korean are bringing their concerns and identifications of subjects to the WWW in a way that will be hard to ignore. (Nor should they be ignored.)

Systems that support multiple authoritative and alternative identifications will be the future of the WWW.

PS: The use of owl:sameAs is a pale glimmer of what needs to be possible for reliable mappings of identifications. The reason for any given mapping remains unknown.

Semantic Web Challenge

The Semantic Web Challenge 2010 details landed in my inbox this morning. My first reaction was to refine my spam filter. 😉 Just teasing. My second and more considered reaction was to think about the “challenge” in terms of topic maps.

Particularly because a posting from the Ontology Alignment Evaluation Initiative arrived the same day, in response to a posting from sameas.org.

I freely grant that URIs that cannot distinguish between identifiers and resources without 303 overhead are poor design. But the fact remains that there are many data sets, representing large numbers of subjects that have even poorer subject identification practices. And there are no known approaches that are going to result in the conversion of those data sets.

Personally I am unwilling to wait until some new “perfect” language for data sweeps the planet and results in all data being converted into the “perfect” format. Anyone who thinks that is going to happen needs to stand with the end-of-the-world-in-2012 crowd. They have a lot in common. Magical thinking being one common trait.

The question for topic mappers to answer is: how do we attribute, to whatever data language we are confronting, characteristics that will enable us to reliably merge information about subjects in that format with other information in the same or another data language? Understanding that the necessary characteristics may vary from data language to data language.

Take the lack of a distinction between identifier and resource in the Semantic Web for instance. One easy step towards making use of such data would be to attribute to each URI the status of either being an identifier or a resource. I suspect, but cannot say, that the authors/users of those URIs know the answer to that question. It seems even possible that some sets of such URIs are all identifiers and if so marked/indicated in some fashion, they automatically become useful as just that, identifiers (without 303 overhead).

As identifiers they may lack the resolution that topic maps provide to the human user, which enables them to better understand what subject is being identified. But, since topic maps can map additional identifiers together, when you encounter a deficient identifier, simply create another one for the same subject and map them together.
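A minimal sketch of that attribution-and-mapping step, with all names, URIs, and structures hypothetical (this is an illustration of the idea, not any existing topic map API):

```python
# Attribute a status to each URI, then map "deficient" identifiers to
# a shared subject. The status is attributed by us, not intrinsic to
# the URI, which is the whole point.
subject_status = {}   # URI -> "identifier" | "resource"
same_subject = {}     # identifier URI -> canonical subject key

def attribute(uri, status):
    subject_status[uri] = status

def map_together(uri_a, uri_b):
    """Record that two identifier URIs name the same subject."""
    key = same_subject.get(uri_a) or same_subject.get(uri_b) or uri_a
    same_subject[uri_a] = key
    same_subject[uri_b] = key

attribute("http://dbpedia.org/resource/Paris", "identifier")
attribute("http://example.org/my-paris-id", "identifier")
map_together("http://dbpedia.org/resource/Paris",
             "http://example.org/my-paris-id")

# Both URIs now resolve (locally, no 303s) to the same subject key.
print(same_subject["http://example.org/my-paris-id"] ==
      same_subject["http://dbpedia.org/resource/Paris"])  # True
```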

I think we need to view the Semantic Web data sets as opportunities to demonstrate how understanding subject identity, however that is indicated, is the linchpin to meaningful integration of data about subjects.

Bearing in mind that all our identifications, Semantic Web, topic map or otherwise, are always local, provisional and subject to improvement, in the eye of another.

May 27, 2010

Blast From The Past

Filed under: Humor,Ontology,Semantic Web — Patrick Durusau @ 7:27 pm

Can you place the following quote?

…my invention uses reason in its entirety and is, in addition, a judge of controversies, an interpreter of notions, a balance of probabilities, a compass which will guide us over the ocean of experiences, an inventory of things, a table of thoughts, a microscope for scrutinizing present things, a telescope for predicting distant things, a general calculus, an innocent magic, a non-chimerical cabal, a script which all will read in their own language; and even a language which one will be able to learn in a few weeks, and which will be soon accepted amidst the world. And which will lead the way for the true religion everywhere it goes.

I have to admit when I first read the part about “…one will be able to learn in a few weeks…” I was thinking about John Sowa and one of his various proposals (some say perversions) of natural language.

Then I got to the part about “…the way for the true religion…” and realized that this was probably either a fundamentalist quote (you pick the tradition) or from an earlier time.

Curious? It was Leibniz, Letter to Duke of Hanover, 1679. Quoted in The Search For The Perfect Language by Umberto Eco. More on the book in later posts.

April 19, 2010

Zero-Sum Games and Semantic Technologies

Filed under: Mapping,Maps,Semantic Diversity,Semantic Web,Topic Maps — Patrick Durusau @ 12:39 pm

Kingsley Idehen asked why debates over semantic technologies are always zero-sum games?

I understood him to be asking about RDF vs. Topic Maps but the question could be applied to any two semantic technologies, including RDF vs. his Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1.

This isn’t a new problem but in fact is a very old one.

To take Kingsley’s OR seriously means a user may choose a semantic technology other than mine. Which means it may not work as well, or at all, with my software. (Vendor interest.) More importantly, given the lack of commercial interest in semantic technologies, it is a different way of viewing the world. That is, it is different from my way of viewing the world.

That is the linchpin that explains the zero-sum nature of everything from the debates over upper ontologies to the actual application of semantic technologies.

We prefer our view of the world to that of others.

Note that I said we. Not some of us, not part of the time, not some particular group or class, or any other possible distinction. Everyone, all the time.

That fact, everyone’s preference for their view of the world, underlies the semantic, cultural, linguistic diversity that we encounter day to day. It is a diversity that has persisted, as far as is known, throughout recorded history. There are no known periods without that diversity.

To advocate that anyone adopt another view of the world, a view other than their own, even if only Kingsley’s OR, means they have a different view than before. That is, by definition, a zero-sum game. Either the previous view prevails, or it doesn’t.

I prefer mapping strategies (note I did not say a particular mapping strategy) because they enable diverse views to continue as is and put the burden of mapping on those who wish to have additional views.

April 17, 2010

Data 3.0 Manifesto (Reinventing Topic Maps, Almost)

Filed under: Linked Data,Semantic Web,Topic Maps — Patrick Durusau @ 3:54 pm

I happened across Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 1.

Kingsley Idehen says:

  • An “Entity” is the “Referent” of an “Identifier.”
  • An Identifier SHOULD provide an unambiguous and unchanging (though it MAY be opaque!) “Name” for its Referent.
  • A Referent MAY have many Identifiers (Names), but each Identifier MUST have only one Referent. (A Referent MAY be a collective Entity, i.e., a Group or Class.)

Sounds like:

  • A proxy represents a subject
  • A proxy can have one or more identifiers for a subject
  • The identifiers in a proxy have only one referent, the subject the proxy represents

Not quite a re-invention of topic maps, as Kingsley’s proposal misses treating entity representatives, and their components, potentially as entities themselves, which can in turn have identifiers, rules for mapping, etc.

“When you can do that, grasshopper, then you will be a topic map.”
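The difference can be sketched in a few lines, assuming nothing beyond the bullet points above. Class and instance names are mine, not from Kingsley's manifesto or any topic map implementation:

```python
# A proxy represents a subject and carries one or more identifiers,
# each of which MUST name only that subject.
class Proxy:
    def __init__(self, subject, identifiers):
        self.subject = subject
        self.identifiers = list(identifiers)

paris = Proxy("the city of Paris",
              ["http://dbpedia.org/resource/Paris",
               "http://example.org/ids/paris"])

# The step Data 3.0 misses: an identifier is itself a subject we can
# represent with its own proxy (who minted it, what mapping rules
# apply to it, and so on).
dbpedia_paris_id = Proxy("the DBpedia identifier for Paris",
                         ["http://example.org/ids/about-dbpedia-paris"])
```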

March 31, 2010

One Billion Points of Failure!

Filed under: RDF,Semantic Web,Topic Maps — Patrick Durusau @ 9:14 pm

In No 303’s for Topic Maps? I mentioned that distinguishing between identifiers and addresses with 303’s has architectural implications.

The most obvious one is the additional traffic that 303 redirects are going to add to the Web.
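To see where the extra traffic comes from, here is a toy simulation of the 303 dance. The URLs and functions are illustrative only, not a real client or server: the point is simply that every dereference of an identifier URI costs two round trips instead of one.

```python
def fetch(uri, server):
    """Follow at most one 303 See Other, counting round trips."""
    trips = 1
    status, payload = server(uri)
    if status == 303:
        # payload stands in for the Location header; issue a second GET
        status, payload = server(payload)
        trips += 1
    return payload, trips

def server(uri):
    # Identifier URIs answer 303 with the address of the data document.
    if uri == "http://example.org/id/berlin":
        return 303, "http://example.org/doc/berlin"
    return 200, "<data about Berlin>"

print(fetch("http://example.org/id/berlin", server))
# ('<data about Berlin>', 2)
```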

Another concern is voiced in the Cool URIs for the Semantic Web document when it says:

Content negotiation, with all its details, is fairly complex, but it is a powerful way of choosing the best variant for mixed-mode clients that can deal with HTML and RDF.

Great, more traffic, it isn’t going to be easy to implement, what else could be wrong?

It is missing the one feature that made the Web a successful hypertext system when more complex systems failed. The localization of failure is missing from the Semantic Web.

If you follow a link and a 404 is returned, then what? Failure is localized because your document is still valuable. It can be processed just like before.

What if you need to know if a URL is an identifier for “people, products, places, ideas and concepts such as ontology classes”? If the 303 fails, you don’t get that information.

It is important enough information for the W3C to invent ways to fix the failure of RDF to distinguish between identifiers and resource addresses.

But the 303 fix puts you at the mercy of an unreliable network, unreliable software and unreliable users.

With triples relying on other triples, failure cascades. The system has one billion points of potential failure, the reported number of triples.

The Semantic Web only works if our admittedly imperfect systems, built and maintained by imperfect people, running over imperfect networks, don’t fail, maybe. I would rather take my chances with a technology that works for imperfect users, that would be us. The technology would be topic maps.

March 29, 2010

No 303’s for Topic Maps?

Filed under: RDF,Semantic Web,Topic Maps — Patrick Durusau @ 7:34 pm

I was puzzled that articles on 303’s, such as Cool URIs for the Semantic Web never mention topic maps. Then I remembered, topic maps don’t need 303’s!

Topic maps distinguish between URIs used as identifiers and URIs which are the addresses of resources.

Even if the Internet is down, a topic map can distinguish between an identifier and the address of a resource.

Topic maps use the URIs identified as identifiers to determine if topics are representing the same subjects.

Even if the Internet is down, a topic map can continue to use those identifiers for comparison purposes.

Topic maps use the URIs identified as subject locators to determine if topics are representing the same resource as a subject.

Even if the Internet is down, a topic map can continue to use those subject locators for comparison purposes.
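The comparisons described above need nothing but the sets of URIs already in hand, which a few lines can illustrate. The structures are hypothetical, not an actual topic map engine: merging on shared subject identifiers or subject locators is pure set intersection, with no network access and hence no 303 round trips.

```python
def same_topic(a, b):
    """Two topics represent the same subject if they share a subject
    identifier or a subject locator. No dereferencing required."""
    return bool(a["identifiers"] & b["identifiers"] or
                a["locators"] & b["locators"])

t1 = {"identifiers": {"http://psi.example.org/paris"},
      "locators": set()}
t2 = {"identifiers": {"http://psi.example.org/paris",
                      "http://dbpedia.org/resource/Paris"},
      "locators": set()}

print(same_topic(t1, t2))  # True, even with the network down
```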

You know what they say: Once is happenstance, twice is coincidence, three times is an engineering principle.

The engineering principle? And its consequences? Keep watching this space, I want to massage it a bit before posting.

*****

Techies see: kill -9 ‘/dev/cat’ (Robert Barta, one of my co-editors).

Non-Techies/Techies see: Topic Maps Lab.

Spec readers: XTM (syntax), Topic Maps Data Model.

Not all there is to say about topic maps but you have to start somewhere.

*****

Apologies for the delay! News on CTM (Compact Topic Maps syntax) most likely tomorrow.
