Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 14, 2013

DRM/WWW, Wealth/Salvation: Theological Parallels

Filed under: DRM,RDF,Semantic Web — Patrick Durusau @ 7:38 pm

Cory Doctorow misses a teaching moment in his post What I wish Tim Berners-Lee understood about DRM.

Cory says:

Whenever Berners-Lee tells the story of the Web’s inception, he stresses that he was able to invent the Web without getting any permission. He uses this as a parable to explain the importance of an open and neutral Internet.

The “…without getting any permission” was a principle for Tim Berners-Lee when he was inventing the Web.

A principle then, not now.

Evidence? The fundamentals of RDF have been mired in the same model for fourteen (14) years. Impeding the evolution of the “Semantic” Web. Whatever its merits.

Another example? HTML5 violates prior definitions of URL in order to widen the reach of HTML5. (URL Homonym Problem: A Topic Map Solution)

Same “principle” as DRM support, expanding the label of “WWW” beyond what early supporters would recognize as the WWW.

HTML5 rewriting of URL and DRM support are membership building exercises.

The teaching moment comes from early Christian history.

You may (or may not) recall the parable of the rich young ruler (Matthew 19:16-30), where a rich young man asks Jesus what he must do to be saved.

Jesus replies:

One thing you still lack. Sell all that you have and distribute to the poor, and you will have treasure in heaven; and come, follow me.

And for the first hundred or more years of Christianity, so far as can be known, that rule of divesting yourself of property was followed.

Until Clement of Alexandria. Clement took the position that the rich could indeed retain their goods, so long as they used them charitably. (Now there’s a loophole!)

That created two paths to salvation: one for anyone foolish enough to take the Bible at its word and another for anyone who wanted to call themselves a Christian, without any inconvenience or discomfort.

Following Clement of Alexandria, Tim Berners-Lee is creating two paths to the WWW.

One for people who are foolish enough to innovate and share information, the innovation model of the WWW that Cory speaks so highly of.

Another path for people (DRM crowd) who neither spin nor toil but who want to burden everyone who does.

Membership as a principle isn’t surprising considering how TBL sees himself in the mirror:

TBL as WWW Pope

March 13, 2013

Aaron Swartz’s A Programmable Web: An Unfinished Work

Filed under: Semantic Web,Semantics,WWW — Patrick Durusau @ 3:04 pm

Aaron Swartz’s A Programmable Web: An Unfinished Work

Abstract:

This short work is the first draft of a book manuscript by Aaron Swartz written for the series “Synthesis Lectures on the Semantic Web” at the invitation of its editor, James Hendler. Unfortunately, the book wasn’t completed before Aaron’s death in January 2013. As a tribute, the editor and publisher are publishing the work digitally without cost.

From the author’s introduction:

” . . . we will begin by trying to understand the architecture of the Web — what it got right and, occasionally, what it got wrong, but most importantly why it is the way it is. We will learn how it allows both users and search engines to co-exist peacefully while supporting everything from photo-sharing to financial transactions.

We will continue by considering what it means to build a program on top of the Web — how to write software that both fairly serves its immediate users as well as the developers who want to build on top of it. Too often, an API is bolted on top of an existing application, as an afterthought or a completely separate piece. But, as we’ll see, when a web application is designed properly, APIs naturally grow out of it and require little effort to maintain.

Then we’ll look into what it means for your application to be not just another tool for people and software to use, but part of the ecology — a section of the programmable web. This means exposing your data to be queried and copied and integrated, even without explicit permission, into the larger software ecosystem, while protecting users’ freedom.

Finally, we’ll close with a discussion of that much-maligned phrase, ‘the Semantic Web,’ and try to understand what it would really mean.”

Table of Contents: Introduction: A Programmable Web / Building for Users: Designing URLs / Building for Search Engines: Following REST / Building for Choice: Allowing Import and Export / Building a Platform: Providing APIs / Building a Database: Queries and Dumps / Building for Freedom: Open Data, Open Source / Conclusion: A Semantic Web?

Even if you disagree with Aaron, on issues both large and small, as I do, it is a very worthwhile read.

But I will save my disagreements for another day. Enjoy the read!

February 26, 2013

Simple Web Semantics: Multiple Dictionaries

Filed under: Semantic Web,Semantics,Simple Web Semantics — Patrick Durusau @ 2:06 pm

When I last posted about Simple Web Semantics, my suggested syntax was:

Simple Web Semantics (SWS) – Syntax Refinement

While you can use any one of multiple dictionaries for the URI in an <a> element, that requires manual editing of the source HTML.

Here is an improvement on that idea:

The content of the content attribute on a meta element with a name attribute with the value “dictionary” is one or more “URLs” (in the HTML 5 sense); if there is more than one, the “URLs” are separated by whitespace.

The content of the dictionary attribute on an a element is one or more “URLs” (in the HTML 5 sense); if there is more than one, the “URLs” are separated by whitespace.

The thinking is that this enables authors of content to give users a choice of which dictionaries to use with particular “URLs.”

For example, a popular account of a science experiment could use the term, H2O and have a dictionary entry pointing to: http://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/SnowflakesWilsonBentley.jpg/220px-SnowflakesWilsonBentley.jpg, which produces this image:

snowflakes

Which would be a great illustration for a primary school class about a form of H2O.

On the other hand, another dictionary entry for the same URL might point to: http://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Liquid-water-and-ice.png/220px-Liquid-water-and-ice.png, which produces this image:

ice structure

Which would be more appropriate for a secondary school class.

Writing this for an inline <a> element, I would write:

<a href="http://en.wikipedia.org/wiki/Water"
   dictionary="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/SnowflakesWilsonBentley.jpg/220px-SnowflakesWilsonBentley.jpg
               http://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Liquid-water-and-ice.png/220px-Liquid-water-and-ice.png">H2O</a>

The use of a “URL” and images all from Wikipedia is just convenience for this example. Dictionary entries are not tied to the “URL” in the href attribute.

That presumes some ability on the part of the dictionary server to respond with meaningful information to display to a user who must choose between two dictionaries.

Enabling users to have multiple sources of additional information at their command, versus the simplicity of a single dictionary, seems like a good choice.

Nothing prohibits a script writer from enabling users to insert their own dictionary preferences either for the document as a whole or for individual <a> elements.
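
As a rough sketch of what such a script could look like (my illustration, not part of the proposal: the function names, the user-preference hook and the use of plain JavaScript are all assumptions), the candidate dictionaries for an <a> element could be gathered like this:

// Sketch: collect the candidate dictionary "URLs" for an <a> element.
// A per-element dictionary attribute overrides the document-level
// <meta name="dictionary"> declaration; a user-supplied preference list
// (hypothetical) overrides both.

function documentDictionaries() {
  var meta = document.querySelector('meta[name="dictionary"]');
  var content = meta ? meta.getAttribute('content').trim() : '';
  return content ? content.split(/\s+/) : [];   // whitespace-separated "URLs"
}

function candidateDictionaries(anchor, userPreferences) {
  var attr = anchor.getAttribute('dictionary');
  var declared = attr ? attr.trim().split(/\s+/) : documentDictionaries();
  return (userPreferences && userPreferences.length) ? userPreferences : declared;
}

// Example: list the dictionaries available for every link in the document.
Array.prototype.forEach.call(document.querySelectorAll('a[href]'), function (a) {
  console.log(a.href, candidateDictionaries(a, null));
});

How the user's choice among those dictionaries is presented is left to the script writer, just as the proposal intends.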

If you missed my series on Simple Web Semantics, see: Simple Web Semantics — Index Post.


Apologies for quoting “URL/s” throughout the post but after reading:

Note: The term “URL” in this specification is used in a manner distinct from the precise technical meaning it is given in RFC 3986. Readers familiar with that RFC will find it easier to read this specification if they pretend the term “URL” as used herein is really called something else altogether. This is a willful violation of RFC 3986. [RFC3986]

in the latest HTML5 draft, it seemed like the right thing to do.

Would it have been too much trouble to invent “something else altogether” for this new meaning of “URL?”

February 21, 2013

Precursors to Simple Web Semantics

Filed under: RDF,Semantic Web,Semantics — Patrick Durusau @ 9:04 pm

A couple of precursors to Simple Web Semantics have been brought to my attention.

Wanted to alert you so you can consider these prior/current approaches while evaluating Simple Web Semantics.

The first one was from Rob Weir (IBM), who suggested I look at “smart tags” from Microsoft and sent the link to Smart tags (Wikipedia).

The second one was from Nick Howard (a math wizard I know) who pointed out the similarity to bookmarklets. On that see: Bookmarklet (Wikipedia).

I will be diving deeper into both of these technologies.

Not so much as a historical study, but to see what did and did not work, etc.

Other suggestions, directions, etc. are most welcome!

I have another refinement to the syntax that I will be posting tomorrow.

February 18, 2013

Simple Web Semantics – Index Post

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 4:23 pm

Sam Hunting suggested that I add indexes to the Simple Web Semantics posts to facilitate navigating from one to the other.

It occurred to me that having a single index page could also be useful.

The series began with:

Reasoning about why something isn’t working is important before proposing a solution.

I have gotten good editorial feedback on the proposal and will be posting a revision in the next couple of days.

Nothing substantially different but clearer and more precise.

If you have any comments or suggestions, please make them at your earliest convenience.

I am always open to comments but the sooner they arrive the sooner I can make improvements.

February 17, 2013

Simple Web Semantics (SWS) – Syntax Refinement

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 8:19 pm

In Saving the “Semantic” Web (part 5), the only additional HTML syntax I proposed was:

<meta name="dictionary" content="URI">

in the <head> element of an HTML document.

(Where you would locate the equivalent declaration of a URI dictionary in other document formats will vary.)

But that sets the URI dictionary for an entire document.

What if you want more fine grained control over the URI dictionary for a particular URI?

It would be possible to do something complicated with namespaces, containers, scope, etc. but the simpler solution would be:

<a dictionary="URI" href="yourURI">

Either the URI is governed by the declaration for the entire page or it has a declared dictionary URI.

Or to summarize the HTML syntax of SWS at this point:

<meta name="dictionary" content="URI">

<a dictionary="URI" href="yourURI">

February 15, 2013

Saving the “Semantic” Web (part 5)

Filed under: RDF,Semantic Web,Topic Maps — Patrick Durusau @ 4:33 pm

Simple Web Semantics

For what it’s worth, what follows in this post is a partial, non-universal proposal, useful only in some cases.

That has been forgotten by this point but in my defense, I did try to warn you. 😉

1. Division of Semantic Labor

The first step towards useful semantics on the web must be a division of semantic labor.

I won’t recount the various failures of the Semantic Web, topic maps and other initiatives to “educate” users on how they should encode semantics.

All such efforts have failed, are failing now and will fail.

That is not a negative comment on users.

In another life I advocated tools that would enable biblical scholars to work in XML, without having to learn angle-bang syntax. It wasn’t for lack of intelligence; most of them were fluent in five or six ancient languages.

They were focused on being biblical scholars and had no interest in learning the minutiae of XML encoding.

After many years, due to a cast of hundreds if not thousands, OpenOffice, the OpenDocument Format (ODF) and XML editing became available to ordinary users.

Not the fine tuned XML of the Text Encoding Initiative (TEI) or DocBook, but having a 50 million plus user share is better than being in the 5 to 6 digit range.

Users have not succeeded in authoring structured data, such as RDF, but have demonstrated competence at authoring <a> elements with URIs.

I propose the following division of semantic labor:

Users – Responsible for identification of subjects in content they author, using URIs in the <a> element.

Experts – Responsible for annotation (by whatever means) of URIs that can be found in <a> elements in content.

2. URIs as Pointers into a Dictionary

One of the comments in this series pointed out that URIs are like “pointers into a dictionary.” I like that imagery and it is easier to understand than the way I intended to say it.

If you think of words as pointers into a dictionary, how many dictionaries does a word point into?

Now contrast your answer with the number of dictionaries into which a URI points.

If we are going to use URIs as “pointers into a dictionary,” then there should be no limit on the number of dictionaries into which they can point.

A URI can be posed to any number of dictionaries as a query, with possibly different results from each dictionary.

3. Of Dictionaries

Take the URI http://data.nytimes.com/47271465269191538193 as an example of a URI that can appear in a dictionary.

If you follow that URI, you will notice a couple of things:

  1. It isn’t content suitable for primary or secondary education.
  2. The content is limited to that of the New York Times.
  3. The content of the NYT consists of article pointers

Not to mention it is a “pull” interface that requires effort on the part of users, as opposed to a “push” interface that reduces that effort.

What if rather than “following” the URI http://data.nytimes.com/47271465269191538193, you could submit that same URI to another dictionary, one that had different information?

A dictionary that for that URI returns:

  1. Links to content suitable for primary or secondary education.
  2. Broader content than just New York Times.
  3. Curated content and not just article pointers

Just as we have semantic diversity:

URI dictionaries shall not be required to use a particular technology or paradigm.

4. Immediate Feedback

Whether you will admit it or not, we have all coded HTML and then loaded it in a browser to see the results.

That’s called “immediate feedback” and made HTML, the early versions anyway, extremely successful.

When <a> elements with URIs are used to identify subjects, how can we duplicate that “immediate feedback” experience?

My suggestion is that users encode in the <head> of their documents a meta element that reads:

<meta name="dictionary" content="URI">

And insert JavaScript (or jQuery) code that creates an array of all the URIs in the document, passes those URIs to the dictionary specified by the user, and then displays a set of values when a user mouses over a particular URI.

Think of it as being the equivalent of spell checking except for subjects. You could even call it “subject checking.”

For most purposes, dictionaries should only return 3 or 4 key/value pairs, enough for users to verify their choice of a URI. With an option to see more information.
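
A minimal sketch of that script, assuming (purely for illustration) a dictionary that answers GET requests of the form dictionary?uri=URI with a JSON object of properties:

// Sketch of "subject checking": look up each link's URI in the declared
// dictionary and show a few key/value pairs when the user mouses over the link.
// The query convention below is an assumption, not part of the proposal.
(function () {
  var meta = document.querySelector('meta[name="dictionary"]');
  if (!meta) { return; }
  var dictionary = meta.getAttribute('content');

  Array.prototype.forEach.call(document.querySelectorAll('a[href]'), function (a) {
    var request = new XMLHttpRequest();
    request.open('GET', dictionary + '?uri=' + encodeURIComponent(a.href));
    request.onload = function () {
      var props = JSON.parse(request.responseText);
      // Show only the first 3 or 4 key/value pairs, enough to verify the URI.
      a.title = Object.keys(props).slice(0, 4).map(function (key) {
        return key + ': ' + props[key];
      }).join('\n');   // the browser tooltip stands in for a richer display
    };
    request.send();
  });
}());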

True enough, I haven’t asked users to say which of those properties identify the subject in question and I don’t intend to. That lies in the domain of experts.

The inline URI mechanism lends itself to automatic insertion of URIs, which users could then verify capture their meaning. (Wikifier is a good example, assuming you have a dictionary based on Wikipedia URIs.)

Users should be able to choose the dictionaries they prefer for identification of subjects. Further, users should be able to verify their identifications from observing properties associated with a URI.

5. Incentives, Economic and Otherwise

There are economic and other incentives that arise from “Simple Web Semantics.”

First, divorcing URI dictionaries from any particular technology will create an easy on ramp for dictionary creators to offer as many or few services as they choose. Users can vote with their feet on which URI dictionaries meet their needs.

Second, divorcing URIs from their sources creates the potential for economic opportunities and competition in the creation of URI dictionaries. Dictionary creators can serve up definitions for popular URIs, along with pointers to other content, free and otherwise.

Third, giving users the right to choose their URI dictionaries is a step towards returning democracy to the WWW.

Fourth, giving users immediate feedback based on URIs they choose, makes users the judges of their own semantics, again.

Fifth, with the rise of URI dictionaries, the need to maintain URIs, “cool” or otherwise, simply disappears. No one maintains the existence of words. We have dictionaries.

There are technical refinements that I could suggest but I wanted to draw the proposal in broad strokes and improve it based on your comments.

Comments/Suggestions?

PS: As I promised at the beginning, this proposal does not address many of the endless complexities of semantic integration. If you need a different solution, for a different semantic integration problem, you know where to find me.


February 13, 2013

Saving the “Semantic” Web (part 4)

Filed under: RDF,Semantic Diversity,Semantic Web,Semantics — Patrick Durusau @ 4:15 pm

Democracy vs. Aristocracy

Part of a recent comment on this series reads:

What should we have been doing instead of the semantic web? ISO Topic Maps? There is some great work in there, but has it been a better success?

That is an important question and I wanted to capture it outside of comments on a prior post.

Earlier in this series of posts I pointed out the success of HTML, especially when contrasted with Semantic Web proposals.

Let me hasten to add the same observation is true for ISO Topic Maps (HyTime or later versions).

The critical difference between HTML (the early and quite serviceable versions) and Semantic Web/Topic Maps is that the former democratizes communication and the latter fosters a technical aristocracy.

Every user who can type and some who hunt-n-peck, can author HTML and publish their content for others around the world to read, discuss, etc.

That is a very powerful and democratizing notion about content creation.

The previous guardians, gate keepers, insiders, and their familiars, who didn’t add anything of value to prior publication processes, are still reeling from the blow.

Even as old aristocracies crumble, new ones evolve.

Technical aristocracies for example. A phrase relevant to both the Semantic Web and ISO Topic Maps.

Having tasted freedom, the crowds aren’t as accepting of the lash/leash as they once were. Nor of the aristocracies who would wield them. Nor should they be.

Which makes me wonder: Why the emphasis on creating dumbed down semantics for computers?

We already have billions of people who are far more competent semantically than computers.

Where are our efforts to enable them to traverse the different semantics of other users?

Such as the semantics of the aristocrats who have anointed themselves to labor on their behalf?

If you have guessed that I have little patience with aristocracies, you are right in one.

I came by that aversion honestly.

I practiced law in a civilian jurisdiction for a decade. A specialist language, law, can be more precise, but it also excludes others from participation. The same was true when I studied theology and ANE languages. A bit later, in markup technologies (then SGML/HyTime), the same lesson was repeated. ODF and topic maps, which I work with now, are two more specialized languages.

Yet a reasonably intelligent person can discuss issues in any of those fields, if they can get past the language barriers aristocrats take so much comfort in maintaining.

My answer to what we should be doing is:

Looking for ways to enable people to traverse and enjoy the semantic diversity that accounts for the richness of the human experience.

PS: Computers have a role to play in that quest, but a subordinate one.


February 12, 2013

Saving the “Semantic” Web (part 3)

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 6:22 pm

On Semantic Transparency

The first responder to this series of posts, j22, argues the logic of the Semantic Web has been found to be useful.

I said as much in my post and stand by that position.

The difficulty is that the “logic” of the Semantic Web excludes vast swathes of human expression and the people who would make those expressions.

If you need authority for that proposition, consider George Boole (An Investigation of the Laws of Thought, pp. 327-328):

But the very same class of considerations shows with equal force the error of those who regard the study of Mathematics, and of their applications, as a sufficient basis either of knowledge or of discipline. If the constitution of the material frame is mathematical, it is not merely so. If the mind, in its capacity of formal reasoning, obeys, whether consciously or unconsciously, mathematical laws, it claims through its other capacities of sentiment and action, through its perceptions of beauty and of moral fitness, through its deep springs of emotion and affection, to hold relation to a different order of things. There is, moreover, a breadth of intellectual vision, a power of sympathy with truth in all its forms and manifestations, which is not measured by the force and subtlety of the dialectic faculty. Even the revelation of the material universe in its boundless magnitude, and pervading order, and constancy of law, is not necessarily the most fully apprehended by him who has traced with minutest accuracy the steps of the great demonstration. And if we embrace in our survey the interests and duties of life, how little do any processes of mere ratiocination enable us to comprehend the weightier questions which they present! As truly, therefore, as the cultivation of the mathematical or deductive faculty is a part of intellectual discipline, so truly is it only a part. The prejudice which would either banish or make supreme any one department of knowledge or faculty of mind, betrays not only error of judgment, but a defect of that intellectual modesty which is inseparable from a pure devotion to truth. It assumes the office of criticising a constitution of things which no human appointment has established, or can annul. It sets aside the ancient and just conception of truth as one though manifold. Much of this error, as actually existent among us, seems due to the special and isolated character of scientific teaching—which character it, in its turn, tends to foster. The study of philosophy, notwithstanding a few marked instances of exception, has failed to keep pace with the advance of the several departments of knowledge, whose mutual relations it is its province to determine. It is impossible, however, not to contemplate the particular evil in question as part of a larger system, and connect it with the too prevalent view of knowledge as a merely secular thing, and with the undue predominance, already adverted to, of those motives, legitimate within their proper limits, which are founded upon a regard to its secular advantages. In the extreme case it is not difficult to see that the continued operation of such motives, uncontrolled by any higher principles of action, uncorrected by the personal influence of superior minds, must tend to lower the standard of thought in reference to the objects of knowledge, and to render void and ineffectual whatsoever elements of a noble faith may still survive.

Or Justice Holmes writing in 1881 (The Common Law, page 1)

The life of the law has not been logic: it has been experience. The felt necessities of the time, the prevalent moral and political theories, intuitions of public policy, avowed or unconscious, even the prejudices which judges share with their fellow-men, have had a good deal more to do than the syllogism in determining the rules by which men should be governed. The law embodies the story of a nation’s development through many centuries, and it cannot be dealt with as if it contained only the axioms and corollaries of a book of mathematics.

In terms of historical context, remember that Holmes is writing at a time when works like John Stuart Mill’s A System of Logic, Ratiocinative and Inductive: being a connected view of The Principles of Evidence and the Methods of Scientific Investigation, were in high fashion.

The Semantic Web isn’t the first time “logic” has been seized upon as useful (as no doubt it is) and exclusionary (the part I object to) of other approaches.

Rather than presuming the semantic monotone the Semantic Web needs for its logic, a false presumption for owl:sameAs and no doubt other subjects, why not empower users to use more complex identifiers for subjects than solitary URIs?

It would not take anything away from the current Semantic Web infrastructure; it would simply make its basis, URIs, less semantically opaque to users.

Isn’t semantic transparency a good thing?


February 11, 2013

Saving the “Semantic” Web (part 2) [NOTLogic]

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 5:45 pm

Expressing Your Semantics: NOTLogic

Saving the “Semantic” Web (part 1) ended by concluding that authors of data/content should be asked about the semantics of their content.

I asked if there were compelling reasons to ask someone else and got no takers.

The acronym NOTLogic may not be familiar. It expands to: Not Only Their Logic.

Users should express their semantics in the “logic” of their domain.

After all, it is their semantics, knowledge and domain that are being captured.

Their “logic” may not square up with FOL (first order logic) but where’s the beef?

Unless one of the project requirements is to maintain consistency with FOL, why bother?

The goal in most BI projects is ROI on capturing semantics, not adhering to FOL for its own sake.

Some people want to teach calculators how to mimic “reasoning” by using that subset known as “logic.”

However much I liked the Friden rotary calculator of my youth:

Calculator

teaching it to mimic “reasoning” isn’t going to happen on my dime.

What about yours?

There are cases where machine learning techniques are very productive and fully justified.

The question you need to ask yourself (after discovering if you should be using RDF at all, The Semantic Web Is Failing — But Why? (Part 2)) is whether “their” logic works for your use case.

I suspect you will find that you can express your semantics, including relationships, without resort to FOL.

Which may lead you to wonder: Why would anyone want you to use a technique they know, but you don’t?

I don’t know for sure but have some speculations on that score I will share with you tomorrow.

In the meantime, remember:

  1. As the author of content or data, you are the person to ask about its semantics.
  2. You should express your semantics in a way comfortable for you.

February 8, 2013

Saving the “Semantic” Web (part 1)

Filed under: Semantic Web,Semantics — Patrick Durusau @ 5:17 pm

Semantics: Who You Gonna Call?

I quote “semantic” in ‘Semantic Web’ to emphasize the web had semantics long before puff pieces in Scientific American.

As a matter of fact, people traffic in semantics every day, in a variety of media. The “Web,” for all of its navel gazing, is just one.

At your next business or technical meeting, if a colleague uses a term you don’t know, here are some options:

  1. Search Cyc.
  2. Query WordNet.
  3. Call Pat Hayes.
  4. Ask the speaker what they meant.

Take a minute to think about it and put your answer in a comment below.

Other than Tim Berners-Lee, I suspect the vast majority of us will pick #4.

Here’s another quiz.

If asked, will the speaker respond with:

  1. Repeating the term over again, perhaps more loudly? (An Americanism holds that English spoken loudly is more understandable to non-English speakers. The same is true for technical terms.)
  2. Restating the term in Common Logic syntax?
  3. Singing a “cool” URI?
  4. Expanding the term by offering other properties that may be more familiar to you?

Again, other than Tim Berners-Lee, I suspect the vast majority of us will pick #4.

To summarize up to this point:

  1. We all have experience with semantics and encountering unknown semantics.
  2. We all (most of us) ask the speaker of unknown semantics to explain.
  3. We all (most of us) expect an explanation to offer additional information to clue us into the unknown semantic.

My answer to the question of “Semantics: Who You Gonna Call?” is the author of the data/information.

Do you have a compelling reason for asking someone else?


February 7, 2013

The Semantic Web Is Failing — But Why? (Part 5)

Filed under: Identity,OWL,RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Impoverished Identification by URI

There is one final part of the failure of the Semantic Web puzzle to explore before we can talk about a solution.

In owl:sameAs and Linked Data: An Empirical Study, Ding, Shinavier, Finin and McGuinness write:

Our experimental results have led us to identify several issues involving the owl:sameAs property as it is used in practice in a linked data context. These include how best to manage owl:sameAs assertions from “third parties”, problems in merging assertions from sources with different contexts, and the need to explore an operational semantics distinct from the strict logical meaning provided by OWL.

To resolve varying usages of owl:sameAs, the authors go beyond identifications provided by a URI to look to other properties. For example:

Many owl:sameAs statements are asserted due to the equivalence of the primary feature of resource description, e.g. the URIs of FOAF profiles of a person may be linked just because they refer to the same person even if the URIs refer the person at different ages. The odd mashup on job-title in previous section is a good example for why the URIs in different FOAF profiles are not fully equivalent. Therefore, the empirical usage of owl:sameAs only captures the equivalence semantics on the projection of the URI on social entity dimension (removing the time and space dimensions). In this way, owl:sameAs is used to indicate partial equivalence between two different URIs, which should not be considered as full equivalence.

Knowing the dimensions covered by a URI and the dimensions covered by a property, it is possible to conduct better data integration using owl:sameAs. For example, since we know a URI of a person provides a temporal-spatial identity, descriptions using time-sensitive properties, e.g. age, height and workplace, should not be aggregated, while time-insensitive properties, such as eye color and social security number, may be aggregated in most cases.

When an identification is insufficient based on a single URI, additional properties can be considered.

My question then is why do ordinary users have to wait for experts to decide their identifications are insufficient? Why can’t we empower users to declare multiple properties, including URIs, as a means of identification?

It could be something as simple as JSON key/value pairs with a notation of “+” for must match, “-” for must not match, and “?” for optional to match.
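
Purely for illustration (the property names and the matching rule are mine, not part of any proposal), such a declaration and one way of reading the notation might look like:

// A user-declared identification as JSON key/value pairs:
// "+" = must match, "-" = must not match, "?" = optional to match.
var identification = {
  "+uri": "http://en.wikipedia.org/wiki/Water",   // reusing the earlier H2O example
  "+chemicalFormula": "H2O",
  "-state": "plasma",
  "?commonName": "water"
};

// One possible reading: two identifications refer to the same subject when every
// "+" property matches, no "-" property matches, and "?" properties never block a match.
function sameSubject(declaration, candidate) {
  return Object.keys(declaration).every(function (key) {
    var flag = key.charAt(0), property = key.slice(1);
    if (flag === '+') { return candidate[property] === declaration[key]; }
    if (flag === '-') { return candidate[property] !== declaration[key]; }
    return true;   // "?" properties are optional
  });
}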

A declaration of identity by users about the subjects in their documents. Who better to ask?

Not to mention that the more information a user supplies for an identification, the more likely they are to communicate successfully with other users.

URIs may be Tim Berners-Lee’s nails, but they are insufficient to support the scaffolding required for robust communication.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 4)

Filed under: Interface Research/Design,RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Who Authors The Semantic Web?

With the explosion of data, “big data” to use the oft-abused terminology, authoring semantics cannot be solely the province of a smallish band of experts.

Ordinary users must be enabled to author semantics on subjects of importance to them, without expert supervision.

The Semantic Web is designed for the semantic equivalent of:

F16 Cockpit

An F16 cockpit has an interface some people can use, but hardly the average user.

VW Dashboard

The VW “Beetle” has an interface used by a large number of average users.

Using a VW interface, users still have accidents, disobey rules of the road, lock their keys inside and make other mistakes. But the number of users who can use the VW interface is several orders of magnitude greater than the number who can use the F-16/RDF interface.

Designing a solution that only experts can use, if participation by average users is a goal, is a path to failure.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 3)

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Is Linked Data the Answer?

Leaving the failure of users to understand RDF semantics to one side, there is also the issue of the complexity of its various representations.

Consider Kingsley Idehen’s “simple” example Turtle document, which he posted in: Simple Linked Data Deployment via Turtle Docs using various Storage Services:

##### Starts Here #####
# Note: the hash is a comment character in Turtle
# Content start
# You can save this to a local file. In my case I use Local File Name: kingsley.ttl .
# Actual Content:

# prefix decalaration that enable the use of compact identifiers instead of fully expanded 
# HTTP URIs.

@prefix owl:   .
@prefix foaf:  .
@prefix rdfs:  . 
@prefix wdrs:  .
@prefix opl:  .
@prefix cert:  .
@prefix:<#>.

# Profile Doc Stuff

<> a foaf:Document . 
<> rdfs:label "DIY Linked Data Doc About: kidehen" .
<> rdfs:comment "Simple Turtle File That Describes Entity: kidehen " .

# Entity Me Stuff

<> foaf:primaryTopic :this .
<> foaf:maker :this .
:this a foaf:Person . 
:this wdrs:describedby <> . 
:this foaf:name "Kingsley Uyi Idehen" .
:this foaf:firstName "Kingsley" .
:this foaf:familyName "Idehen" .
:this foaf:nick "kidehen" .
:this owl:sameAs  .
:this owl:sameAs  .
:this owl:sameAs  .
:this owl:sameAs  .
:this foaf:page  .
:this foaf:page  .
:this foaf:page  .
:this foaf:page  . 
:this foaf:knows , , , , ,  .

# Entity Me: Identity & WebID Stuff 

#:this cert:key :pubKey .
#:pubKey a cert:RSAPublicKey;
# Public Key Exponent
# :pubkey cert:exponent "65537" ^^ xsd:integer;
# Public Key Modulus
# :pubkey cert:modulus "d5d64dfe93ab7a95b29b1ebe21f3cd8a6651816c9c39b87ec51bf393e4177e6fc
2ee712d92caf9d9f1423f5e65f127274529a2e6cc53f1e452c6736e8db8732f919c4160eaa9b6f327c8617c
40036301b547abfc4c5de610780461b269e3d8f8e427237da6152ac2047d88ff837cddae793d15427fa7ce
067467834663737332be467eb353be678bffa7141e78ce3052597eae3523c6a2c414c2ae9f8d7be807bb3
fc0d516b8ecd2fafee4f20ff3550919601a0ad5d29126fb687c2e8c156f04918a92c4fc09f136473f3303814e1
83185edf0046e124e856ca7ada027345e614f8d665f5d7172d880497005ff4626c2b0f2206f7dce717e4f279
dd2a0ddf04b" ^^ xsd:hexBinary .

# :this opl:hasCertificate :cert .
# :cert opl:fingerprint "640F9DD4CFB6DD6361CBAD12C408601E2479CC4A" ^^ xsd:hexBinary;
#:cert opl:hasPublicKey "d5d64dfe93ab7a95b29b1ebe21f3cd8a6651816c9c39b87ec51bf393e4177e6fc2
ee712d92caf9d9f1423f5e65f127274529a2e6cc53f1e452c6736e8db8732f919c4160eaa9b6f327c8617c400
36301b547abfc4c5de610780461b269e3d8f8e427237da6152ac2047d88ff837cddae793d15427fa7ce06746
7834663737332be467eb353be678bffa7141e78ce3052597eae3523c6a2c414c2ae9f8d7be807bb3fc0d516b
8ecd2fafee4f20ff3550919601a0ad5d29126fb687c2e8c156f04918a92c4fc09f136473f3303814e183185edf00
46e124e856ca7ada027345e614f8d665f5d7172d880497005ff4626c2b0f2206f7dce717e4f279dd2a0ddf04b" 
^^ xsd:hexBinary .

### Ends or Here###

Try handing that “simple” example and Idehen’s article to some non-technical person in your office to gauge its “simplicity.”

For that matter, hand it to some of your technical but non-Semantic Web folks as well.

Your experience with that exercise will speak louder than anything I can say.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 2)

Filed under: RDF,Semantic Web — Patrick Durusau @ 4:30 pm

Should You Be Using RDF?

Pat Hayes (editor of RDF Semantics) and Richard Cyganiak (a linked data expert) had this interchange on the RDF Working Group discussion list:

Cyganiak: The text stresses that the presence of an ill-typed literals does not constitute an inconsistency.

Cyganiak: But why does the distinction matter?

Hayes: I am not sure what you mean by “the distinction” here. Why would you expect that an ill-typed literal would produce an inconsistency? Why would the presence of an ill-typed literal make a triple false?

Cyganiak: Is there any reason anybody needs to know about this distinction who isn’t interested in the arcana of the model theory?

Hayes: I’m not sure what you consider to be “arcana”. Someone who cannot follow the model theory probably shouldn’t be using RDF. (emphasis added) Re: Ill-typed vs. inconsistent? (Mon, 12 Nov 2012 01:58:51 -0600)

When challenged on the need to follow model theory, Hayes retreats, but only slightly:

Well, it was rather late and I had just finished driving 2400 miles so maybe I was a bit abrupt. But I do think that anyone who does not understand what “inconsistent” means should not be using RDF, or at any rate should only be using it under the supervision of someone who *does* know the basics of semantic notions. Its not like nails versus metallurgy so much as nails versus hammers. If you are trying to push the nails in by hand, you probably need to hire a framer. (emphasis added) Re: Ill-typed vs. inconsistent? (Mon, 12 Nov 2012 09:58:52 -0600)

A portion of the Introduction to RDF Semantics reads:

RDF is an assertional language intended to be used to express propositions using precise formal vocabularies, particularly those specified using RDFS [RDF-VOCABULARY], for access and use over the World Wide Web, and is intended to provide a basic foundation for more advanced assertional languages with a similar purpose. The overall design goals emphasise generality and precision in expressing propositions about any topic, rather than conformity to any particular processing model: see the RDF Concepts document [RDF-CONCEPTS] for more discussion.

Exactly what is considered to be the ‘meaning’ of an assertion in RDF or RDFS in some broad sense may depend on many factors, including social conventions, comments in natural language or links to other content-bearing documents. Much of this meaning will be inaccessible to machine processing and is mentioned here only to emphasize that the formal semantics described in this document is not intended to provide a full analysis of ‘meaning’ in this broad sense; that would be a large research topic. The semantics given here restricts itself to a formal notion of meaning which could be characterized as the part that is common to all other accounts of meaning, and can be captured in mechanical inference rules.

This document uses a basic technique called model theory for specifying the semantics of a formal language. Readers unfamiliar with model theory may find the glossary in appendix B helpful; throughout the text, uses of terms in a technical sense are linked to their glossary definitions. Model theory assumes that the language refers to a ‘world‘, and describes the minimal conditions that a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called ‘interpretation theory’. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure, thereby retaining as much generality as possible. The chief utility of a formal semantic theory is not to provide any deep analysis of the nature of the things being described by the language or to suggest any particular processing model, but rather to provide a technical way to determine when inference processes are valid, i.e. when they preserve truth. This provides the maximal freedom for implementations while preserving a globally coherent notion of meaning.

Model theory tries to be metaphysically and ontologically neutral. It is typically couched in the language of set theory simply because that is the normal language of mathematics – for example, this semantics assumes that names denote things in a set IR called the ‘universe‘ – but the use of set-theoretic language here is not supposed to imply that the things in the universe are set-theoretic in nature. Model theory is usually most relevant to implementation via the notion of entailment, described later, which makes it possible to define valid inference rules.

Readers should read RDF Semantics to answer for themselves whether they understand “inconsistent” as defined therein. Noting that Richard Cyganiak, a linked data expert, did not.


The next series starts with Saving the “Semantic” Web (Part 1)

The Semantic Web Is Failing — But Why? (Part 1)

Filed under: Identity,OWL,RDF,Semantic Web — Patrick Durusau @ 4:29 pm

Introduction

Before proposing yet another method for identification and annotation of entities in digital media, it is important to draw lessons from existing systems. Failing systems in particular, so their mistakes are not repeated or compounded. The Semantic Web is an example of such a system.

Doubters of that claim should read the report Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus by Web Data Commons.

Web Data Commons is a structured data research project based at the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. Supported by PlanetData and LOD2 research projects, the Web Data Commons is not opposed to the Semantic Web.

But the Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus document reports:

Altogether we discovered structured data within 369 million of the 3 billion pages contained in the Common Crawl corpus (12.3%). The pages containing structured data originate from 2.29 million among the 40.5 million websites (PLDs) contained in the corpus (5.65%). Approximately 519 thousand websites use RDFa, while only 140 thousand websites use Microdata. Microformats are used on 1.7 million websites. It is interesting to see that Microformats are used by approximately 2.5 times as many websites as RDFa and Microdata together. (emphasis added)

To sharpen the point, RDFa appears on 1.28% of the 40.5 million websites, eight (8) years after its introduction (2004) and four (4) years after reaching Recommendation status (2008).

Or more generally:

Parsed HTML URLs 3,005,629,093
URLs with Triples 369,254,196

Or in layperson’s terms, for this web corpus, parsed HTML URLs outnumber URLs with Triples by approximately eight to one.
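
(Checking the arithmetic: 519 thousand RDFa websites out of 40.5 million is roughly 1.28%, and 3,005,629,093 parsed URLs divided by 369,254,196 URLs with triples is roughly 8.1, hence eight to one.)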

Being mindful that the corpus is only web accessible data and excludes “dark data,” the need for a more robust solution than the Semantic Web is self-evident.

The failure of the Semantic Web is no assurance that any alternative proposal will fare better. Understanding why the Semantic Web is failing is a prerequisite to any successful alternative.


Before you “flame on,” you might want to read the entire series. I end up with a suggestion based on work by Ding, Shinavier, Finin and McGuinness.


The next series starts with Saving the “Semantic” Web (Part 1)

January 21, 2013

RDF 1.1 Concepts and Abstract Syntax [New Draft]

Filed under: RDF,Semantic Web — Patrick Durusau @ 7:23 pm

RDF 1.1 Concepts and Abstract Syntax

From the introduction:

The Resource Description Framework (RDF) is a framework for representing information in the Web.

RDF 1.1 Concepts and Abstract Syntax defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications. The abstract syntax has two key data structures: RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals. They are used to express descriptions of resources. RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs. This document also introduces key concepts and terminology, and discusses datatyping and the handling of fragment identifiers in IRIs within RDF graphs.

Numerous issues await your comments and suggestions.

January 20, 2013

Semantic Web meets Integrative Biology: a survey

Filed under: Bioinformatics,Semantic Web — Patrick Durusau @ 8:04 pm

Semantic Web meets Integrative Biology: a survey by Haujun Chen, Tong Yu and Jake Y. Chen.

Abstract:

Integrative Biology (IB) uses experimental or computational quantitative technologies to characterize biological systems at the molecular, cellular, tissue and population levels. IB typically involves the integration of the data, knowledge and capabilities across disciplinary boundaries in order to solve complex problems. We identify a series of bioinformatics problems posed by interdisciplinary integration: (i) data integration that interconnects structured data across related biomedical domains; (ii) ontology integration that brings jargons, terminologies and taxonomies from various disciplines into a unified network of ontologies; (iii) knowledge integration that integrates disparate knowledge elements from multiple sources; (iv) service integration that build applications out of services provided by different vendors. We argue that IB can benefit significantly from the integration solutions enabled by Semantic Web (SW) technologies. The SW enables scientists to share content beyond the boundaries of applications and websites, resulting into a web of data that is meaningful and understandable to any computers. In this review, we provide insight into how SW technologies can be used to build open, standardized and interoperable solutions for interdisciplinary integration on a global basis. We present a rich set of case studies in system biology, integrative neuroscience, bio-pharmaceutics and translational medicine, to highlight the technical features and benefits of SW applications in IB.

A very good summary of the issues of data integration in bioinformatics.

I disagree with the prescription, as you might imagine, but it is a good starting place for discussion of the issues of data integration.

January 19, 2013

Could Governments Run Out of Patience with Open Data? [Semantic Web?]

Filed under: Government,Open Data,Semantic Web — Patrick Durusau @ 7:06 pm

Could Governments Run Out of Patience with Open Data? by Andrea Di Maio.

From the post:

Yesterday I had yet another client conversation – this time with a mid-size municipality in the north of Europe – on the topic of the economic value generated through open data. The problem we discussed is the same I highlighted in a post last year: nobody argues the potential long term value of open data but it may be difficult to maintain a momentum (and to spend time, money and management bandwidth) on something that will come to fruition in the more distant future, while more urgent problems need to be solved now, under growing budget constraints.

Faith is not enough, nor are the many examples that open data evangelists keep sharing to demonstrate value. Open data must help solve today’s problems too, in order to gain the credibility and the support required to realize future economic value.

While many agree that open data can contribute to shorter term goals, such as improving inter-agency transparency and data exchange or engaging citizens on solving concrete problems, making this happen in a more systematic way requires a change of emphasis and a change of leadership.

Emphasis must be on directing efforts – be they idea collections, citizen-developed dashboards or mobile apps – onto specific, concrete problems that government organizations need to solve. One might argue that this is not dissimilar from having citizens offer perspectives on how they see existing issues and related solutions. But there is an important difference: what usually happens is that citizens and other stakeholders are free to use whichever data they want to use. The required change is to entice them to help governments solve problems the way governments see them. In other terms, whereas citizens would clearly remain free to come up with whichever use of any open data they deem important, they should get incentives, awards, prizes only for those uses that meet clear government requirements. Citizens would be at the service of government rather than the other way around. For those who might be worried that this advocates for an unacceptable change of responsibility and that governments are at the service of citizens and not the other way around, what I mean is that citizens should help governments serve them.

January 18, 2013

ISWC 2013 : The 12th International Semantic Web Conference

Filed under: Conferences,Semantic Web — Patrick Durusau @ 7:17 pm

ISWC 2013 : The 12th International Semantic Web Conference

Dates:

When Oct 21, 2013 – Oct 25, 2013
Where Sydney, Australia
Abstract Registration Due May 1, 2013
Submission Deadline May 10, 2013
Notification Due Jul 3, 2013
Final Version Due Aug 5, 2013

ISWC is the premier venue for presenting innovative systems and research results related to the Semantic Web and Linked Data. We solicit the submission of original research papers for ISWC 2013’s research track, dealing with analytical, theoretical, empirical, and practical aspects of all areas of the Semantic Web. Submissions to the research track should describe original, significant research on the Semantic Web or on Semantic Web technologies, and are expected to provide some principled means of evaluation.

To maintain the high level of quality and impact of the ISWC series, all papers will be reviewed by at least three program committee members and one vice chair of the program committee. To assess papers, reviewers will judge their originality and significance for further advances in the Semantic Web, as well as the technical soundness of the proposed approaches and the overall readability of the submitted papers. We will give specific attention to the evaluation of the approaches described in the papers. We strongly encourage evaluations that are repeatable: preference will be given to papers that provide links to the data sets and queries used to evaluate their approach, as well as systems papers providing links to their source code or to some live deployment.

Topics of Interest

Topics of interest include, but are not limited to:

  • Management of Semantic Web data and Linked Data
  • Languages, tools, and methodologies for representing and managing Semantic Web data
  • Database, IR, NLP and AI technologies for the Semantic Web
  • Search, query, integration, and analysis on the Semantic Web
  • Robust and scalable knowledge management and reasoning on the Web
  • Cleaning, assurance, and provenance of Semantic Web data, services, and processes
  • Semantic Web Services
  • Semantic Sensor Web
  • Semantic technologies for mobile platforms
  • Evaluation of semantic web technologies
  • Ontology engineering and ontology patterns for the Semantic Web
  • Ontology modularity, mapping, merging, and alignment
  • Ontology Dynamics
  • Social and Emergent Semantics
  • Social networks and processes on the Semantic Web
  • Representing and reasoning about trust, privacy, and security
  • User Interfaces to the Semantic Web
  • Interacting with Semantic Web data and Linked Data
  • Information visualization of Semantic Web data and Linked Data
  • Personalized access to Semantic Web data and applications
  • Semantic Web technologies for eGovernment, eEnvironment, eMobility or eHealth
  • Semantic Web and Linked Data for Cloud environments

Submission

Pre-submission of abstracts is a strict requirement. All papers and abstracts have to be submitted electronically via the EasyChair conference submission System https://www.easychair.org/conferences/?conf=iswc2013.

Semantic Web Gets Closer To The Internet of Things [Close Enough To Be Useful?]

Filed under: Semantic Web — Patrick Durusau @ 7:17 pm

Semantic Web Gets Closer To The Internet of Things by Jennifer Zaino. [herein, IoT = Internet of Things]

From the post:

The Internet of Things is coming, but it needs a semantic backbone to flourish.

Applying semantic technologies to IoT, however, has several research challenges, the authors note, pointing out that IoT and using semantics in IoT is still in its early days. Being in on the ground floor of this movement is undeniably exciting to the research community, including people such as Konstantinos Kotis, Senior Research Scientist at University of the Aegean, and IT Manager in the regional division of the Samos and Ikaria islands at North Aegean Regional Administration Authority.

Well, but the Semantic Web has been in “its early days” for quite some time now. A decade or more?

And with every proposal, now the Internet of Things, the semantic issues are going to be solved real soon now. But in reality the semantic can gets kicked down the road. Again.

Not that I have a universal semantic solution, different from all the other universal semantic solutions to propose. Mainly because universal semantic solutions fail. (full stop)

Not to mention the difficulty of selling a business case for an investment, now and for the foreseeable future, that will pay off, maybe, only if someday everyone else adopts the same solution.

I am not an investor or business mogul but that sounds on the risky side to me.

If I were going to invest in a semantic solution, I would want it to have a defined payoff for my enterprise or organization, whether anyone else adopted it or not.

Web accessible identifiers are not a counter-example. Web accessible identifiers work perfectly well in the absence of any universal semantic solution.

January 15, 2013

Chinese Rock Music

Filed under: Music,OWL,RDF,Semantic Web — Patrick Durusau @ 8:30 pm

Experiences on semantifying a Mediawiki for the biggest recource about Chinese rock music: rockinchina .com by René Pickhardt.

From the post:

During my trip in China I was visiting Beijing on two weekends and Maceau on another weekend. These trips have been mainly motivated to meet old friends. Especially the heads behind the biggest English resource of Chinese Rock music Rock in China who are Max-Leonhard von Schaper and the founder of the biggest Chinese Rock Print Magazin Yang Yu. After looking at their wiki which is pure gold in terms of content but consists mainly of plain text I introduced them the idea of putting semantics inside the project. While consulting them a little bit and pointing them to the right resources Max did basically the entire work (by taking a one month holiday from his job. Boy this is passion!).

I am very happy to anounce that the data of rock in china is published as linked open data and the process of semantifying the website is in great shape. In the following you can read about Max experiences doing the work. This is particularly interesting because Max has no scientific background in semantic technologies. So we can learn a lot on how to improve these technologies to be ready to be used by everybody:

Good to see that René hasn’t lost his touch for long blog titles. 😉

A very valuable lesson in the difficulties posed by current “semantic” technologies.

Max and company succeed, but only after heroic efforts.

January 9, 2013

@AMS Webinars on Linked Data

Filed under: Linked Data,LOD,Semantic Web — Patrick Durusau @ 12:01 pm

@AMS Webinars on Linked Data

From the website:

The traditional approach of sharing data within silos seems to have reached its end. From governments and international organizations to local cities and institutions, there is a widespread effort of opening up and interlinking their data. Linked Data, a term coined by Tim Berners-Lee in his design note regarding the Semantic Web architecture, refers to a set of best practices for publishing, sharing, and interlinking structured data on the Web.

Linked Open Data (LOD), a concept that has leapt onto the scene in the last years, is Linked Data distributed under an open license that allows its reuse for free. Linked Open Data becomes a key element to achieve interoperability and accessibility of data, harmonisation of metadata and multilinguality.

There are four remaining seminars in this series:

Webinar in French | 22nd January 2013 – 11:00am Rome time
Clarifiez le sens de vos données publiques grâce au Web de données (Clarify the meaning of your public data with the Web of Data)
Christophe Guéret, Royal Netherlands Academy of Arts and Sciences, Data Archiving and Networked Services (DANS)

Webinar in Chinese | 29th January 2013 – 02:00am Rome time
基于网络的研讨会 “题目:理解和利用关联数据 --图情档博(LAM)作为关联数据的提供者和使用者” (Webinar: Understanding and Using Linked Data – Libraries, Archives and Museums (LAM) as Providers and Consumers of Linked Data)
Marcia Zeng, School of Library and Information Science, Kent State University

Webinar in Russian | 5th February 2013 – 11:00am Rome time
Введение в концепцию связанных открытых данных (Introduction to the Concept of Linked Open Data)
Irina Radchenko, Centre of Semantic Technologies, Higher School of Economics

Webinar in Arabic | 12th February 2013 – 11:00am Rome time
Ibrahim Elbadawi, UAE Federal eGovernment

Mark your agenda! See New Free Webinars @ AIMS on Linked Open Data for registration and more details.

January 5, 2013

The Semantic Link [ODI Drug Example?]

Filed under: Open Data,Semantic Web — Patrick Durusau @ 3:10 pm

The Semantic Link

Archive of the Semantic Link podcasts.

Semantic Link is a monthly podcast on Semantic Technologies from Semanticweb.com.

In the December 2012 episode, Nigel Shadbolt, chairman and co-founder of the ODI (Open Data Institute), is the special guest.

Nigel offers an odd example of the value of open data. See what you think:

The prescriptions written by all physicians are made public, though not who they were written for. A start-up company noticed that many prescribed drugs were “off-license” (generic, to use the U.S. terminology) but doctors were still prescribing the brand name drug.

Reported savings of 200 million £ in one drug area.

That success isn’t a function of having “open data” but of having an intelligent person review the data. Whether open or not.

I can assure you my drug company knows the precise day when it anticipates a generic version of a drug will become available. 😉
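
For what it’s worth, the start-up’s review is the kind of thing a few lines of code can sketch. Below is a back-of-the-envelope version, assuming a hypothetical open prescriptions file with drug_name, unit_cost and quantity columns, plus a made-up table of generic equivalents; it has nothing to do with the actual analysis Nigel described.

*****

# A hedged sketch only: scan an open prescriptions file for branded drugs
# that have a generic equivalent and estimate the potential saving.
# File name, column names, drugs and prices are all invented.
import csv

generic_equivalent = {
    "BrandStatinX": ("genericstatin", 0.90),   # brand -> (generic name, unit cost)
    "BrandPpiY": ("genericppi", 0.40),
}

potential_saving = 0.0
with open("prescriptions.csv", newline="") as f:       # hypothetical open data set
    for row in csv.DictReader(f):
        brand = row["drug_name"]
        if brand in generic_equivalent:
            _, generic_cost = generic_equivalent[brand]
            saving = (float(row["unit_cost"]) - generic_cost) * int(row["quantity"])
            potential_saving += max(saving, 0.0)    # only count cases where the generic is cheaper

print(f"Estimated saving from switching to generics: £{potential_saving:,.2f}")

*****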

December 24, 2012

10 Rules for Persistent URIs [Actually only one] Present of Persistent URIs

Filed under: Linked Data,Semantic Web,WWW — Patrick Durusau @ 2:11 pm

Interoperability Solutions for European Public Administrations got into the egg nog early:

D7.1.3 – Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC (PDF) (I’m not kidding, go see for yourself.)

Five (5) positive rules:

  1. Follow the pattern: http://(domain)/(type)/(concept)/(reference)
  2. Re-use existing identifiers
  3. Link multiple representations
  4. Implement 303 redirects for real-world objects
  5. Use a dedicated service

Five (5) negative rules (a rough checker covering both lists follows below):

  1. Avoid stating ownership
  2. Avoid version numbers
  3. Avoid using auto-increment
  4. Avoid query strings
  5. Avoid file extensions
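
The promised checker, a minimal sketch written against my reading of the rules rather than anything in the report. It can only test form: the pattern from positive rule 1 and the mechanically checkable “avoid” rules. “Avoid stating ownership” and “avoid using auto-increment” still need human judgment.

*****

# Rough checker for the mechanically testable rules above.
# The regexes and the example URIs are my own assumptions.
from urllib.parse import urlparse
import re

def check_persistent_uri(uri):
    """Return a list of rule violations detectable from the URI string alone."""
    problems = []
    parts = urlparse(uri)
    if parts.query:
        problems.append("avoid query strings")
    if re.search(r"\.(html?|php|jsp|asp|xml|json)$", parts.path):
        problems.append("avoid file extensions")
    if re.search(r"/v?\d+\.\d+(/|$)", parts.path):
        problems.append("avoid version numbers")
    # positive rule 1: http://(domain)/(type)/(concept)/(reference)
    if len([seg for seg in parts.path.split("/") if seg]) < 3:
        problems.append("path does not fit (type)/(concept)/(reference)")
    return problems

print(check_persistent_uri("http://example.org/id/drug/genericstatin"))  # -> []
print(check_persistent_uri("http://example.org/doc.php?id=42"))          # -> three violations

*****

Note that passing this checker does nothing to make a URI persist. It only tests the shape of the string.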

If the goal is “persistent” URIs, only “Use a dedicated service” has any relationship to making a URI “persistent.”

That is, five (5) or ten (10) years from now, a URI used as an identifier will return the same value as it does today.

The other nine rules have no relationship to persistence. Good arguments can be made for some of them, but persistence isn’t one of them.

Why the report hides behind the rhetoric of persistence I cannot say.

But you can satisfy yourself that only a “dedicated service” can persist a URI, whatever its form.

W3C confusion over identifiers and locators for web resources continues to plague this area.

There isn’t anything particularly remarkable about using a URI as an identifier. So long as it is understood that URI identifiers are just like any other identifier.

That is, they can be indexed, annotated, searched for, and returned to users with data about the object of the identification.

Viewed that way, the fact that once upon a time there was a resource at the location specified by a URI has little or nothing to do with the persistence of that URI.

So long as we have indexed the URI, that index can serve as a resolution of that URI/identifier for as long as the index persists. With additional information should we choose to create and provide it.
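
A minimal sketch of what I mean, with invented data throughout: the URI is just a key in an index, and the index answers for it whether or not anything still lives at that address.

*****

# Sketch: a URI treated purely as an identifier, resolved against an index
# that outlives the original web location. All entries are invented.
from typing import Optional

uri_index = {
    "http://example.org/id/drug/genericstatin": {
        "current_location": "https://archive.example.net/drugs/genericstatin",
        "labels": ["genericstatin", "BrandStatinX"],
        "note": "original host retired; record migrated to an archive",
    },
}

def resolve(uri: str) -> Optional[dict]:
    """Return whatever the index knows about the identifier, or None."""
    return uri_index.get(uri)

print(resolve("http://example.org/id/drug/genericstatin"))

*****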

The EU document concedes as much when it says:

Without exception, all the use cases discussed in section 3 where a policy of URI persistence has been adopted, have used a dedicated service that is independent of the data originator. The Australian National Data Service uses a handle resolver, Dublin Core uses purl.org, services, data.gov.uk and publications.europa.eu are all also independent of a specific government department and could readily be transferred and run by someone else if necessary. This does not imply that a single service should be adopted for multiple data providers. On the contrary – distribution is a key advantage of the Web. It simply means that the provision of persistent URIs should be independent of the data originator.

That is, if you read “…independent of the data originator” to mean independent of a particular location on the WWW.

No changes in form, content, protocols, server software, etc., required. And you get persistent URIs.

Merry Christmas to all and to all…, persistent URIs as identifiers (not locators)!

(I first saw this at: New Report: 10 Rules for Persistent URIs)

December 14, 2012

Semantic Technology ROI: Article of Faith? or Benchmarks for 1.28% of the web?

Filed under: Benchmarks,Marketing,Performance,RDFa,Semantic Web — Patrick Durusau @ 3:58 pm

Orri Erling, in LDBC: A Socio-technical Perspective, writes in part:

I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, thus including socio-technical aspects such as acceptance by users, learning curves of various stakeholders, whether in fact one could demonstrate an overall gain in productivity arising from semantic technologies. [in my words, paraphrased]

“Can one measure the effectiveness of different approaches to data integration?” asked I.

“Of course one can,” answered Michael, “this only involves carrying out the same task with two different technologies, two different teams and then doing a double blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense.”

LDBC does in fact intend to address technical aspects of data integration, i.e., schema conversion, entity resolution, and the like. Addressing the sociotechnical aspects of this (whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are, etc.) is simply too diverse and so totally domain dependent that a general purpose metric cannot be developed, at least not in the time and budget constraints of the project. Further, adding a large human element in the experimental setting (e.g., how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc.) will lead to experiments that are so expensive to carry out and whose results will have so many unquantifiable factors that these will constitute an insuperable barrier to adoption.

The need for parallel systems to judge the benefits of a new technology is a straw man. And one that is easy to dispel.

For example, if your company provides technical support, you are tracking metrics on how quickly your staff can answer questions. And probably customer satisfaction with your technical support.

Both are common metrics in use today.

Assume the suggestion is made that linked data would improve technical support for your products. You begin with a pilot project to measure the benefit from the suggested change.

If the length of support calls goes down or customer satisfaction goes up, or both, change to linked data. If not, don’t.
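
A minimal sketch of that decision, with invented numbers, just to show there is nothing exotic about the measurement:

*****

# Compare the usual support metrics before and after a linked data pilot
# and let the numbers decide. All figures are invented for illustration.
from statistics import mean

before = {"call_minutes": [14.2, 9.8, 17.5, 12.0], "satisfaction": [3.4, 3.1, 3.8, 3.2]}
after  = {"call_minutes": [11.0, 8.5, 13.9, 10.4], "satisfaction": [3.9, 3.6, 4.1, 3.7]}

calls_shorter = mean(after["call_minutes"]) < mean(before["call_minutes"])
happier       = mean(after["satisfaction"]) > mean(before["satisfaction"])

print("Adopt linked data" if (calls_shorter or happier) else "Stay with the current process")

*****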

Naming a technology as “semantic” doesn’t change how you measure the benefits of a change in process.

LDBC will find purely machine-based performance measures easier to produce than answers to the more difficult socio-technical questions.

But of what value are great benchmarks for a technology that no one wants to use?

See my comments under: Web Data Commons (2012) – [RDFa at 1.28% of 40.5 million websites]. Benchmarks for 1.28% of the web?

December 2, 2012

Cool Code [Chess Program in 4.8 Tweets]

Filed under: Programming,Semantic Web — Patrick Durusau @ 10:51 am

Cool Code by Kevlin Henney.

From the description:

In most disciplines built on skill and knowledge, from art to architecture, from creative writing to structural engineering, there is a strong emphasis on studying existing work. Exemplary pieces from past and present are examined and discussed in order to provoke thinking and learn techniques for the present and the future. Although programming is a discipline with a very large canon of existing work to draw from, the only code most programmers read is the code they maintain. They rarely look outside the code directly affecting their work. This talk examines some examples of code that are interesting because of historical significance, profound concepts, impressive technique, exemplary style or just sheer geekiness.

Some observations:

At about 3:11 or a little before, Kevlin has a slide that reads:

There is an art, craft, and science to programming that exceeds far beyond the program. The act of programming marries the discrete world of computers to the fluid world of human affairs. Programmers mediate between the negotiated and uncertain truths of business and the crisp, uncompromising domain of bits and bytes and higher constructed types.

I rather like the phrases “…marries the discrete world of computers to the fluid world of human affairs,” and “…the negotiated and uncertain truths of business….”

It captures the divergence of the AI/Semantic Web paradigm from life as we experience it.

In order to have the Semantic Web, we have to prune “…negotiated and uncertain truths…” until what remains can fit into “…the discrete world of computers….”

You will enjoy Kevlin’s take on RUD (Rapid Unscheduled Disassembly). 😉

Or a chess program written in 672 bytes, or 4.8 tweets. (On which see 1K ZX Chess: the code and numerous other resources.)

The presentation is marred only by the unreadability (on the video) of some of the code examples.

Kevlin closes with:

If you don’t have time to read, you don’t have the time or tools to write. (Stephen King)


Kevlin’s homepage, and his papers.

97 Things Every Programmer Should Know: Collective Wisdom from the Experts (Kevlin as editor)

November 28, 2012

Dereferencing Issues

Filed under: Humor,Semantic Web — Patrick Durusau @ 3:06 pm

Robert Cerny, a well known topic map maven, tweeted his favourite #GaryLarson cartoon, this one on dereferencing:

Dereferencing

Semantic Web Explained

Filed under: Humor,Semantic Web — Patrick Durusau @ 6:11 am

Inge Hendriksen tweets: “The #SemanticWeb explained in a single cartoon frame…”

November 13, 2012

Managing Conference Hashtags

Filed under: Conferences,Semantic Web,Tagging,Tweets — Patrick Durusau @ 5:13 pm

David Karger tweets today:

Ironically amusing that ontology researchers can’t manage to agree on a canonical tag for their conference #iswc #iswc12 #iswc2012

If that’s true for ontology researchers, what chance does the rest of the world have?

Just to help ontology researchers along a bit (in LTM syntax):

*****

/* typing topics */

[conf = "conference"]

/* scoping topics */

[SWTwitter01 : conf = "Semantic Web, Twitter hashtag 01."]

[SWTwitter02 : conf = "Semantic Web, Twitter hashtag 02."]

[SWTwitter03 : conf = "Semantic Web, Twitter hashtag 03."]
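
/* the conference topic: each hashtag becomes a variant name, scoped by its hashtag topic */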

[iswc2012 : conf = "ISWC 2012, The 11th International Semantic Web Conference"
("#iswc" / SWTwitter01)
("#iswc12" / SWTwitter02)
("#iswc2012" / SWTwitter03)]

*****

I added the “conf” typing topic to the scoping topics to distinguish those tags from others used for:

ISWC (International Standard Musical Work Code)

Welcome to ISWC 2013! The International Symposium on Wearable Computers (ISWC)

Wikipedia – ISWC, also lists:

International Speed Windsurfing Class

But missed:

International Student Welcome Committee

There remains the task of distinguishing tags in the wild from tags for these other subjects.

Once that is done, all the tweets under these or other tags can be collocated into a full set of tweets about the conference.

Other subjects and relationships, such as person, date, location, topic, tags, retweets, etc., can be just as easily added.


Personally, I would make the default sort order for tweets a function of date/time, quite possibly mis-using sortname for that purpose. People are accustomed to seeing tweets in time order, and fancy collocation can wait until they select an author, subject, tag, etc.
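
For anyone who wants to see the collocation and the date/time ordering without any topic map machinery, here is a minimal sketch in Python (separate from the LTM above), with invented tweets, authors and timestamps:

*****

# Gather tweets posted under any of the three ISWC 2012 hashtags into one
# collection for the conference topic and show them in time order.
# The tweets below are invented examples.
from datetime import datetime

conference_tags = {"#iswc", "#iswc12", "#iswc2012"}

tweets = [  # (timestamp, author, text)
    (datetime(2012, 11, 13, 9, 5),  "@alice", "Great keynote this morning #iswc2012"),
    (datetime(2012, 11, 13, 9, 40), "@bob",   "Poster session starting #iswc"),
    (datetime(2012, 11, 13, 8, 55), "@carol", "Coffee queue already long #iswc12"),
]

# keep any tweet whose words include one of the conference hashtags
collocated = [t for t in tweets if conference_tags & {w.lower() for w in t[2].split()}]

for ts, author, text in sorted(collocated, key=lambda t: t[0]):   # default sort: date/time
    print(ts.isoformat(timespec="minutes"), author, text)

*****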
