Simple Web Semantics
For what it’s worth, what follows in this post is a partial, non-universal proposal that is useful only in some cases.
That has been forgotten by this point but in my defense, I did try to warn you.
1. Division of Semantic Labor
The first step towards useful semantics on the web must be a division of semantic labor.
I won’t recount the various failures of the Semantic Web, topic maps and other initiatives to “educate” users on how they should encode semantics.
All such efforts have failed, are failing now, and will fail.
That is not a negative comment on users.
In another life I advocated tools that would enable biblical scholars to work in XML without having to learn angle-bang syntax. That wasn’t for lack of intelligence; most of them were fluent in five or six ancient languages.
They were focused on being biblical scholars and had no interest in learning the minutiae of XML encoding.
After many years, thanks to a cast of hundreds if not thousands, OpenOffice, OpenDocument Format (ODF) and XML editing became available to ordinary users.
Not the fine-tuned XML of the Text Encoding Initiative (TEI) or DocBook, but having a 50 million plus user share is better than being in the 5 to 6 digit range.
Users have not succeeded in authoring structured data, such as RDF, but have demonstrated competence at authoring <a> elements with URIs.
I propose the following division of semantic labor:
Users – Responsible for identification of subjects in content they author, using URIs in the <a> element.
Experts – Responsible for annotation (by whatever means) of URIs that can be found in <a> elements in content.
2. URIs as Pointers into a Dictionary
One of the comments in this series pointed out that URIs are like “pointers into a dictionary.” I like that imagery and it is easier to understand than the way I intended to say it.
If you think of words as pointers into a dictionary, how many dictionaries does a word point into?
Now contrast your answer with the number of dictionaries into which a URI points.
If we are going to use URIs as “pointers into a dictionary,” then there should be no limit on the number of dictionaries into which they can point.
A URI can be posed to any number of dictionaries as a query, with possibly different results from each dictionary.
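A minimal sketch of that idea in Python. The two in-memory dictionaries, their contents, the example.org URI and the `lookup_all` helper are all hypothetical, made up for illustration; a real dictionary could just as well be a web service or a file.

```python
from typing import Dict, Mapping, Optional

Entry = Dict[str, str]

# Hypothetical stand-ins for two independent URI dictionaries.
# Keys are URIs; values are whatever each dictionary chooses to say.
NEWS_DICTIONARY: Dict[str, Entry] = {
    "http://example.org/subject/1": {
        "label": "Example subject",
        "kind": "news topic",
    },
}
SCHOOL_DICTIONARY: Dict[str, Entry] = {
    "http://example.org/subject/1": {
        "label": "Example subject",
        "audience": "secondary education",
    },
}


def lookup_all(
    uri: str, dictionaries: Mapping[str, Mapping[str, Entry]]
) -> Dict[str, Optional[Entry]]:
    """Pose the same URI to every dictionary and collect each answer.

    A dictionary that has no entry for the URI answers with None.
    """
    return {name: entries.get(uri) for name, entries in dictionaries.items()}


if __name__ == "__main__":
    answers = lookup_all(
        "http://example.org/subject/1",
        {"news": NEWS_DICTIONARY, "school": SCHOOL_DICTIONARY},
    )
    for name, entry in answers.items():
        print(name, entry)
```

The same URI goes to both dictionaries, and each is free to return different properties or nothing at all.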
3. Of Dictionaries
Take, for example, the URI http://data.nytimes.com/47271465269191538193, which can appear in a dictionary.
If you follow that URI, you will notice a couple of things:
- It isn’t content suitable for primary or secondary education.
- The content is limited to that of the New York Times.
- The content of the NYT consists of article pointers.
Not to mention it is a “pull” interface that requires effort on the part of users, as opposed to a “push” interface that reduces that effort.
What if, rather than “following” the URI http://data.nytimes.com/47271465269191538193, you could submit that same URI to another dictionary, one that had different information?
A dictionary that for that URI returns:
- Links to content suitable for primary or secondary education.
- Broader content than just New York Times.
- Curated content and not just article pointers.
Just as we have semantic diversity:
URI dictionaries shall not be required to use a particular technology or paradigm.
4. Immediate Feedback
Whether you will admit it or not, we have all coded HTML and then loaded it in a browser to see the results.
That’s called “immediate feedback” and made HTML, the early versions anyway, extremely successful.
When <a> elements with URIs are used to identify subjects, how can we duplicate that “immediate feedback” experience?
My suggestion is that users encode in the <head> of their documents a meta element that reads:
<meta name="dictionary" content="URI">
Think of it as being the equivalent of spell checking except for subjects. You could even call it “subject checking.”
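To make “subject checking” concrete, here is a rough sketch of the first step a checking tool might take: reading the dictionary named in the meta element and collecting the URIs from the <a> elements. It uses only Python’s standard `html.parser` module. The `SubjectExtractor` class, the `extract` helper, the sample page and the example.org addresses are assumptions for illustration, not part of the proposal.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class SubjectExtractor(HTMLParser):
    """Collect dictionary URIs from <meta> and subject URIs from <a>."""

    def __init__(self) -> None:
        super().__init__()
        self.dictionaries: List[str] = []
        self.subject_uris: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "dictionary":
            content = attributes.get("content")
            if content:
                self.dictionaries.append(content)
        elif tag == "a" and attributes.get("href"):
            self.subject_uris.append(attributes["href"])


def extract(html_text: str) -> Tuple[List[str], List[str]]:
    """Return (dictionary URIs, subject URIs) found in the document."""
    parser = SubjectExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.dictionaries, parser.subject_uris


# Hypothetical page: one dictionary declared, one subject identified.
SAMPLE = """<html><head>
<meta name="dictionary" content="http://dictionary.example.org/lookup">
</head><body>
<p>See <a href="http://example.org/subject/1">this subject</a>.</p>
</body></html>"""

if __name__ == "__main__":
    dictionaries, subjects = extract(SAMPLE)
    print("dictionaries:", dictionaries)
    print("subjects:", subjects)
```

An editor or browser extension could then pose each collected URI to the declared dictionary and show the author what came back.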
For most purposes, dictionaries should return only 3 or 4 key/value pairs, enough for users to verify their choice of a URI, with an option to see more information.
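One way a dictionary might keep its answers that short, sketched in Python. The `summarize` helper, the default limit of 4 and the sample entry are hypothetical; which properties to show first is left to the dictionary’s experts.

```python
from typing import Dict, Tuple

# Hypothetical full entry, as a dictionary might store it for one URI.
FULL_ENTRY: Dict[str, str] = {
    "label": "Example subject",
    "type": "organization",
    "founded": "1900",
    "location": "Example City",
    "homepage": "http://example.org/",
    "notes": "Longer descriptive text kept out of the short view.",
}


def summarize(entry: Dict[str, str], limit: int = 4) -> Tuple[Dict[str, str], bool]:
    """Return the first `limit` key/value pairs and whether more are available."""
    items = list(entry.items())
    return dict(items[:limit]), len(items) > limit


if __name__ == "__main__":
    short_view, has_more = summarize(FULL_ENTRY)
    print(short_view)
    if has_more:
        print("(more information available)")
```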
True enough, I haven’t asked for users to say which of those properties identify the subject in question and I don’t intend to. That lies in the domain of experts.
The inline URI mechanism lends itself to automatic insertion of URIs, which users could then check to confirm that they capture the intended meaning. (Wikifier is a good example, assuming you have a dictionary based on Wikipedia URIs.)
Users should be able to choose the dictionaries they prefer for identification of subjects. Further, users should be able to verify their identifications from observing properties associated with a URI.
5. Incentives, Economic and Otherwise
There are economic and other incentives that arise from “Simple Web Semantics.”
First, divorcing URI dictionaries from any particular technology will create an easy on-ramp for dictionary creators to offer as many or as few services as they choose. Users can vote with their feet on which URI dictionaries meet their needs.
Second, divorcing URIs from their sources creates the potential for economic opportunities and competition in the creation of URI dictionaries. Dictionary creators can serve up definitions for popular URIs, along with pointers to other content, free and otherwise.
Third, giving users the right to choose their URI dictionaries is a step towards returning democracy to the WWW.
Fourth, giving users immediate feedback based on the URIs they choose makes users the judges of their own semantics, again.
Fifth, with the rise of URI dictionaries, the need to maintain URIs, “cool” or otherwise, simply disappears. No one maintains the existence of words. We have dictionaries.
There are technical refinements that I could suggest but I wanted to draw the proposal in broad strokes and improve it based on your comments.
PS: As I promised at the beginning, this proposal does not address many of the endless complexities of semantic integration. If you need a different solution, for a different semantic integration problem, you know where to find me.