Matt O’Donnell replied to my tweet on a post about the Open Graph Database Protocol, saying:
@patrickDurusau re #opengraph thoughts how #topicmaps can be used as alternatives for organic user driven practice for this sort of thing?
A bit more of a question than I can answer in 140 characters, so I am replying here and will tweet the post. 😉
I am not sure that we should be seeking alternatives to “organic user driven practice” for adding information to web pages.
I didn’t always think so, and it may be helpful if I say why my opinion has changed.
Set the way-back machine for the late 1980s and early 1990s: Scientific American had coverage of SGML, the TEI was formed, and great days were ahead for document re-use, preservation, and processing. Except that the toolkits, for the average user, really sucked. For a variety of reasons, most of which are covered elsewhere, SGML found its way into very sophisticated systems but not much further.
XML, a subset of SGML with simplified processing rules, was created to enter the market spaces not occupied by SGML. In fact, XML parsers were touted as a weekend project for the average computer programmer. And certainly, given the gains of this simplified (some would say bastardized) system of markup, users would flock to its use.
Ahem, well, XML has certainly proven to be useful for data interchange (one of its original use cases) and with the advent of better interfaces (i.e., users don’t know they are using XML), XML has finally entered the broad consumer market.
And without a doubt, for a large number of use cases, XML is an excellent solution. Unfortunately, it is one that the average user cannot stand. So solutions involving XML markup, for the most part, display the results and not the process to the average user.
I think there is a lesson to be learned from the journey in markup. I read text with markup almost as easily as I do clear text, but that isn’t the average experience. Most users want their titles centered, special terms bolded, paragraphs, lists, etc., and they don’t really care how it got that way.
In semantic domains, users want to be able to find the information they need. (full stop) They could not care less whether we use enslaved elves, markup, hidden Markov models, or some other means to deliver that result.
True enough, we have to enlist the assistance of users in such quests, but expecting all but a few to use topic maps as topic maps is as futile as the Open Graph Database Protocol.
What we need to do is learn from user behavior what semantics they intend and create mechanisms to map that into our information systems. Since that information exists in different information systems, we will need topic maps to merge the resulting content.
For example, suppose we determine as a matter of practice that when a user writes a hyperlink whose text is someText and whose target is someURL, and that someText appears in the resource pointed to by someURL, what is intended is an identification. Then we should treat all the same uses of someURL as identifying the same subject.
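Here is a minimal sketch of that harvesting rule, assuming pages arrive as raw HTML; the parsing, the grouping by URL, and the sample pages are all illustrative, not a fixed design:

```python
from collections import defaultdict
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    """Collects (anchor text, target URL) pairs from a page."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def subjects_by_url(pages):
    """Same target URL -> same presumed subject; anchor texts become its names."""
    subjects = defaultdict(set)
    for page in pages:
        harvester = LinkHarvester()
        harvester.feed(page)
        for text, url in harvester.links:
            if text:
                subjects[url].add(text)
    return subjects

pages = [
    '<p>See <a href="http://example.org/wg3">WG3</a> for details.</p>',
    '<p><a href="http://example.org/wg3">the ISO topic maps working group</a></p>',
]
for url, names in subjects_by_url(pages).items():
    print(url, "->", sorted(names))
```

Every anchor text collected under one URL becomes candidate evidence that those phrases name the same subject.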
That may be wrong in some cases, but it is a start towards capturing what users intend without asking them to do more than they are doing right now.
Next move (this works for the new ODF metadata mechanism): we associate metadata vocabularies, written in RDF or other forms, with documents for the express purpose of annotating their content. For example, if I write a document about WG3 (the ISO topic maps working group), I should be able to associate a vocabulary with that document that identifies all the likely subjects with no further help from me. And when the document is saved, links to identifiers are inserted into the document.
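As a rough sketch of that save-time step, assume the vocabulary is just a mapping from likely subject names to identifier URLs; the terms, identifiers, and naive substitution rule below are all assumptions, and a real implementation would go through the ODF metadata API rather than rewriting text:

```python
import re

# Hypothetical vocabulary associated with the document:
# likely subject names mapped to subject identifiers.
vocabulary = {
    "WG3": "http://psi.example.org/iso/wg3",
    "topic maps": "http://psi.example.org/topic-maps",
}

def annotate_on_save(text, vocab):
    """On save, wrap each vocabulary term in a link to its identifier.

    Naive substitution: longest terms first, whole words only.
    """
    for term in sorted(vocab, key=len, reverse=True):
        pattern = re.compile(r"\b%s\b" % re.escape(term))
        text = pattern.sub('<a href="%s">%s</a>' % (vocab[term], term), text)
    return text

draft = "WG3 is the ISO topic maps working group."
print(annotate_on_save(draft, vocabulary))
```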
That is, we start to create smarter documents rather than trying to harvest dumb ones. This could be done with medical reports, patient charts, economic reports, computer science articles, etc.
Still haven’t reached topic maps, have I? Well, all those somewhat smarter documents are going to have different vocabularies, at least if we ever expect usage of that sort of system to take off. Emory Hospital (just to pick one close to home) isn’t going to have exactly the same vocabulary as the Mayo Clinic. And we should not wait for them to have the same vocabulary.
Topic maps come in when we decide that the cost of the mapping is less than the benefit we will gain from mapping across the domains. We may never map the physical plant records between Emory Hospital and the Mayo Clinic, but we would be likely to map nursing reports on particular types of patients or the results of medical trials.
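A toy illustration of that merge, assuming each topic carries a set of subject identifiers and a set of names; the identifiers and medical terms are invented, and production systems would use TMDM/XTM tooling rather than dictionaries:

```python
def merge(topic_maps):
    """Merge topics that share at least one subject identifier.

    One greedy pass; a full implementation would also merge transitively.
    """
    merged = []
    for topic in (t for tm in topic_maps for t in tm):
        for existing in merged:
            if existing["ids"] & topic["ids"]:
                existing["ids"] |= topic["ids"]
                existing["names"] |= topic["names"]
                break
        else:
            merged.append({"ids": set(topic["ids"]), "names": set(topic["names"])})
    return merged

# Different local vocabularies, same subject identifier.
emory = [{"ids": {"http://psi.example.org/mi"}, "names": {"MI", "heart attack"}}]
mayo = [{"ids": {"http://psi.example.org/mi"}, "names": {"myocardial infarction"}}]

for topic in merge([emory, mayo]):
    print(sorted(topic["ids"]), sorted(topic["names"]))
```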
My gut instinct is that we need to ask users, “What do you mean when you say X?” And whenever possible (a form of validation), ask them whether that is another way to say Y.
So, back to your original question: yes, we can use topic maps to capture more semantics than the Open Graph Database Protocol, but I would start with the Open Graph Database Protocol as a way to gather rough semantics that we could then refine.
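As a first pass at gathering those rough semantics, one could simply scrape the og: properties a page already exposes and treat them as raw topic characteristics to refine later; the property names follow the Open Graph convention, but the sample page is made up:

```python
from html.parser import HTMLParser

class OpenGraphReader(HTMLParser):
    """Collects og:* meta properties as rough, unrefined semantics."""
    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            prop = attr.get("property", "")
            if prop.startswith("og:") and "content" in attr:
                self.properties[prop] = attr["content"]

page = (
    '<head>'
    '<meta property="og:title" content="WG3 Home" />'
    '<meta property="og:type" content="website" />'
    '<meta property="og:url" content="http://example.org/wg3" />'
    '</head>'
)
reader = OpenGraphReader()
reader.feed(page)
print(reader.properties)  # raw material for later topic map refinement
```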
It is the jumping from rough or even uncontrolled semantics to a polished result that is responsible for so many problems. A woodworker does not go from a bark-covered piece of wood to a finished vase in one step. They slowly remove the bark, bring the wood into the round, and then begin several steps of refinement. We should go and do likewise.