Blog Odometer Reads: 10,000 (with this post)

I haven’t been posting as heavily as usual for the last week or so, mostly because I wanted to have something special for post #10,000. That “something special” is still a couple of weeks away, but I do have some observations to mark post #10,000 on this blog.

First and foremost, I have been deeply impressed with the variety of projects seeking to make information easier to retrieve, use and archive. And those are just the ones I managed to find and post about. I have literally missed thousands of others. My apologies for missing any of your favorite projects; consider this an open invitation to send them to my attention:

Second, I have been equally saddened by the continued use of names as semantic primitives, that is, without any basis for comparison to other names. A name for an element or attribute may be “transparent” to some observers today, but what about ten (10) years from now? Or one hundred (100) years from now? Many of our “classic” texts survive in only one copy or only in fragments. Do you really want to leave the documentation of your data to chance?

Thousands if not hundreds of thousands of people saw the pyramids being built. It was such common knowledge that no one bothered to write down how it was done. Are you trusting mission-critical applications to the same level of documentation?

Third, the difference between semantic projects that flourish and less successful projects isn’t technology, syntax, or an array of vendors leading the band. Rather, the difference is one of ROI (return on investment). If your solution requires decades of investment by third parties who may or may not choose to participate, however clever your solution, it is DOA.

Despite my deep interest in complex and auditable identity-based information systems, those aren’t going to be market leaders. Weapons manufacturers, research labs, biomedical firms, governments and/or wannabe governments are their natural markets.

The road to less complex and perhaps in some ways unauditable identity-based information systems has to start with a question: which subjects are you not going to identify? It’s a perfectly legitimate choice to make and one I would be asking about in the world of big data.

You need to know, as a conscious decision, which subjects are documented and which are not. Unless you don’t mind paying IT to reconstruct what a former member of the IT staff might have meant.

Fourth, the world of subjects, and with it the “semantic impedance” that Steve Newcomb identified so long ago, is growing at an exponential rate.

Common terminologies or vocabularies emerge in some fields, but even there the question of access to legacy data remains. Not to mention that “legacy” is a moving target, always just one frame behind our current stage of progress.

Windows XP, used by 95% of bank ATMs, becomes unsupported as of today. In twelve short years XP has gone from “new” software, to the standard software, to legacy software, and in not too many years it will be dead software.

What is your estimate of the amount of data that will die with Windows XP? For maximum impact, give your estimate in Library of Alexandria equivalents. (The true amount is unknown, but your estimate will have as much validity as many government and RIAA estimates.)

Finally, as big data and data processing power grow, so do the need and the opportunity to use data from diverse sources. Is that going to be your opportunity, or someone else’s opportunity to sell you their view of semantics?

I am far more interested in learning and documenting the semantics of you and your staff than in creating alien semantics to foist on a workforce (the FBI Virtual Case File project) or trying to boil some tide pool of the semantic ocean (RDF).

You can document your semantics where there is a business, scientific, or research ROI, or you can document someone else’s semantics about your data.

Your call.

If you have found the material in this blog helpful (I hope so), please consider making a donation or sending me a book via Amazon or your favorite bookseller.

I have resisted carrying any advertising because I find it distracting at best, and at worst it degrades the information content of the blog. Others have made different choices and that’s fine, for their blogs.
