Archive for the ‘Standards’ Category

Program Derivation for Functional Languages – Tuesday, March 29, 2016 Utecht

Wednesday, March 9th, 2016

Program Derivation for Functional Languages by Felienne Hermans.

From the webpage:

Program Derivation for Functional Languages

Program derivation of course was all the rage in the era of Dijkstra, but is it still relevant today in the age of TDD and model checking? Felienne thinks so!

In this session she will show you how to systematically and step-by-step derive a program from a specification. Functional languages especially are very suited to derive programs for, as they are close to the mathematical notation used for proofs.

You will be surprised to know that you already know and apply many techniques for derivation, like Introduce Parameter as supported by Resharper. Did you know that is actually program derivation technique called generalization?

I don’t normally post about local meetups but as it says in the original post, Felienne is an extraordinary speaker and the topic is an important one.

Personally I am hopeful that at least slides and/or perhaps even video will emerge from this presentation.

If you can attend, please do!

In the meantime, if you need something to tide you over, consider:

A Calculus of Functions for Program Derivation by Richard Bird (1987).

Lectures on Constructive Functional Programming by R.S. Bird (1988).

A brief introduction to the derivation of programs by Juris Reinfelds (1986).

MapR on Open Data Platform: Why we declined

Wednesday, April 29th, 2015

From the post:

Open Data Platform is “solving” problems that don’t need solving

Companies implementing Hadoop applications do not need to be concerned about vendor lock-in or interoperability issues. Gartner analysts Merv Adrian and Nick Heudecker disclosed in a recent blog that less than 1% of companies surveyed thought that vendor lock-in or interoperability was an issue—dead last on the list of customer concerns. Project and sub-project interoperability are very good and guaranteed by both free and paid-for distributions. Applications built on one distribution can be migrated with virtually zero switching costs to the other distributions.

~75% of Hadoop implementations run on MapR and Cloudera. MapR and Cloudera have both chosen not to participate. The Open Data Platform without MapR and Cloudera is a bit like one of the Big Three automakers pushing for a standards initiative without the involvement of the other two.

I mention this post because it touches on two issues that should concern all users of Hadoop applications.

On “vendor lock-in” you will find the question that was asked was “…how many attendees considered vendor lock-in a barrier to investment in Hadoop. It came in dead last. With around 1% selecting it.” Who Asked for an Open Data Platform?. Considering that it was in the context of a Gartner webinar, it could have been only one person selected it. Not what I would call a representative sample.

Still, I think John in right in saying that vendor lock-in isn’t a real issue with Hadoop. Hadoop applications aren’t off the shelf items and are custom constructs for your needs and data. Not much opportunity for vendor lock-in. You’re in greater danger of IT lock-in due to poor or non-existent documentation for your Hadoop application. If anyone tells you a Hadoop application doesn’t need documentation because you can “…read the code…,” they are building up job security, quite possibly at your future expense.

John is spot on about the Open Data Platform not including all of the Hadoop market leaders. As John says, Open Data Platform does not include those responsible for 75% of the existing Hadoop implementations.

I have seen that situation before in standards work and it never leads to a happy conclusion, for the participants, non-participants and especially the consumers, who are supposed to benefit from the creation of standards. Non-standards for a minority of the market only serve to confuse not overly clever consumers. To say nothing of the popular IT press.

The Open Data Platform also raises questions about how one goes about creating a standard. One approach is to create a standard based on your projection of market needs and to campaign for its adoption. Another is to create a definition of an “ODP Core” and see if it is used by customers in development contracts and purchase orders. If consumers find it useful, they will no doubt adopt it as a de facto standard. Formalization can follow in due course.

So long as we are talking about possible future standards, a practice of documentation more advanced than C style comments for Hadoop ecosystems would be a useful Hadoop standard in the future.

58 XML Specs Led the Big Parade!

Friday, January 16th, 2015

Earlier this week I ferreted out most of the current XML specifications from the W3C site. I say “most” because I didn’t take the time to run down XML “related” standards such as SVG, etc. At some point I will spend the time to track down all the drafts, prior versions, and related materials.

But, for today, I have packaged up the fifty-eight (58) current XML standards in 58XMLRecs.tar.gz.

BTW, do realize that Extensible Stylesheet Language (XSL) Version 1.0 and XHTML™ Modularization 1.1 – Second Edition have table of contents only versions. I included the full HTML file versions in the package.

You can use grep or other search utilities to search prior XML work for definitions, productions, etc.

Do you remember the compilation of XML standards that used the old MS Help application? The file format was a variation on RTF. Ring any bells? Anything like that available now?

Draft Sorted Definitions for XPath 3.1

Thursday, January 15th, 2015

I have uploaded a draft of sorted definitions for XPath 3.1. See: http://www.durusau.net/publications/xpath-alldefs-sorted.html

I ran across an issue you may encounter in the future with W3C documents in general and these drafts in particular.

While attempting to sort on the title attribute of the a elements that mark each definition, I got the following error:

A sequence of more than one item is not allowed as the @select attribute of xsl:sort

Really?

The stylesheet was working with a subset of the items but not when I added more items to it.

<p>[<a name=”dt-focus” id=”dt-focus” title=”focus” shape=”rect”>Definition</a>: The first three components of the <a title=”dynamic context” href=”#dt-dynamic-context” shape=”rect”>dynamic context</a> (context item, context position, and context size) are called the <b>focus</b> of the expression. ] The focus enables the processor to keep track of which items are being processed by the expression. If any component in the focus is defined, all components of the focus are defined.</p>

Ouch! The title attribute on the second a element was stepping into my sort select.

The solution:

<xsl:sort select=”a[position()=1]/@title” data-type=”text”/>

As we have seen already, markup in W3C specifications varies from author to author so a fixed set of stylesheets may or may not be helpful. Some XSLT snippets on the other hand are likely to turn out to be quite useful.

One of the requirements for the master deduped and sorted definitions is that I want to know the origin(s) of all the definitions. That is if the definition only occurs in XQuery, I want to know that as well was if the definition only occurs in XPath and XQuery, and so on.

Still thinking about the best way to make that easy to replicate. Mostly because you are going to encounter definition issues in any standard you proof.

Corrected Definitions Lists for XPath/XQuery/etc.

Thursday, January 15th, 2015

In my extraction of the definitions yesterday I produced files that had HTML <p> elements embedded in other HTML <p> elements.

The corrected files are as follows:

These lists are unsorted and the paragraphs with multiple definitions are repeated for each definition. Helps me spot where I have multiple definitions that may be followed by non-normative prose, applicable to one or more definitions.

The XSLT code I used yesterday was incorrect:

<xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
<p>
<xsl:copy-of select=”ancestor::p”/>
</p>
</xsl:for-each>

And results in:

<p>
<p>[<a name=”dt-expression-context” id=”dt-expression-context” title=”expression context” shape=”rect”>Definition</a>: The <b>expression
context</b> for a given expression consists of all the information
that can affect the result of the expression.]
</p>
</p>

Which is both ugly and incorrect.

When using xsl:copy-of for a p element, the surrounding p elements were unnecessary.

Thus (correctly):

&lt:xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
<xsl:copy-of select=”ancestor::p”/>
</xsl:for-each>

I reproduced the corrected definition files above. Apologies for any inconvenience.

Work continues on the sorting and deduping.

Building Definitions Lists for XPath/XQuery/etc.

Wednesday, January 14th, 2015

I have extracted the definitions from:

These lists are unsorted and the paragraphs with multiple definitions are repeated for each definition. Helps me spot where I have multiple definitions that may be followed by non-normative prose, applicable to one or more definitions.

Usual follies trying to extract the definitions.

My first attempt (never successful in my experience but I have to try it so as to get to the second, third, etc.) resulted in:

DefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinition

Which really wasn’t what I meant. Unfortunately it was what I had asked for. 😉

Just in case you are curious, the guts to extracting the definitions reads:

<xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
<p>
<xsl:copy-of select=”ancestor::p”/>
</p>
</xsl:for-each>

Each of the definitions is contained in a p element where the anchor for the definition is contained in an a element with the attribute name, “dt-(somename).”

This didn’t work in all four (4) cases because XPath and XQuery Functions and Operators 3.1 records its “[Definition” elements as:

<p><span class=”termdef”><a name=”character” id=”character” shape=”rect”></a>[Definition] A <b>character</b> is an instance of the <a href=”http://www.w3.org/TR/REC-xml/#NT-Char” shape=”rect”>Char</a><sup><small>XML</small></sup> production of <a href=”#xml” shape=”rect”>[Extensible Markup Language (XML) 1.0 (Fifth Edition)]</a>.</span></p>

I’m sure there is some complex trickery you could use to account for that case but with four files, this is meatball XSLT, results over elegance.

Multiple definitions in one paragraph must be broken out so they can be sorted along with the other definitions.

The one thing I forgot to do in the XSLT that you should do when comparing multiple standards was to insert an identifier at the end of each paragraph for the text it was drawn from. Thus:

[Definition: Every instance of the data model is a sequence. XDM]

Where XDM is in a different color for each source.

Proofing all these definitions across four (4) specifications (XQueryX has no additions definitions, aside from unnecessarily restating RFC 2119) is no trivial matter. Which is why I have extracted them and will be producing a deduped and sorted version.

When you have long or complicated standards to proof, it helps to break them down in to smaller parts. Especially if the parts are out of their normal reading context. That helps avoid simply nodding along because you have read the material so many times.

FYI, comments on errors most welcome! Producing the lists was trivial. Proofing the headers, footers, license language, etc. took longer than the lists.

Enjoy!

More on Definitions in XPath/XQuery/XDM 3.1

Tuesday, January 13th, 2015

I was thinking about the definitions I extracted from XPath 3.1 Definitions Here! Definitions There! Definitions Everywhere! XPath/XQuery 3.1 and since the XPath 3.1 draft says:

Because these languages are so closely related, their grammars and language descriptions are generated from a common source to ensure consistency, and the editors of these specifications work together closely.

We are very likely to find that the material contained in definitions and the paragraphs containing definitions are in fact the same.

To make the best use of your time then, what is needed is a single set of the definitions from XPath 3.1, XQuery 3.1, XQueryX 3.1, XQuery and XPath Data Model 3.1, and XQuery Functions and Operators 3.1.

I say that, but then on inspecting some of the definitions in XQuery and XPath Data Model 3.1, I read:

[Definition: An atomic value is a value in
the value space of an atomic type and is labeled with the name of
that atomic type.]

[Definition: An atomic type is a primitive
simple type
or a type derived by restriction from another
atomic type.] (Types derived by list or union are not atomic.)

But in the file of definitions from XPath 3.1, I read:

[Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].]

Not the same are they?

What happened to:

and is labeled with the name of that atomic type.

That seems rather important. Yes?

The phrase “atomic type” occurs forty-six (46) times in the XPath 3.1 draft, none of which define “atomic type.”

It does define “generalized atomic type:”

[Definition: A generalized atomic type is a type which is either (a) an atomic type or (b) a pure union type ].

Which would make you think it would have to define “atomic type” as well, to declare the intersection with “pure union type.” But it doesn’t.

In case you are curious, XML Schema 1.1 doesn’t define “atomic type” either. Rather it defines “anyAtomicType.”

In XML Schema 1.0 Part 1, the phrase “atomic type” is used once and only once in “3.14.1 (non-normative) The Simple Type Definition Schema Component,” saying:

Each atomic type is ultimately a restriction of exactly one such built-in primitive datatype, which is its {primitive type definition}.

There is no formal definition nor is there any further discussion of “atomic type” in XML Schema 1.0 Part 1.

XML Schema Part 2 is completely free of any mention of “atomic type.”

Summary of the example:

At this point we have been told that XPath 3.1 relies on XQuery and XPath Data Model 3.1 but also XML Schema 1 and XML Schema 1.1, which have inconsistent definitions of “atomic type,” when it exists at all.

Moreover, XPath 3.1 relies upon undefined term (atomic value) to define another term (generalized atomic type), which is surely an error in any reading.

This is a good illustration of what happens when definitions are not referenced from other documents with specific and resolvable references. Anyone checking such a definition would have noticed it missing in the referenced location.

Summary on next steps:

I was going to say a deduped set of definitions would serve for proofing all the drafts and now, despite the “production from a common source,” I’m not so sure.

Probably the best course is to simply extract all the definitions and check them for duplication rather than assuming it.

The questions of what should be note material and other issues will remain to be explored.

Definitions Here! Definitions There! Definitions Everywhere! XPath/XQuery 3.1

Monday, January 12th, 2015

Would you believe there are one hundred and forty-eight definitions embedded in XPath 3.1?

What strikes me as odd is that the same one hundred and forty-eight definitions appear in a non-normative glossary, sans what looks like the note material that follows some definitions in the normative prose.

The first issue is why have definitions in both normative and non-normative prose? Particularly when the versions in non-normative prose lack the note type material found in the main text.

Speaking of normative, did you now that normatively, document order is defined as:

Informally, document order is the order in which nodes appear in the XML serialization of a document.

So we have formal definitions that are giving us informal definitions.

That may sound like being picky but haven’t we seen definitions of “document order” before?

Grepping the current XML specifications from the W3C, I found 147 mentions of “document order” outside of the current drafts.

I really don’t think we have gotten this far with XML without a definition of “document order.”

Or “node,” “implementation defined,” “implementation dependent,” “type,” “digit,” “literal,” “map,” “item,” “axis step,” in those words or ones very close to them.

• My first puzzle is why redefine terms that already exist in XML?
• My second puzzle is the one I mentioned above, why repeat shorter versions of the definitions in an explicitly non-normative appendix to the text?

For a concrete example of the second puzzle:

For example:

[Definition: The built-in functions
supported by XPath 3.1 are defined in [XQuery and XPath Functions and Operators
3.1]
.] Additional functions may be provided
in the static
context
. XPath per se does not provide a way to declare named
functions, but a host language may provide such a
mechanism.

First, you are never told what section of XQuery and XPath Functions and Operators 3.1 has this definition so we are back to the 5,000 x N problem.

Second, what part of:

XPath per se does not provide a way to declare named functions, but a host language may provide such a mechanism.

Does not look like a note to you?

Does it announce some normative requirement for XPath?

Proofing is made more difficult because of the overlap of these definitions, verbatim, in XQuery 3.1. Whether it is a complete overlap or not I can’t say because I haven’t extracted all the definitions from XQuery 3.1. The XQuery draft reports one hundred and sixty-five (165) definitions, so it introduces additional definitions. Just spot checking, the overlap looks substantial. Add to that the same repetition of terms as shorter entries in the glossary.

There is the accomplice XQuery and XPath Data Model 3.1, which is alleged to be the source of many definitions but not well known enough to specify particular sections. In truth, many of the things it defines have no identifiers so precise reference (read hyperlinking to a particular entry) may not even be possible.

I make that to be at least six sets of definitions, mostly repeated because one draft won’t or can’t refer to prior XML definitions of the same terms or the lack of anchors in these drafts, prevents cross-referencing by section number for the convenience of the reader.

I can ease your burden to some extent, I have created an HTML file with all the definitions in XPath 3.1, the full definitions, for your use in proofing these drafts.

I make no warranty about the quality of the text as I am a solo shop so have no one to proof copy other than myself. If you spot errors, please give a shout.

I will see what I can do about extracting other material for your review.

What we actually need is a concordance of all these materials, sans the digrams and syntax productions. KWIC concordances don’t do so well with syntax productions. Or tables. Still, it might be worth the effort.

Structural Issues in XPath/XQuery/XPath-XQuery F&O Drafts

Friday, January 9th, 2015

Apologies as I thought I was going to be further along in demonstrating some proofing techniques for XPath 3.1, XQuery 3.1, XPath and XQuery Functions and Operations 3.1 by today.

Instead, I encountered structural issues that are common to all three drafts that I didn’t anticipate but that need to be noted before going further with proofing. I will be using sample material to illustrate the problems and will not always have a sample from all three drafts or even note every occurrence of the issues. They are too numerous for that treatment and it would be repetition for repetition’s sake.

First, consider these passages from XPath 3.1, 1 Introduction:

[Definition: XPath 3.1 operates on the abstract, logical structure of an XML document, rather than its surface syntax. This logical structure, known as the data model, is defined in [XQuery and XPath Data Model (XDM) 3.1].]

[Definition: An XPath 3.0 Processor processes a query according to the XPath 3.0 specification.] [Definition: An XPath 2.0 Processor processes a query according to the XPath 2.0 specification.] [Definition: An XPath 1.0 Processor processes a query according to the XPath 1.0 specification.]

1. Unnumbered Definitions – Unidentified Cross-References

The first structural issue that you will note with the “[Definition…” material is that all such definitions are unnumbered and appear throughout all three texts. The lack of numbering means that it is difficult to refer with any precision to a particular definition. How would I draw your attention to the third definition of the second grouping? Searching for XPath 1.0 turns up 79 occurrences in XPath 3.1 so that doesn’t sound satisfactory. (FYI, “Definition” turns up 193 instances.)

While the “Definitions” have anchors that allow them to be addressed by cross-references, you should note that the cross-references are text hyperlinks that have no identifier by which a reader can find the definition without using the hyperlink. That is to say when I see:

A lexical QName with a prefix can be converted into an expanded QName by resolving its namespace prefix to a namespace URI, using the statically known namespaces. [These are fake links to draw your attention to the text in question.]

The hyperlinks in the original will take me to various parts of the document where these definitions occur, but if I have printed the document, I have no clue where to look for these definitions.

The better practice is to number all the definitions and since they are all self-contained, to put them in a single location. Additionally, all interlinear references to those definitions (or other internal cross-references) should have a visible reference that enables a reader to find the definition or cross-reference, without use of an internal hyperlink.

Example:

A lexical QName Def-21 with a prefix can be converted into an expanded QName Def-19 by resolving its namespace prefix to a namespace URI, using the statically known namespaces. Def-99 [These are fake links to draw your attention to the text in question. The Def numbers are fictitious in this example. Actual references would have the visible definition numbers assigned to the appropriate definition.]

2. Vague references – $N versus 5000 x$N

Another problem I encountered was what I call “vague references,” or less generously, $N versus 5,000 x$N.

For example:

[Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].] [Definition: A node is an instance of one of the node kinds defined in [XQuery and XPath Data Model (XDM) 3.1].

Contrary to popular opinion, standards don’t write themselves and every jot and tittle was placed in a draft at the expense of someone’s time and resources. Let’s call that $N. In the example, you and I both know somewhere in XML Schema 1.0 and XML Schema 1.1 that the “value space of the atomic type” is defined. The same is true for nodes and XQuery and XPath Data Model (XDM) 3.1. But where? The authors of these specifications could insert that information at a cost of$N.

What is the cost of not inserting that information in the current drafts? I estimate the number of people interested in reading these drafts to be 5,000. So each of those person will have to find the same information omitted from these specifications, which is a cost of 5,000 x $N. In terms of convenience to readers and reducing their costs of reading these specifications, references to exact locations in other materials are a necessity. In full disclosure, I have no more or less reason to think 5,000 people are interested in these drafts than the United States has for positing the existence of approximately 5,000 terrorists in the world. I suspect the number of people interested in XML is actually higher but the number works to make the point. Editors can either convenience themselves or their readers. Vague references are also problematic in terms of users finding the correct reference. The citation above, [XML Schema 1.0] for “value space of an atomic type,” refers to all three parts of XML Schema 1.0. Part 1, at 3.14.1 (non-normative) The Simple Type Definition Schema Component, has the only reference to “atomic type.” Part 2, actually has “0” hits for “atomic type.” True enough, “2.5.1.1 Atomic datatypes” is likely the intended reference but that isn’t what the specification says to look for. Bottom line is that any external reference needs to include in the inline citation the precise internal reference in the work being cited. If you want to inconvenience readers by pointing to internal bibliographies rather than online HTML documents, where available, that’s an editorial choice. But in any event, for every external reference, give the internal reference in the work being cited. Your readers will appreciate it and it could make your work more accurate as well. 3. Normative vs. Non-Normative Text Another structural issue which is important for proofing is the distinction between normative and non-normative text. In XPath 3.1, still in the Introduction we read: This document normatively defines the static and dynamic semantics of XPath 3.1. In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative. OK, and under 2.2.3.1 Static Analysis Phase (XPath 3.1), we find: Examples of inferred static types might be: Which is followed by a list so at least we know where the examples end. However, there are numerous cases of: For example, with the expression substring($a, $b,$c), $a must be of type xs:string (or something that can be converted to xs:string by the function calling rules), while$b and $c must be of type xs:double. [also in 2.2.3.1 Static Analysis Phase (XPath 3.1)] So, is that a non-normative example? If so, what is the nature of the “must” that occurs in it? Is that normative? Moreover, the examples (XPath 3.1 has 283 occurrences of that term, XQuery has 455 occurrences of that term, XPath and XQuery Functions and Operators have 537 occurrences of that term) are unnumbered, which makes referencing the examples by other materials very imprecise and wordy. For the use of authors creating secondary literature on these materials, to promote adoption, etc., number of all examples should be the default case. Oh, before anyone protests that XPath and XQuery Functions and Operators has separated its examples into lists, that is true but only partially. There remain 199 occurrences of “for example” which do not occur in lists. Where lists are used, converting to numbered examples should be trivial. The elimination of “for example” material may be more difficult. Hard to say without a good sampling of the cases. Conclusion: As I said at the outset, apologies for not reaching more substantive proofing techniques but structural issues are important for the readability and usability of specifications for readers. Being correct and unreadable isn’t a useful goal. It may seem like some of the changes I suggest are a big “ask” this late in the processing of these specifications. If this were a hand edited document, I would quickly agree with you. But it’s not. Or at least it shouldn’t be. I don’t know where the source is held but the HTML you read is an generated artifact. Gathering and numbering the definitions and inserting those numbers into the internal cross-references are a matter of applying a different style sheet to the source. Fixing the vague references and unnumbered example texts would take more editorial work but readers would greatly benefit from precise references and a clear separation of normative from non-normative text. I will try again over the weekend to reach aids for substantive proofing on these drafts. With luck, I will return to these drafts on Monday of next week (12 January 2014). MUST in XPath 3.1/XQuery 3.1/XQueryX 3.1 Wednesday, January 7th, 2015 I mentioned the problems with redefining may and must in XPath and XQuery Functions and Operators 3.1 in Redefining RFC 2119? Danger! Danger! Will Robinson! last Monday. Requirements language is one of the first things to check for any specification so I thought I should round that issue out by looking at the requirement language in XPath 3.1, XQuery 3.1, and, XQueryX 3.1. XPath 3.1 XPath 3.1 includes RFC 2119 as a normative reference but then never cites RFC 2119 in the document or use the uppercase MUST. I suspect that is the case because of Appendix F Conformance: XPath is intended primarily as a component that can be used by other specifications. Therefore, XPath relies on specifications that use it (such as [XPointer] and [XSL Transformations (XSLT) Version 3.0]) to specify conformance criteria for XPath in their respective environments. Specifications that set conformance criteria for their use of XPath must not change the syntactic or semantic definitions of XPath as given in this specification, except by subsetting and/or compatible extensions. The specification of such a language may describe it as an extension of XPath provided that every expression that conforms to the XPath grammar behaves as described in this specification. (Edited on include the actual links to XPointer and XSLT, pointing internally to a bibliography defeats the purpose of hyperlinking.) Personally I would simply remove the RFC 2119 reference since XPath 3.1 is a set of definitions to which conformance is mandated or not, by other specifications. XQuery 3.1 and XQueryX 3.1 XQuery 3.1 5 Conformance reads in part: This section defines the conformance criteria for an XQuery processor. In this section, the following terms are used to indicate the requirement levels defined in [RFC 2119]. [Definition: MUST means that the item is an absolute requirement of the specification.] [Definition: MUST NOT means that the item isan absolute prohibition of the specification.] [Definition: MAY means that an item is truly optional.] [Definition: SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.] (Emphasis in the original) XQueryX 3.1 5 Conformance reads in part: This section defines the conformance criteria for an XQueryX processor (see Figure 1, “Processing Model Overview”, in [XQuery 3.1: An XML Query Language] , Section 2.2 Processing Model XQ31. In this section, the following terms are used to indicate the requirement levels defined in [RFC 2119]. [Definition: MUST means that the item is an absolute requirement of the specification.] [Definition: SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.] [Definition: MAY means that an item is truly optional.] First, the better practice is not to repeat definitions found elsewhere (a source of error and misstatement) but to cite RFC 2119 as follows: The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119]. Second, the bolding found in XQuery 3.1 of MUST, etc., is unnecessary, particularly when not then followed by bolding in the use of MUST in the conformance clauses. Best practice to simply use UPPERCASE in both cases. Third, and really my principal reason for mentioning XQuery 3.1 and XQueryX 3.1 is to call attention to their use of RFC 2119 keywords. That is to say you will find the keywords in the conformance clauses and not any where else in the specification. Both use the word “must” in their texts but only as would normally appear in prose and implementers don’t have to pour through a sprinkling of MUST as you see in some drafts, which makes for stilted writing and traps for the unwary. The usage of RFC 2119 keywords in XQuery 3.1 and XQueryX 3.1 make the job of writing in declarative prose easier, eliminates the need to distinguish MUST and must in the normative text, and gives clear guidance to implementers as to the requirements to be met for conformance. I was quick to point out an error in my last post so it is only proper that I be quick to point out a best practice in XQuery 3.1 and XQueryX 3.1 as well. This coming Friday, 9 January 2015, I will have a post on proofing content proper for this bundle of specifications. PS: I am encouraging you to take on this venture into proofing specifications because this particular bundle of W3C specification work is important for pointing into data. If we don’t have reliable and consistent pointing, your topic maps will suffer. Redefining RFC 2119? Danger! Danger! Will Robinson! Monday, January 5th, 2015 I’m lagging behind in reading XQuery 3.1: An XML Query Language, XML Path Language (XPath) 3.1, and, XPath and XQuery Functions and Operators 3.1 in order to comment by 13 February 2015. In order to catch up this past weekend I started trying to tease these candidate recommendations apart to make them easier to proof. One of the things I always do I check for key word conformance language and that means, outside of ISO, RFC 2119. I was reading XPath and XQuery Functions and Operators 3.1 (herein Functions and Operators) when I saw: 1.1 Conformance The Functions and Operators specification is intended primarily as a component that can be used by other specifications. Therefore, Functions and Operators relies on specifications that use it (such as [XML Path Language (XPath) 3.1], [XQuery 3.1: An XML Query Language], and potentially future versions of XSLT) to specify conformance criteria for their respective environments. That works. You have a normative document of definitions, etc., and some other standard cites those definitions and supplies the must,should, may according to RFC 2119. Not common but that works. But then I started running scripts for usage of key words and I found in Functions and Operators: 1.6.3 Conformance terminology [Definition] may Conforming documents and processors are permitted to, but need not, behave as described. [Definition] must Conforming documents and processors are required to behave as described; otherwise, they are either non-conformant or else in error. Thus the title: Redefining RFC 2119? Danger! Danger! Will Robinson! RFC 2119 reads in part: 1. MUST This word, or the terms “REQUIRED” or “SHALL”, mean that the definition is an absolute requirement of the specification. 5. MAY This word, or the adjective “OPTIONAL”, mean that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.) 6. Guidance in the use of these Imperatives Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmisssions) For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability. First, the referencing of RFC 2119 is standard practice at the W3C, at least with regard to XML specifications. I wanted to have more than personal experience to cite so I collected the fifty-eight current XML specifications and summarize them in the list at the end of this post. Of the fifty-nine (59) current XML specifications (there may be others, the W3C has abandoned simply listing its work without extraneous groupings), fifty-two of the standards cite and follow RFC 2119. Three of the remaining seven (7) fail to cite RFC due to errors in editing. The final four (4) as it were that don’t cite RFC 2119 are a good illustration of how errors get perpetuated from one standard to another. The first W3C XML specification to not cite RFC 2119 was: Extensible Markup Language (XML) 1.0 (Second Edition) where it reads in part: 1.2 Terminology may [Definition: Conforming documents and XML processors are permitted to but need not behave as described.] must [Definition: Conforming documents and XML processors are required to behave as described; otherwise they are in error. ] The definitions of must and may were ABANDONED in Extensible Markup Language (XML) 1.0 (Third Edition), which simply dropped those definitions and instead reads in part: 1.2 Terminology The terminology used to describe XML documents is defined in the body of this specification. The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional, when emphasized, are to be interpreted as described in [IETF RFC 2119]. The exclusive use of RFC 2119 continues through Extensible Markup Language (XML) 1.0 (Fourth Edition) to the current Extensible Markup Language (XML) 1.0 (Fifth Edition) However, as is often said, whatever good editing we do is interred with us and any errors we make live on. Before the abandonment of attempts to define may and must appeared in XML 3rd edition, XML Schema Part 1: Structures Second Edition and XML Schema Part 2: Datatypes Second Edition cite XML 2nd edition as their rationale for defining may and must. That error has never been corrected. Which brings us to W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes which is the last W3C XML specification to not cite RFC 2119. XSD 1.1 Part 2 reads in part, under Appendix I Changes since version 1.0, I.4 Other Changes: The definitions of must, must not, and ·error· have been changed to specify that processors must detect and report errors in schemas and schema documents (although the quality and level of detail in the error report is not constrained). The problem being XML Schema Part 2: Datatypes Second Edition relies upon XML Schema Part 2: Datatypes Second Edition which cites Extensible Markup Language (XML) 1.0 (Second Edition) as the reason for redefining the terms may and must. The redefining of may and must relies upon language in a superceded version of the XML standard. Language that was deleted ten (10) years ago from the XML standard. If you have read this far, you have a pretty good guess that I am going to suggest that XPath and XQuery Functions and Operators 3.1 drop the attempt to redefine terms that appear in RFC 2119. First, redefining widely used terms for conformance is clearly a bad idea. Do you mean RFC2119 must or do you mean and F&O must? Clearly different. If a requirement has an RFC2119 must, my application either conforms or fails. If a requirement has an F&O must, my application may simple be in error. All the time. Is that useful? Second, by redefining must, we lose the interoperability aspects as define by RFC2119 for all uses of must. Surely interoperability is a goal of Functions and Operators. Yes? Third, the history of redefining may and must at the W3C shows (to me) the perpetuation of an error long beyond its correction date. It’s time to put an end to redefining may and must. PS: Before you decide you “know” the difference in upper and lower case key words from RFC 2119, take a look at: RFC Editorial Guidelines and Procedures, Normative References to RFC 2119. Summary, UPPER CASE is normative, lower case is “a necessary logical relationship.” PPS: Tracking this error down took longer than expected so it will be later this week before I have anything that may help with proofing the specifications. XML Standards Consulted in preparation of this post. Y = Cites RFC 2119, N = Does not cite RFC 2119. More Public Input @ W3C Tuesday, November 11th, 2014 In an effort to get more public input on W3C drafts, a new mailing list has been created: public-review-announce@w3.org list. One outcome of this list could be little or not increase in public input on W3C drafts. In which case the forces that favor a closed club at the W3C will be saying “I told you so,” privately of course. Another outcome of this list could be an increase in public input on W3C drafts. From a broader range of stakeholders that has been the case in the past. In which case the W3C drafts will benefit from the greater input and the case can be made for a greater public voice at the W3C. But the fate of a greater public voice at the W3C rests with you and others like you. If you don’t speak up when you have the opportunity, people will assume you don’t want to speak at all. Perhaps wrong but that is the way it works. My recommendation is that you subscribe to this new list and as appropriate, spread the news of W3C drafts of interest to stakeholders in your community. More than that, you should actively encourage people to review and submit comments. And review and submit comments yourself. The voice at risk is yours. Your call. subscribe to public-review-announce Project Paradox Monday, September 22nd, 2014 Care to name projects and standards that suffered from the project paradox? I first saw this in a tweet by Tobias Fors A Greater Voice for Individuals in W3C – Tell Us What You Would Value [Deadline: 30 Sept 2014] Friday, September 12th, 2014 A Greater Voice for Individuals in W3C – Tell Us What You Would Value by Coralie Mercier. From the post: How is the W3C changing as the world evolves? Broadening in recent years the W3C focus on industry is one way. Another was the launch in 2011 of W3C Community Groups to make W3C the place for new standards. W3C has heard the call for increased affiliation with W3C, and making W3C more inclusive of the web community. W3C responded through the development of a program for increasing developer engagement with W3C. Jeff Jaffe is leading a public open task force to establish a program which seeks to provide individuals a greater voice within W3C, and means to get involved and help shape web technologies through open web standards. Since Jeff announced the version 2 of the Webizen Task Force, we focused on precise goals, success criteria and a selection of benefits, and we built a public survey. The W3C is a membership based organisation supported by way of membership fees, as to form a common set of technologies, written to the specifications defined through the W3C, which the web is built upon. The proposal (initially called Webizen but that name may change and we invite your suggestions in the survey), seeks to extend participation beyond the traditional forum of incorporated entities with an interest in supporting open web standards, through new channels into the sphere of individual participation, already supported through the W3C community groups. Today the Webizen Task Force is releasing a survey which will identify whether or not sufficient interest exists. The survey asks if you are willing to become a W3C Webizen. It offers several candidate benefits and sees which ones are of interest; which ones would make it worthwhile to become Webizens. I took the survey today and suggest that you do the same before 30 September 2014. In part I took the survey because on one comment on the original post that reads: What a crock of shit! The W3C is designed to not be of service to individuals, but to the corporate sponsors. Any ideas or methods to improve web standards should not be taken from sources other then the controlling corporate powers. I do think that as a PR stunt the Webizen concept could be a good ploy to allow individuals to think they have a voice, but the danger is that they may be made to feel as if they should have a voice. This could prove detrimental in the future. I believe the focus of the organization should remain the same, namely as a organization that protects corporate interests and regulates what aspects of technology can be, and should be, used by individuals. The commenter apparently believes in a fantasy world where those with the gold don’t make the rules. I am untroubled by those with the gold making the rules, so long as the rest of us have the opportunity for persuasion, that is to be heard by those making the rules. My suggestion at #14 of the survey reads: The anti-dilution of “value of membership” position creates a group of second class citizens, which can only lead to ill feelings and no benefit to the W3C. It is difficult to imagine that IBM, Oracle, HP or any of the other “members” of the W3C are all that concerned with voting on W3C specifications. They are likely more concerned with participating in the development of those standards. Which they could do without being members should they care to submit public comments, etc. In fact, “non-members” can contribute to any work currently under development. If their suggestions have merit, I rather doubt their lack of membership is going to impact acceptance of their suggestions. Rather than emphasizing the “member” versus “non-member” distinction, I would create a “voting member” and “working member” categories, with different membership requirements. “Voting members” would carry on as they are presently and vote on the administrative aspects of the W3C. “Working members” who consist of employees of “voting members,” “invited experts,” and “working members” who meet some criteria for interest in and expertise at a particular specification activity. Like an “invited expert” but without heavy weight machinery. Emphasis on the different concerns of different classes of membership would go a long way to not creating a feeling of second class citizenship. Or at least it would minimize it more than the “in your face” type approach that appears to be the present position. Being able to participate in teleconferences for example, should be sufficient for most working members. After all, if you have to win votes for a technical position, you haven’t been very persuasive in presenting your position. Nothing against “voting members” at the W3C but I would rather be a “working member” any day. How about you? Linked Data Platform Best Practices… Thursday, August 28th, 2014 Linked Data Platform Best Practices and Guidelines Note Published From the post: The Linked Data Platform (LDP) Working Group has published a Group Note of Linked Data Platform Best Practices and Guidelines. This document provides best practices and guidelines for implementing Linked Data Platform servers and clients. Learn more about the Data Activity. The document takes pains to distinguish “best practice” from “guidance:” For the purposes of this document, it is useful to make a minor, yet important distinction between the term ‘best practice’ and the term ‘guideline’. We define and differentiate the terms as follows: best practice An implementation practice (method or technique) that has consistently shown results superior to those achieved with other means and that is used as a benchmark. Best practices within this document apply specifically to the ways that LDP servers and clients are implemented as well as how certain resources are prepared and used with them. In this document, the best practices might be used as a kind of check-list against which an implementer can directly evaluate a system’s design and code. Lack of adherence to any given best practice, however, does not necessarily imply a lack of quality; they are recommendations that are said to be ‘best’ in most cases and in most contexts, but not all. A best practice is always subject to improvement as we learn and evolve the Web together. guideline A tip, a trick, a note, a suggestion, or answer to a frequently asked question. Guidelines within this document provide useful information that can advance an implementer’s knowledge and understanding, but that may not be directly applicable to an implementation or recognized by consensus as a ‘best practice’. Personally I don’t see the distinction as useful but I bring it to your attention in case you are reading or authoring in this space. CSV on the Web Tuesday, July 15th, 2014 CSV on the Web – What’s Happening in the W3C Working Group by Jeni Tennison. After seeing Software Carpentry: Lessons Learned yesterday, I have a new appreciation for documenting the semantics of data as used by its users. Not to say we don’t need specialized semantic syntaxes and technologies, but if we expect market share, then we need to follow the software and data users are using. How important is CSV? Jeni gives that stats as: • >90% open data is tabular • 2/3rds “CSV” files on data.gov.uk aren’t machine readable Which means people use customized solutions (read vendor lockin). A good overview of the CSV WG’s work so far with a request for your assistance: I need to start following this workgroup. Curious to see if they reuse XQuery addressing to annotate CSV files, columns, rows, cells. PS: If you don’t see arrows in the presentation, I didn’t, use your space bar to change slides and Esc to see all the slides. ACTUS Monday, March 17th, 2014 ACTUS (Algorithmic Contract Types Unified Standards) From the webpage: The Alfred P. Sloan Foundation awarded Stevens Institute of Technology a grant to work on the proposal entitled “Creating a standard language for financial contracts and a contract-centric analytical framework”. The standard follows the theoretical groundwork laid down in the book “Unified Financial Analysis” (1) – UFA.The goal of this project is to build a financial instrument reference database that represents virtually all financial contracts as algorithms that link changes in risk factors (market risk, credit risk, and behavior, etc.) to cash flow obligations of financial contracts. This reference database will be the technological core of a future open source community that will maintain and evolve standardized financial contract representations for the use of regulators, risk managers, and researchers. The objective of the project is to develop a set of about 30 unique contract types (CT’s) that represent virtually all existing financial contracts and which generate state contingent cash flows at a high level of precision. The term of art that describes the impact of changes in the risk factors on the cash flow obligations of a financial contract is called “state contingent cash flows,” which are the key input to virtually all financial analysis including models that assess financial risk. 1- Willi Brammertz, Ioannis Akkizidis, Wolfgang Breymann, Rami Entin, Marco Rustmann; Unified Financial Analysis – The Missing Links of Finance, Wiley 2009. This will help with people who are not cheating in the financial markets. After the revelations of the past couple of years, any guesses on the statistics of non-cheating members of the financial community? 😉 Even if these are used by non-cheaters, we know that the semantics are going to vary from user to user. The real questions are: 1) How will we detect semantic divergence? and 2) How much semantic divergence can be tolerated? I first saw this in a tweet by Stefano Bertolo. The curse of NOARK Tuesday, November 26th, 2013 The curse of NOARK by Lars Marius Garshol. From the post: I’m writing about a phenomenon that’s specifically Norwegian, but some things are easier to explain to foreigners, because we Norwegians have been conditioned to accept them. In this case I’m referring to the state of the art for archiving software in the Norwegian public sector, where everything revolves around the standard known as NOARK. Let’s start with the beginning. Scandinavian chancelleries have a centuries-long tradition for a specific approach to archiving, which could be described as a kind of correspondence journal. Essentially, all incoming and outgoing mail, as well as important internal documents, were logged in a journal, with title, from, to, and date for each document. In addition, each document would be filed under a “sak”, which translates roughly as “case” or “matter under consideration”. Effectively, it’s a kind of tag which ties together one thread of documents relating to a specific matter. The classic example is if the government receives a request of some sort, then produces some intermediate documents while processing it, and then sends a response. Perhaps there may even be couple of rounds of back-and-forth with the external party. This would be an archetypal “sak” (from now on referred to as “case”), and you can see how having all these documents in a single case file would be absolutely necessary for anyone responding to the case. In fact, it’s not dissimilar to the concept of an issue in an issue-tracking system. In this post and its continuation in Archive web services: a missed opportunity Lars details the shortcomings of the NOARK standard. To some degree specifically Norwegian but the problem of poor IT design is truly an international phenomena. I haven’t made any suggestions since the U.S. is home to the virtual case management debacle, the incredible melting NSA data center, not to mention the non-functional health care IT system known as HeathCare.gov. Read these posts by Lars because you will encounter projects before mistakes similar to the ones Lars describes have been set in stone. No guarantees of success but instead of providing semantic data management on top of a broken IT system, you could be providing semantic data management on top of a non-broken IT system. Perhaps never a great IT system but I would settle for a non-broken one any day. Current RFCs and Their Citations Sunday, November 17th, 2013 Current RFCs and Their Citations A resource I created to give authors and editors a cut-n-paste way to use correct citations to current RFCs. I won’t spread bad data by repeating some of the more imaginative citations of RFCs that I have seen. Being careless about citations has the same impact as being careless about URLs. The end result is at best added work for your reader and at worst, no communication at all. I will be updating this resource on a weekly basis but remember the canonical source of information on RFCs is the RFC-Editor’s page. From a topic map perspective, the URLs you see in this resource are subject locators for the subjects, which are the RFCs. New Data Standard May Save Billions [Danger! Danger! Will Robinson] Wednesday, November 13th, 2013 New Data Standard May Save Billions by Isaac Lopez. When I read: The international World Wide Web Consortium (W3C) is finalizing a new data standard that could lead to$3 billion of savings each year for the global web industry. The new standard, called the Customer Experience Digital Data Acquisition standard, aims to simplify and standardize data for such endeavors as marketing, analytics, and personalization across websites worldwide.

“At the moment, every technology ingests and outputs information about website visitors in a dizzying array of different formats,” said contributing company Qubit in a statement. “Every time a site owner wants to deploy a new customer experience technology such as web analytics, remarketing or web apps, overstretched development teams have to build a bespoke set of data interfaces to make it work, meaning site owners can’t focus on what’s important.”

The new standard aims to remove complexity by unifying the language that is used by marketing, analytics, and other such tools that are being used as part of the emerging big data landscape. According to the initial figures from customer experience management platform company (and advocate of the standard), Qubit, the savings from the increased efficiency could reach the equivalent of 0.1% of the global internet economy.

Of those benefitting the most from the standard, the United States comes in a clear winner, with savings that reach into the billions, with average savings per business in the tens of thousands of dollars.

I thought all my news feeds from, on and about the W3C had failed. I couldn’t recall any W3C standard work that resembled what was being described.

I did find it hosted at the W3C: Customer Experience Digital Data Community Group, where you will read:

The Customer Experience Digital Data Community Group will work on reviewing and upgrading the W3C Member Submission in Customer Experience Digital Data, starting with the Customer Experience Digital Data Acquisition submission linked here (http://www.w3.org/Submission/2012/04/). The group will also focus on developing connectivity between the specification and the Data Privacy efforts in the industry, including the W3C Tracking Protection workgroup. The goal is to upgrade the Member Submission specification via this Community Group and issue a Community Group Final Specification.

Note: Community Groups are proposed and run by the community. Although W3C hosts these conversations, the groups do not necessarily represent the views of the W3C Membership or staff. (emphasis added)

So, The international World Wide Web Consortium (W3C) is [NOT] finalizing a new data standard….

The W3C should not be attributed work it has not undertaken or approved.

Saturday, November 2nd, 2013

ANSI Launches Online Portal for Standards Incorporated by Reference

From the post:

The American National Standards Institute (ANSI) is proud to announce the official launch of the ANSI IBR Portal, an online tool for free, read-only access to voluntary consensus standards that have been incorporated by reference (IBR) into federal laws and regulations.

In recent years, issues related to IBR have commanded increased attention, particularly in connection to requirements that standards that have been incorporated into federal laws and regulations be “reasonably available” to the U.S. citizens and residents affected by these rules. This requirement had led some to call for the invalidation of copyrights for IBR standards. Others have posted copyrighted standards online without the permission of the organizations that developed them, triggering legal action from standards developing organizations (SDOs).

“In all of our discussions about the IBR issue, the question we are trying to answer is simple. Why aren’t standards free? In the context of IBR, it’s a valid point to raise,” said S. Joe Bhatia, ANSI president and CEO. “A standard that has been incorporated by reference does have the force of law, and it should be available. But the blanket statement that all IBR standards should be free misses a few important considerations.”

As coordinator of the U.S. standardization system, ANSI has taken a lead role in informing the public about the reality of free standards, the economics of standards setting, and how altering this infrastructure will undermine U.S. competitiveness. Specifically, the loss of revenue from the sale of standards could negatively impact the business model supporting many SDOs – potentially disrupting the larger U.S. and international standardization system, a major driver of innovation and economic growth worldwide. In response to concerns raised by ANSI members and partner organizations, government officials, and other stakeholders, ANSI began to develop its IBR Portal, with the goal of providing a single solution to this significant issue that also provides SDOs with the flexibility they require to safeguard their ability to develop standards.

This is “free” access to standards that have the force of law in the United States.

Whether it is meaningful access is something I will leave for you to consider in light of restrictions that prevent printing, copying, downloading or taking screenshots.

Particularly since some standards run many pages and are not easy documents to read.

I wonder if viewing these “free” standards disables your cellphone camera?

SDOs could be selling enhanced electronic versions, think XML versions that are interlinked together or linked into information systems and giving the PDFs away as advertising.

That would require using the standards others (not the SDOs who house such efforts) have labored so hard to produce.

The response I get to that suggestion has traditionally been: “Our staff doesn’t have the skills for that suggestion.”

I know how to fix that. Don’t you?

Open Discovery Initiative Recommended Practice [Comments due 11-18-2013]

Thursday, October 17th, 2013

From the Open Discovery Initiative (NISO) webpage:

The Open Discovery Initiative (ODI) aims at defining standards and/or best practices for the new generation of library discovery services that are based on indexed search. These discovery services are primarily based upon indexes derived from journals, ebooks and other electronic information of a scholarly nature. The content comes from a range of information providers and products–commercial, open access, institutional, etc. Given the growing interest and activity in the interactions between information providers and discovery services, this group is interested in establishing a more standard set of practices for the ways that content is represented in discovery services and for the interactions between the creators of these services and the information providers whose resources they represent.

If you are interested in the discovery of information, as a publisher, consumer of information, library or otherwise, please take the time to read and comment on this recommended practice.

Spend some time with the In Scope and Out of Scope sections.

So that your comments reflect what the recommendation intended to cover and not what you would prefer that it covered. (That’s advice I need to heed as well.)

Apologies for Sudden Slowdown

Tuesday, April 23rd, 2013

Sorry about the sudden slow down!

I have a couple of posts for today and will be back at full strength tomorrow.

I got distracted by a standards dispute at OASIS where a TC wanted an “any model” proposal to be approved as an OASIS standard.

Literally, the conformance clause says “must” but when you look at the text, it says any old model will do.

Hard to think of that as a standard.

If you are interested, see: Voting No on TGF at OASIS.

Deadline is tomorrow so if you know anyone who is interested, spread the word.

Opening Standards: The Global Politics of Interoperability

Sunday, March 31st, 2013

Opening Standards: The Global Politics of Interoperability Edited by Laura DeNardis.

Overview:

Openness is not a given on the Internet. Technical standards–the underlying architecture that enables interoperability among hardware and software from different manufacturers–increasingly control individual freedom and the pace of innovation in technology markets. Heated battles rage over the very definition of “openness” and what constitutes an open standard in information and communication technologies. In Opening Standards, experts from industry, academia, and public policy explore just what is at stake in these controversies, considering both economic and political implications of open standards. The book examines the effect of open standards on innovation, on the relationship between interoperability and public policy (and if government has a responsibility to promote open standards), and on intellectual property rights in standardization–an issue at the heart of current global controversies. Finally, Opening Standards recommends a framework for defining openness in twenty-first-century information infrastructures.

Contributors discuss such topics as how to reflect the public interest in the private standards-setting process; why open standards have a beneficial effect on competition and Internet freedom; the effects of intellectual property rights on standards openness; and how to define standard, open standard, and software interoperability.

If you think “open standards” have impact, what would you say about “open data?”

At a macro level, “open data” has many of the same issues as “open standards.”

At a micro level, “open data” has unique social issues that drive the creation of silos for data.

So far as I know, a serious investigation of the social dynamics of data silos has yet to be written.

Understanding the dynamics of data silos might, no guarantees, lead to better strategies for dismantling them.

Suggestions for research/reading on the social dynamics of data silos?

Forget standards … you’ll never get one [Of plugs, adapters, standards and many things]

Tuesday, November 13th, 2012

Forget standards … you’ll never get one by Chris Skinner.

A post by Ed Dodds on the ontologies-based-standards@ontolog.cim3.net list pointed me to this rather interesting post in finance of all places.

Chris writes:

Anyways, after coffee I got into a chat with the Standards Forum and one of their brethren told me that banks are childish about standards.

Childish?

Yes, he said. I deal with many industries – automotive, airlines, utilities and more – and banks are really juvenile when it comes to agreeing standards. For example, I asked a group of senior bankers the other day: “how many legs are there in an OTC Derivative exchange”.

One said two, the two counterparties; another said three or four, if you include the end customer; and two others said an infinite number.

Then they argued about it and could not agree.

I said: “there you go. If you cannot even agree on a simple question about OTC Derivatives, you will never agree global standards.”

I laughed and asked what the solution was.

He said: “avoid a global standard as you will never have one. You’ve tried for years and you will never agree such a thing. Instead, work on adapters.”

In other words, like electricity, we need plug adapters to our networks, not standards.

Totally agree with that.

Well, yes and no as to “Totally agree with that.”

Yes, there won’t be any universal standards, but no, that doesn’t mean we need to forget about standards.

Take “plug adapters” for example. Plug adapters could not exist without standards for the plugs that go into plug adapters. Yes?

We need to forget “universal” standards and instead concentrate on “local” standards. Standards that extend only so far as we are competent to define them.

Leave the task of writing standards adapters to people with experience with one or more “local” standards who have a need for the adapter.

They will be far more aware of the requirements for the adapter than we are.

Sounds like a use case for topic maps doesn’t it?

Standards and Infrastructure for Innovation Data Exchange [#6000]

Saturday, October 13th, 2012

Standards and Infrastructure for Innovation Data Exchange by Laurel L. Haak, David Baker, Donna K. Ginther, Gregg J. Gordon, Matthew A. Probus, Nirmala Kannankutty and Bruce A. Weinberg. (Science 12 October 2012: Vol. 338 no. 6104 pp. 196-197 DOI: 10.1126/science.1221840)

Appropriate that post number six thousand (6000) should report an article on data exchange standards.

But the article seems to be at war with itself.

Consider:

There is no single database solution. Data sets are too large, confidentiality issues will limit access, and parties with proprietary components are unlikely to participate in a single-provider solution. Security and licensing require flexible access. Users must be able to attach and integrate new information.

Unified standards for exchanging data could enable a Web-based distributed network, combining local and cloud storage and providing public-access data and tools, private workspace “sandboxes,” and versions of data to support parallel analysis. This infrastructure will likely concentrate existing resources, attract new ones, and maximize benefits from coordination and interoperability while minimizing resource drain and top-down control.

As quickly as the authors say “[t]here is no single database solution.”, they take a deep breath and outline the case for a uniform data sharing structure.

If there is no “single database solution,” it stands to reason there is no single infrastructure for sharing data. The same diversity that blocks the single database, impedes the single exchange infrastructure.

We need standards, but rather than unending quests for enlightened permanence, we should focus on temporary standards, to be replaced by other temporary standards, when circumstances or needs change.

A narrow range required to demonstrate benefits from temporary standards is a plus as well. A standard enabling data integration between departments at a hospital, one department at a time, will show benefits (if there are any to be had), far sooner than a standard that requires universal adoption prior to any benefits appearing.

The Topic Maps Data Model (TMDM) is an example of a narrow range standard.

While the TMDM can be extended, in its original form, subjects are reliably identified using IRI’s (along with data about those subjects). All that is required is that one or more parties use IRIs as identifiers, and not even the same IRIs.

The TMDM framework enables one or more parties to use their own IRIs and data practices, without prior agreement, and still have reliable merging of their data.

I think it is the without prior agreement part that distinguishes the Topic Maps Data Model from other data interchange standards.

We can skip all the tiresome discussion about who has the better name/terminology/taxonomy/ontology for subject X and get down to data interchange.

Data interchange is interesting, but what we find following data interchange is even more so.

More on that to follow, sooner rather than later, in the next six thousand posts.

Cliff Bleszinski’s Game Developer Flashcards

Tuesday, August 21st, 2012

Cliff Bleszinski’s Game Developer Flashcards by Cliff Bleszinski.

From the post:

As of this summer, I’ll have been making games for 20 years professionally. I’ve led the design on character mascot platform games, first-person shooters, single-player campaigns, multiplayer experiences, and much more. I’ve worked with some of the most amazing programmers, artists, animators, writers, and producers around. Throughout this time period, I’ve noticed patterns in how we, as creative professionals, tend to communicate.

I’ve learned that while developers are incredibly intelligent, they can sometimes be a bit insecure about how smart they are compared to their peers. I’ve seen developer message boards tear apart billion-dollar franchises, indie darlings, and everything in between by overanalyzing and nitpicking. We always want to prove that we thought of an idea before anyone else, or we will cite a case in which an idea has been attempted, succeeded, failed, or been played out.

In short, this article identifies communication techniques that are often used in discussions, arguments, and debates among game developers in order to “win” said conversations.

Written in a “game development” context but I think you can recognize some of these patterns in standards work, ontology development and other areas as well.

I did not transpose/translate it into standards lingo, reasoning that it would be easier to see the mote in someone else’s eye than the plank in our own. 😉

Only partially in jest.

Listening to others is hard, listening to ourselves (for patterns like these), is even harder.

I first saw this at: Nat Turkington’s Four short links: 21 August 2012.

Open Services for Lifecycle Collaboration (OSLC)

Sunday, July 29th, 2012

Open Services for Lifecycle Collaboration (OSLC)

This is one of the efforts mentioned in: Linked Data: Esperanto for APIs?.

Open Services for Lifecycle Collaboration (OSLC) is a community of software developers and organizations that is working to standardize the way that software lifecycle tools can share data (for example, requirements, defects, test cases, plans, or code) with one another.

We want to make integrating lifecycle tools a practical reality. (emphasis in original)

That’s a far cry from:

At the very least, however, a generally accepted approach to linking data within applications that make the whole programmable Web concept more accessible to developers of almost every skill level should not be all that far off from here.

It has an ambitious but well-defined scope, which will lend itself to the development and testing of standards for the interchange of information.

Despite semantic diversity, those are tasks that can be identified and that would benefit from standardization.

There is measurable ROI for participants who use the standard in a software lifecycle. They are giving up semantic diversity in exchange for other tangible benefits.

An effort to watch as a possible basis for integrating older software lifecycle tools.

NoSQL Standards [query languages – tuples anyone?]

Sunday, June 10th, 2012

Andrew Oliver write at InfoWorld: The time for NoSQL standards is now – Like Larry Ellison’s yacht, the RDBMS is sailing into the sunset. But if NoSQL is to take its place, a standard query language and APIs must emerge soon.

A bit dramatic for my taste but a good overview of possible areas for standardization for NoSQL.

Problem: NoSQL query languages are tied to the base format/data structure of their implementation.

For that matter, you could say the same thing about SQL. The query language is tied to the data structure.

I am not sure how you can have a query language that isn’t tied to a notion of structure. Even a very abstract one. That a NoSQL implementation could map against its data structure.

Tuples anyone?

Pointers and resources welcome!