## Structural Issues in XPath/XQuery/XPath-XQuery F&O Drafts

Apologies as I thought I was going to be further along in demonstrating some proofing techniques for XPath 3.1, XQuery 3.1, XPath and XQuery Functions and Operations 3.1 by today.

Instead, I encountered structural issues that are common to all three drafts that I didn’t anticipate but that need to be noted before going further with proofing. I will be using sample material to illustrate the problems and will not always have a sample from all three drafts or even note every occurrence of the issues. They are too numerous for that treatment and it would be repetition for repetition’s sake.

First, consider these passages from XPath 3.1, 1 Introduction:

[Definition: XPath 3.1 operates on the abstract, logical structure of an XML document, rather than its surface syntax. This logical structure, known as the data model, is defined in [XQuery and XPath Data Model (XDM) 3.1].]

[Definition: An XPath 3.0 Processor processes a query according to the XPath 3.0 specification.] [Definition: An XPath 2.0 Processor processes a query according to the XPath 2.0 specification.] [Definition: An XPath 1.0 Processor processes a query according to the XPath 1.0 specification.]

1. Unnumbered Definitions – Unidentified Cross-References

The first structural issue that you will note with the “[Definition…” material is that all such definitions are unnumbered and appear throughout all three texts. The lack of numbering means that it is difficult to refer with any precision to a particular definition. How would I draw your attention to the third definition of the second grouping? Searching for XPath 1.0 turns up 79 occurrences in XPath 3.1 so that doesn’t sound satisfactory. (FYI, “Definition” turns up 193 instances.)

While the “Definitions” have anchors that allow them to be addressed by cross-references, you should note that the cross-references are text hyperlinks that have no identifier by which a reader can find the definition without using the hyperlink. That is to say when I see:

A lexical QName with a prefix can be converted into an expanded QName by resolving its namespace prefix to a namespace URI, using the statically known namespaces. [These are fake links to draw your attention to the text in question.]

The hyperlinks in the original will take me to various parts of the document where these definitions occur, but if I have printed the document, I have no clue where to look for these definitions.

The better practice is to number all the definitions and since they are all self-contained, to put them in a single location. Additionally, all interlinear references to those definitions (or other internal cross-references) should have a visible reference that enables a reader to find the definition or cross-reference, without use of an internal hyperlink.

Example:

A lexical QName Def-21 with a prefix can be converted into an expanded QName Def-19 by resolving its namespace prefix to a namespace URI, using the statically known namespaces. Def-99 [These are fake links to draw your attention to the text in question. The Def numbers are fictitious in this example. Actual references would have the visible definition numbers assigned to the appropriate definition.]

2. Vague references – $N versus 5000 x$N

Another problem I encountered was what I call “vague references,” or less generously, $N versus 5,000 x$N.

For example:

[Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].] [Definition: A node is an instance of one of the node kinds defined in [XQuery and XPath Data Model (XDM) 3.1].

Contrary to popular opinion, standards don’t write themselves and every jot and tittle was placed in a draft at the expense of someone’s time and resources. Let’s call that $N. In the example, you and I both know somewhere in XML Schema 1.0 and XML Schema 1.1 that the “value space of the atomic type” is defined. The same is true for nodes and XQuery and XPath Data Model (XDM) 3.1. But where? The authors of these specifications could insert that information at a cost of$N.

What is the cost of not inserting that information in the current drafts? I estimate the number of people interested in reading these drafts to be 5,000. So each of those person will have to find the same information omitted from these specifications, which is a cost of 5,000 x $N. In terms of convenience to readers and reducing their costs of reading these specifications, references to exact locations in other materials are a necessity. In full disclosure, I have no more or less reason to think 5,000 people are interested in these drafts than the United States has for positing the existence of approximately 5,000 terrorists in the world. I suspect the number of people interested in XML is actually higher but the number works to make the point. Editors can either convenience themselves or their readers. Vague references are also problematic in terms of users finding the correct reference. The citation above, [XML Schema 1.0] for “value space of an atomic type,” refers to all three parts of XML Schema 1.0. Part 1, at 3.14.1 (non-normative) The Simple Type Definition Schema Component, has the only reference to “atomic type.” Part 2, actually has “0” hits for “atomic type.” True enough, “2.5.1.1 Atomic datatypes” is likely the intended reference but that isn’t what the specification says to look for. Bottom line is that any external reference needs to include in the inline citation the precise internal reference in the work being cited. If you want to inconvenience readers by pointing to internal bibliographies rather than online HTML documents, where available, that’s an editorial choice. But in any event, for every external reference, give the internal reference in the work being cited. Your readers will appreciate it and it could make your work more accurate as well. 3. Normative vs. Non-Normative Text Another structural issue which is important for proofing is the distinction between normative and non-normative text. In XPath 3.1, still in the Introduction we read: This document normatively defines the static and dynamic semantics of XPath 3.1. In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative. OK, and under 2.2.3.1 Static Analysis Phase (XPath 3.1), we find: Examples of inferred static types might be: Which is followed by a list so at least we know where the examples end. However, there are numerous cases of: For example, with the expression substring($a, $b,$c), $a must be of type xs:string (or something that can be converted to xs:string by the function calling rules), while$b and \$c must be of type xs:double. [also in 2.2.3.1 Static Analysis Phase (XPath 3.1)]

So, is that a non-normative example? If so, what is the nature of the “must” that occurs in it? Is that normative?

Moreover, the examples (XPath 3.1 has 283 occurrences of that term, XQuery has 455 occurrences of that term, XPath and XQuery Functions and Operators have 537 occurrences of that term) are unnumbered, which makes referencing the examples by other materials very imprecise and wordy. For the use of authors creating secondary literature on these materials, to promote adoption, etc., number of all examples should be the default case.

Oh, before anyone protests that XPath and XQuery Functions and Operators has separated its examples into lists, that is true but only partially. There remain 199 occurrences of “for example” which do not occur in lists. Where lists are used, converting to numbered examples should be trivial. The elimination of “for example” material may be more difficult. Hard to say without a good sampling of the cases.

Conclusion:

As I said at the outset, apologies for not reaching more substantive proofing techniques but structural issues are important for the readability and usability of specifications for readers. Being correct and unreadable isn’t a useful goal.

It may seem like some of the changes I suggest are a big “ask” this late in the processing of these specifications. If this were a hand edited document, I would quickly agree with you. But it’s not. Or at least it shouldn’t be. I don’t know where the source is held but the HTML you read is an generated artifact.

Gathering and numbering the definitions and inserting those numbers into the internal cross-references are a matter of applying a different style sheet to the source. Fixing the vague references and unnumbered example texts would take more editorial work but readers would greatly benefit from precise references and a clear separation of normative from non-normative text.

I will try again over the weekend to reach aids for substantive proofing on these drafts. With luck, I will return to these drafts on Monday of next week (12 January 2014).