Building Definitions Lists for XPath/XQuery/etc.

I have extracted the definitions from:

These lists are unsorted, and paragraphs with multiple definitions are repeated once for each definition. That helps me spot where multiple definitions may be followed by non-normative prose that applies to one or more of them.

Usual follies trying to extract the definitions.

My first attempt (never successful in my experience but I have to try it so as to get to the second, third, etc.) resulted in:


Which really wasn’t what I meant. Unfortunately it was what I had asked for. 😉

Just in case you are curious, the guts of extracting the definitions read:

<!-- For each definition anchor, copy the whole enclosing paragraph. -->
<xsl:for-each select="//p/a[contains(@name, 'dt')]">
  <xsl:copy-of select="ancestor::p"/>
</xsl:for-each>

Each of the definitions is contained in a p element where the anchor for the definition is contained in an a element with the attribute name, “dt-(somename).”

This didn’t work for all four (4) specifications because XPath and XQuery Functions and Operators 3.1 records its “[Definition” elements as:

<p><span class="termdef"><a name="character" id="character" shape="rect"></a>[Definition] A <b>character</b> is an instance of the <a href="" shape="rect">Char</a><sup><small>XML</small></sup> production of <a href="#xml" shape="rect">[Extensible Markup Language (XML) 1.0 (Fifth Edition)]</a>.</span></p>

I’m sure there is some complex trickery you could use to account for that case but with four files, this is meatball XSLT, results over elegance.
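For what it is worth, here is one way to account for both conventions, sketched in Python rather than XSLT. It is not what I ran. It assumes the input parses as well-formed XML with elements in no namespace, it tests for names starting with "dt-" (stricter than the contains() test above), and the sample markup, including the anchor name dt-sequence, is a cut-down illustration rather than a copy of any specification.

```python
import xml.etree.ElementTree as ET

def definition_paragraphs(root):
    """Return every <p> holding a definition, in document order."""
    found = []
    for p in root.iter("p"):
        # Convention 1: an anchor named "dt-(somename)" inside the paragraph.
        has_dt_anchor = any(
            a.get("name", "").startswith("dt-") for a in p.iter("a")
        )
        # Convention 2 (Functions and Operators): a span with class "termdef".
        has_termdef = any(s.get("class") == "termdef" for s in p.iter("span"))
        if has_dt_anchor or has_termdef:
            found.append(p)
    return found

# Hypothetical, cut-down sample; real files carry far more markup.
sample = ET.fromstring(
    "<body>"
    '<p>[<a name="dt-sequence" id="dt-sequence"></a>Definition: '
    "Every instance of the data model is a sequence.]</p>"
    '<p><span class="termdef"><a name="character" id="character"></a>'
    "[Definition] A <b>character</b> is an instance of the Char production."
    "</span></p>"
    "<p>Ordinary prose with no definition.</p>"
    "</body>"
)

for p in definition_paragraphs(sample):
    # itertext() flattens the paragraph to its text, dropping the markup.
    print("".join(p.itertext()))
```

Files served as XHTML put their elements in a namespace, so the tag names in the sketch would need the namespace prefix added before it matched anything.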

Multiple definitions in one paragraph must be broken out so they can be sorted along with the other definitions.
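A rough sketch of that breaking out, in Python and working on the flattened text of a paragraph. It assumes every definition starts with the literal marker "[Definition" and runs until the next marker, so any trailing prose stays attached to the last definition in the paragraph. The function name and the sample text are made up for illustration.

```python
import re

def split_definitions(paragraph_text):
    """Split paragraph text into one chunk per "[Definition" marker."""
    # Find where each definition starts.
    starts = [m.start() for m in re.finditer(r"\[Definition", paragraph_text)]
    # Each chunk ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(paragraph_text)]
    # Text before the first marker, if any, is dropped.
    return [paragraph_text[s:e].strip() for s, e in zip(starts, ends)]

# Invented sample paragraph with two definitions and trailing prose.
text = "[Definition: First term.] [Definition: Second term.] Trailing prose."
for chunk in split_definitions(text):
    print(chunk)
```

Keeping the trailing prose with the last chunk is a deliberate choice here: it still has to be read by a person to decide which definition it belongs to.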

The one thing I forgot to do in the XSLT, and that you should do when comparing multiple standards, is to insert an identifier at the end of each paragraph naming the text it was drawn from. Thus:

[Definition: Every instance of the data model is a sequence. XDM]

Where XDM is in a different color for each source.
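A minimal sketch of the tagging step in Python, assuming the definitions have already been flattened to plain text. The function name is hypothetical, and the coloring is left out since it depends on the output format.

```python
def tag_with_source(definition, source):
    """Append a source identifier, inside the closing bracket if there is one."""
    definition = definition.rstrip()
    if definition.endswith("]"):
        # Slip the identifier in just before the final "]".
        return f"{definition[:-1].rstrip()} {source}]"
    # No closing bracket (the "[Definition] ..." style): append at the end.
    return f"{definition} {source}"

print(tag_with_source(
    "[Definition: Every instance of the data model is a sequence.]", "XDM"
))
```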

Proofing all these definitions across four (4) specifications (XQueryX has no additional definitions, aside from unnecessarily restating RFC 2119) is no trivial matter, which is why I have extracted them and will be producing a deduped and sorted version.
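The dedupe-and-sort step could look something like the following in Python. This is a sketch under my own assumptions, not the process I will actually use: definitions count as duplicates only when they match exactly after collapsing whitespace, and sorting ignores case.

```python
def dedupe_and_sort(definitions):
    """Collapse whitespace, drop exact duplicates, sort ignoring case."""
    unique = {" ".join(d.split()) for d in definitions}
    # The second key element keeps the order stable when only case differs.
    return sorted(unique, key=lambda d: (d.casefold(), d))

# Invented sample entries.
for d in dedupe_and_sort(["b  term", "A term", "b term"]):
    print(d)
```

Near-duplicates, such as the same definition worded slightly differently in two specifications, survive this step on purpose, since those are exactly the ones worth proofing by eye.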

When you have long or complicated standards to proof, it helps to break them down into smaller parts, especially if the parts are out of their normal reading context. That helps avoid simply nodding along because you have read the material so many times.

FYI, comments on errors most welcome! Producing the lists was trivial. Proofing the headers, footers, license language, etc. took longer than the lists.

