Archive for the ‘XPath’ Category

Balisage: The Markup Conference 2017 Program Now Available

Wednesday, May 17th, 2017

Balisage: The Markup Conference 2017 Program Now Available

An email from Tommie Usdin, Chair, Chief Organizer and herder of markup cats for Balisage advises:

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2017 program includes papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelinging, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, why YOU should use (my favorite XML vocabulary), developing a system to aid in studying manuscripts in the tradition of the Ethiopian and Eritrean Highlands, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Up-Translation and Up-Transformation: A one-day Symposium on the goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

Are you interested in open information, reusable documents, and vendor and application independence? Then you need descriptive markup, and Balisage is your conference. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while other still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2017 Program:
http://www.balisage.net/2017/Program.html

Symposium Program:
https://www.balisage.net/UpTransform

NOTE: Members of the TEI and their employees are eligible for discount Balisage registration.

You need to see the program for yourself but the highlights (for me) include: Ethiopic manuscripts (ok, so I have odd tastes), Earley parsers (of particular interest), English Majors (my wife was an English major), and a number of other high points.

Mark your calendar for July 31 – August 4, 2017 – It’s Balisage!

Pure CSS crossword – CSS Grid

Wednesday, April 19th, 2017

Pure CSS crossword – CSS Grid by Adrian Roworth.

The UI is slick, although creating the puzzle remains on you.

Certainly suitable for string answers, XQuery/XPath/XSLT expressions, etc.

Enjoy!

XQuery 3.1 and Company! (Deriving New Versions?)

Wednesday, March 22nd, 2017

XQuery 3.1: An XML Query Language W3C Recommendation 21 March 2017

Hurray!

Related reading of interest:

XML Path Language (XPath) 3.1

XPath and XQuery Functions and Operators 3.1

XQuery and XPath Data Model 3.1

These recommendations are subject to licenses that read in part:

No right to create modifications or derivatives of W3C documents is granted pursuant to this license, except as follows: To facilitate implementation of the technical specifications set forth in this document, anyone may prepare and distribute derivative works and portions of this document in software, in supporting materials accompanying software, and in documentation of software, PROVIDED that all such works include the notice below. HOWEVER, the publication of derivative works of this document for use as a technical specification is expressly prohibited.

You know I think the organization of XQuery 3.1 and friends could be improved but deriving and distributing “improved” versions is expressly prohibited.

Hmmm, but we are talking about XML and languages to query and transform XML.

Consider the potential of an query that calls XQuery 3.1: An XML Query Language and materials cited in it, then returns a version of XQuery 3.1 that has definitions from other standards off-set in the XQuery 3.1 text.

Or than inserts into the text examples or other materials.

For decades XML enthusiasts have bruited about dynamic texts but have produced damned few of them (as in zero) as their standards.

Let’s use the “no derivatives” language of the W3C as an incentive to not create another static document but a dynamic one that can grow or contract according to the wishes of its reader.

Suggestions for first round features?

Up-Translation and Up-Transformation … [Balisage Rocks!]

Sunday, January 29th, 2017

Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions (a Balisage pre-conference symposium)

When & Where:

Monday July 31, 2017
CAMBRiA Hotel, Rockville, MD USA

Chair: Evan Owens, Cenveo

You need more details than that?

Ok, from the webpage:

Increasing the granularity and/or specificity of markup is an important task in many different content and information workflows. Markup transformations might involve tasks such as high-level structuring, detailed component structuring, or enhancing information by matching or linking to external vocabularies or data. Enhancing markup presents numerous secondary challenges including lack of structure of the inputs or inconsistency of input data down to the level of spelling, punctuation, and vocabulary. Source data for up-translation may be XML, word processing documents, plain text, scanned & OCRed text, or databases; transformation goals may be content suitable for page makeup, search, or repurposing, in XML, JSON, or any other markup language.

The range of approaches to up-transformation is as varied as the variety of specifics of the input and required outputs. Solutions may combine automated processing with human review or could be 100% software implementations. With the potential for requirements to evolve over time, tools may have to be actively maintained and enhanced.

The presentations in this pre-conference symposium will include goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

If I didn’t know better, up-translation and up-transformation sound suspiciously like conferred properties of topic maps fame.

Well, modulo that conferred properties could be predicated on explicit subject identity and not hidden in the personal knowledge of the author.

There are two categories of up-translation and up-transformation:

  1. Ones that preserve jobs like spaghetti Cobol code, and
  2. Ones that support easy long term maintenance.

While writing your paper for the pre-conference, which category fits yours the best?

XQuery/XSLT Proposals – Comments by 28 February 2017

Tuesday, January 24th, 2017

Proposed Recommendations Published for XQuery WG and XSLT WG.

From the webpage:

The XML Query Working Group and XSLT Working Group have published a Proposed Recommendation for four documents:

  • XQuery and XPath Data Model 3.1: This document defines the XQuery and XPath Data Model 3.1, which is the data model of XML Path Language (XPath) 3.1, XSL Transformations (XSLT) Version 3.0, and XQuery 3.1: An XML Query Language. The XQuery and XPath Data Model 3.1 (henceforth “data model”) serves two purposes. First, it defines the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages.
  • XPath and XQuery Functions and Operators 3.1: The purpose of this document is to catalog the functions and operators required for XPath 3.1, XQuery 3.1, and XSLT 3.0. It defines constructor functions, operators, and functions on the datatypes defined in XML Schema Part 2: Datatypes Second Edition and the datatypes defined in XQuery and XPath Data Model (XDM) 3.1. It also defines functions and operators on nodes and node sequences as defined in the XQuery and XPath Data Model (XDM) 3.1.
  • XML Path Language (XPath) 3.1: XPath 3.1 is an expression language that allows the processing of values conforming to the data model defined in XQuery and XPath Data Model (XDM) 3.1. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic values, function items, and sequences.
  • XSLT and XQuery Serialization 3.1: This document defines serialization of an instance of the data model as defined in XQuery and XPath Data Model (XDM) 3.1 into a sequence of octets. Serialization is designed to be a component that can be used by other specifications such as XSL Transformations (XSLT) Version 3.0 or XQuery 3.1: An XML Query Language.

Comments are welcome through 28 February 2017.

Get your red pen out!

Unlike political flame wars on social media, comments on these proposed recommendatons could make a useful difference.

Enjoy!

XML.com Relaunch!

Monday, January 16th, 2017

XML.com

Lauren Wood posted this note about the relaunch of XML.com recently:

I’ve relaunched XML.com (for some background, Tim Bray wrote an article here: https://www.xml.com/articles/2017/01/01/xmlcom-redux/). I’m hoping it will become part of the community again, somewhere for people to post their news (submit your news here: https://www.xml.com/news/submit-news-item/) and articles (see the guidelines at https://www.xml.com/about/contribute/). I added a job board to the site as well (if you’re in Berlin, Germany, or able to
move there, look at the job currently posted; thanks LambdaWerk!); if your employer might want to post XML-related jobs please email me.

The old content should mostly be available but some articles were previously available at two (or more) locations and may now only be at one; try the archive list (https://www.xml.com/pub/a/archive/) if you’re looking for something. Please let me know if something major is missing from the archives.

XML is used in a lot of areas, and there is a wealth of knowledge in this community. If you’d like to write an article, send me your ideas. If you have comments on the site, let me know that as well.

Just in time as President Trump is about to stir, vigorously, that big pot of crazy known as federal data.

Mapping, processing, transformation demands will grow at an exponential rate.

Notice the emphasis on demand.

Taking a two weeks to write custom software to sort files (you know the Weiner/Abedin laptop story, yes?) won’t be acceptable quite soon.

How are your on-demand XML chops?

XQuery/XPath CRs 3.1! [#DisruptJ20 Twitter Game]

Tuesday, December 13th, 2016

Just in time for the holidays, new CRs for XQuery/XPath hit the street! Comments due by 2017-01-10.

XQuery and XPath Data Model 3.1 https://www.w3.org/TR/2016/CR-xpath-datamodel-31-20161213/

XML Path Language (XPath) 3.1 https://www.w3.org/TR/2016/CR-xpath-31-20161213/

XQuery 3.1: An XML Query Language https://www.w3.org/TR/2016/CR-xquery-31-20161213/

XPath and XQuery Functions and Operators 3.1 https://www.w3.org/TR/2016/CR-xpath-functions-31-20161213/

XQueryX 3.1 https://www.w3.org/TR/2016/CR-xqueryx-31-20161213/

#DisruptJ20 is too late for comments to the W3C but you can break the boredom of indeterminate waiting to protest excitedly for TV cameras and/or to be arrested.

How?

Play the XQuery/XPath 3.1 Twitter Game!

Definitions litter the drafts and appear as:

[Definition: A sequence is an ordered collection of zero or more items.]

You Tweet:

An ordered collection of zero or more items? #xquery

Correct response:

A sequence.

Some definitions are too long to be tweeted in full:

An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. (xpath-functions)

Suggest you tweet:

A triple containing namespace prefix (optional), namespace URI (optional), and local name.

or

A value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1].

In both cases, the correct response:

An expanded-QName.

Use a $10 burner phone and it unlocked at protests. If your phone is searched, imagine the attempts to break the “code.”

You could agree on definitions/responses as instructions for direct action. But I digress.

4 Days Left – Submission Alert – XML Prague

Sunday, December 11th, 2016

A tweet by Jirka Kosek reminded me there are only 4 days left for XML Prague submissions!

  • December 15th – End of CFP (full paper or extended abstract)
  • January 8th – Notification of acceptance/rejection of paper to authors
  • January 29th – Final paper

From the call for papers:

XML Prague 2017 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

I don’t travel but if you need a last-minute co-author or proofer, you know where to find me!

Balisage 2016 Program Posted! (Newcomers Welcome!)

Monday, May 23rd, 2016

Tommie Usdin wrote today to say:

Balisage: The Markup Conference
2016 Program Now Available
http://www.balisage.net/2016/Program.html

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2016 program includes papers discussing reducing ambiguity in linked-open-data annotations, the visualization of XSLT execution patterns, automatic recognition of grant- and funding-related information in scientific papers, construction of an interactive interface to assist cybersecurity analysts, rules for graceful extension and customization of standard vocabularies, case studies of agile schema development, a report on XML encoding of subtitles for video, an extension of XPath to file systems, handling soft hyphens in historical texts, an automated validity checker for formatted pages, one no-angle-brackets editing interface for scholars of German family names and another for scholars of Roman legal history, and a survey of non-XML markup such as Markdown.

XML In, Web Out: A one-day Symposium on the sub rosa XML that powers an increasing number of websites will be held on Monday, August 1. http://balisage.net/XML-In-Web-Out/

If you are interested in open information, reusable documents, and vendor and application independence, then you need descriptive markup, and Balisage is the conference you should attend. Balisage brings together document architects, librarians, archivists, computer
scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while other still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2016 Program: http://www.balisage.net/2016/Program.html

Symposium Program: http://balisage.net/XML-In-Web-Out/symposiumProgram.html

Even if you don’t eat RELAX grammars at snack time, put Balisage on your conference schedule. Even if a bit scruffy looking, the long time participants like new document/information problems or new ways of looking at old ones. Not to mention they, on occasion, learn something from newcomers as well.

It is a unique opportunity to meet the people who engineered the tools and specs that you use day to day.

Be forewarned that most of them have difficulty agreeing what controversial terms mean, like “document,” but that to one side, they are a good a crew as you are likely to meet.

Enjoy!

Balisage 2016, 2–5 August 2016 [XML That Makes A Difference!]

Tuesday, February 2nd, 2016

Call for Participation

Dates:

  • 25 March 2016 — Peer review applications due
  • 22 April 2016 — Paper submissions due
  • 21 May 2016 — Speakers notified
  • 10 June 2016 — Late-breaking News submissions due
  • 16 June 2016 — Late-breaking News speakers notified
  • 8 July 2016 — Final papers due from presenters of peer reviewed papers
  • 8 July 2016 — Short paper or slide summary due from presenters of late-breaking news
  • 1 August 2016 — Pre-conference Symposium
  • 2–5 August 2016 — Balisage: The Markup Conference

From the call:

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

  • Web application development with XML
  • Informal data models and consensus-based vocabularies
  • Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
  • Performance issues in parsing, XML database retrieval, or XSLT processing
  • Development of angle-bracket-free user interfaces for non-technical users
  • Semistructured data and full text search
  • Deployment of XML systems for enterprise data
  • Web application development with XML
  • Design and implementation of XML vocabularies
  • Case studies of the use of XML for publishing, interchange, or archiving
  • Alternatives to XML
  • the role(s) of XML in the application lifecycle
  • the role(s) of vocabularies in XML environments

Full papers should be submitted by the deadline given below. All papers are peer-reviewed — we pride ourselves that you will seldom get a more thorough, skeptical, or helpful review than the one provided by Balisage reviewers.

Whether in theory or practice, let’s make Balisage 2016 the one people speak of in hushed tones at future markup and information conferences.

Useful semantics continues to flounder about, cf. Vice-President Biden’s interest in “one cancer research language.” Easy enough to say. How hard could it be?

Documents are commonly thought of and processed as if from BOM to EOF is the definition of a document. Much to our impoverishment.

Silo dissing has gotten popular. What if we could have our silos and eat them too?

Let’s set our sights on a Balisage 2016 where non-technicals come away saying “I want that!”

Have your first drafts done well before the end of February, 2016!

Facets for Christmas!

Friday, December 25th, 2015

Facet Module

From the introduction:

Faceted search has proven to be enormously popular in the real world applications. Faceted search allows user to navigate and access information via a structured facet classification system. Combined with full text search, it provides user with enormous power and flexibility to discover information.

This proposal defines a standardized approach to support the Faceted search in XQuery. It has been designed to be compatible with XQuery 3.0, and is intended to be used in conjunction with XQuery and XPath Full Text 3.0.

Imagine my surprise when after opening Christmas presents with family to see a tweet by XQuery announcing yet another Christmas present:

“Facets”: A new EXPath spec w/extension functions & data models to enable faceted navigation & search in XQuery http://expath.org/spec/facet

The EXPath homepage says:

XPath is great. XPath-based languages like XQuery, XSLT, and XProc, are great. The XPath recommendation provides a foundation for writing expressions that evaluate the same way in a lot of processors, written in different languages, running in different environments, in XML databases, in in-memory processors, in servers or in clients.

Supporting so many different kinds of processor is wonderful thing. But this also contrains which features are feasible at the XPath level and which are not. In the years since the release of XPath 2.0, experience has gradually revealed some missing features.

EXPath exists to provide specifications for such missing features in a collaborative- and implementation-independent way. EXPath also provides facilities to help and deliver implementations to as many processors as possible, via extensibility mechanisms from the XPath 2.0 Recommendation itself.

Other projects exist to define extensions for XPath-based languages or languages using XPath, as the famous EXSLT, and the more recent EXQuery and EXProc projects. We think that those projects are really useful and fill a gap in the XML core technologies landscape. Nevertheless, working at the XPath level allows common solutions when there is no sense in reinventing the wheel over and over again. This is just following the brilliant idea of the W3C’s XSLT and XQuery working groups, which joined forces to define XPath 2.0 together. EXPath purpose is not to compete with other projects, but collaborate with them.

Be sure to visit the resources page. It has a manageable listing of processors that handle extensions.

What would you like to see added to XPath?

Enjoy!

My Bad – You Are Not! 747 Edits Away From Using XML Tools

Thursday, December 17th, 2015

The original, unedited post is below but in response to comments, I checked the XQuery, XPath, XSLT and XQuery Serialization 3.1 files in Chrome (CNTR-U) before saving them.

All the empty elements were properly closed.

I then saved the files and re-opened in Emacs, to discover that Chrome had stripped the “/” from the empty elements, which then caused BaseX to complain. It was an accurate complaint but the files I was tossing against BaseX were not the files as published by the W3C.

So now I need to file a bug report on Chrome, Version 47.0.2526.80 (64-bit) on Ubuntu, for mangling closed empty elements.


You could tell in XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations! that I was really excited to see the new drafts hit the street.

Me and my big mouth.

I grabbed copies of all three and tossed the XQuery draft against an xquery to create a list of all the paths in it. Simple enough.

The result weren’t.

Here is the first error message:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 68): The element type “link” must be terminated by the matching end-tag “</link>”.

Ouch!

I corrected that and running the query a second time I got:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 68): The element type “meta” must be terminated by the matching end-tag “</meta>”.

The <meta> elements appear on lines three and four.

On the third try:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 69): The element type “img” must be terminated by the matching end-tag “</img>”.

There are 3 <img> elements that are not closed.

I’m getting fairly annoyed at this point.

Fourth try:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 78): The element type “br” must be terminated by the matching end-tag “</br>”.

Of course at this point I revert to grep and discover there are 353
elements that are not closed.

Sigh, nothing to do but correct and soldier on.

Fifth attempt.

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 17618): The element type “hr” must be terminated by the matching end-tag “</hr>”.

There are 2 <hr> elements that are not closed.

A total of 361 edits in order to use XML based tools with the most recent XQuery 3.1 Candidate draft.

The most recent XPath 3.1 has 238 empty elements that aren’t closed (same elements as XQuery 3.1).

The XSLT and XQuery Serialization 3.1 draft has 149 empty elements that aren’t closed, same as the other but with the addition of four <col> elements that weren’t closed.

Grand total: 747 edits in order to use XML tools.

Not an editorial but a production problem. A rather severe one it seems to me.

Anyone who wants to use XML tools on these drafts will have to perform the same edits.

XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations!

Thursday, December 17th, 2015

As I forecast 😉 earlier this week, new Candidate Recommendations for:

XQuery 3.1: An XML Query Language

XML Path Language (XPath) 3.1

XSLT and XQuery Serialization 3.1

have hit the streets for your review and comments!

Comments due by 2016-01-31.

That’s forty-five days, minus the ones spent with drugs/sex/rock-n-roll over the holidays and recovering from same.

Say something shy of forty-four actual working days (my endurance isn’t what it once was) for the review process.

What tools, techniques are you going to use to review this latest set of candidates?

BTW, some people review software and check only fixes, for standards I start at the beginning, go to the end, then stop. (Or the reverse for backward proofing.)

My estimates on days spent with drugs/sex/rock-n-rock are approximate only and your experience may vary.

XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts (soon!)

Monday, December 14th, 2015

XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts will be published quite soon so I wanted to give you a heads up on your holiday reading schedule.

This is deep enough in the review cycle that a back-to-front reading is probably your best approach.

You have read the drafts and corrections often enough by this point that you read the first few words of a paragraph and you “know” what it says so you move on. (At the very least I can report that happens to me.)

By back-to-front reading I mean to start at the end of each draft and read the last sentence and then the next to last sentence and so on.

The back-to-front process does two things:

  1. You are forced to read each sentence on its own.
  2. It prevents skimming and filling in errors with silent corrections (unknown to your conscious mind).

The back-to-front method is quite time consuming so its fortunate these drafts are due to appear just before a series of holidays in a large number of places.

I hesitate to mention it but there is another way to proof these drafts.

If you have XML experienced visitors, you could take turns reading the drafts to each other. It was a technique used by copyists many years ago where one person read and two others took down the text. The two versions were then compared to each other and the original.

Even with a great reading voice, I’m not certain many people would be up to that sort of exercise.

PS: I will post on the new drafts as soon as they are published.

XQuery, 2nd Edition, Updated! (A Drawback to XQuery)

Tuesday, December 8th, 2015

XQuery, 2nd Edition, Updated! by Priscilla Walmsley.

The updated version of XQuery, 2nd Edition has hit the streets!

As a plug for the early release program at O’Reilly, yours truly appears in the acknowledgments (page xxii) from having submitted comments on the early release version of XQuery. You can too. Early release participation is yet another way to contribute back to the community.

There is one drawback to XQuery which I discuss below.

For anyone not fortunate enough to already have a copy of XQuery, 2nd Edition, here is the full description from the O’Reilly site:

The W3C XQuery 3.1 standard provides a tool to search, extract, and manipulate content, whether it’s in XML, JSON or plain text. With this fully updated, in-depth tutorial, you’ll learn to program with this highly practical query language.

Designed for query writers who have some knowledge of XML basics, but not necessarily advanced knowledge of XML-related technologies, this book is ideal as both a tutorial and a reference. You’ll find background information for namespaces, schemas, built-in types, and regular expressions that are relevant to writing XML queries.

This second edition provides:

  • A high-level overview and quick tour of XQuery
  • New chapters on higher-order functions, maps, arrays, and JSON
  • A carefully paced tutorial that teaches XQuery without being bogged down by the details
  • Advanced concepts for taking advantage of modularity, namespaces, typing, and schemas
  • Guidelines for working with specific types of data, such as numbers, strings, dates, URIs, maps and arrays
  • XQuery’s implementation-specific features and its relationship to other standards including SQL and XSLT
  • A complete alphabetical reference to the built-in functions, types, and error messages

Drawback to XQuery:

You know I hate to complain, but the brevity of XQuery is a real drawback to billing.

For example, I have a post pending on taking 604 lines of XSLT down to 35 lines of XQuery.

Granted the XQuery is easier to maintain, modify, extend, but all a client will see is the 35 lines of XQuery. At least 604 lines of XSLT looks like you really worked to produce something.

I know about XQueryX but I haven’t seen any automatic way to convert XQuery into XQueryX. Am I missing something obvious? If that’s possible, I could just bulk up the deliverable with an XQueryX expression of the work and keep the XQuery version for production use.

As excellent as I think XQuery and Walmsley’s book both are, I did want to warn you about the brevity of your XQuery deliverables.

I look forward to finish reading XQuery, 2nd Edition. I started doing so many things based on the first twelve or so chapters that I just read selectively from that point on. It merits a complete read. You won’t be sorry you did.

XQuery and XPath Full Text 3.0 (Recommendation)

Tuesday, November 24th, 2015

XQuery and XPath Full Text 3.0

From 1.1 Full-Text Search and XML:

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 3.0 and XPath 3.0.

XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as
scoring and weighting.

Full-text search is different from substring search in many ways:

  1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string “lease” will return a news item that contains “Foobar Corporation releases version 20.9 …”. A full-text search for the token “lease” will not.
  2. There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is “find me all the news items that contain a token with the same linguistic stem as ‘mouse'” (finds “mouse” and “mice”). Another example based on token proximity is “find me all the news items that contain the tokens ‘XML’ and ‘Query’ allowing up to 3 intervening tokens”.
  3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for “mouse”, there is only 1 expected result set. When you do a full-text search for all the news items that contain the token “mouse”, you probably expect to find news items containing the token “mice”, and possibly “rodents”, or possibly “computers”. Not all results are equal. Some results are more “mousey” than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.

Note:

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text 3.0.

Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.

Tokenization, including the definition of the term “tokens”, SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization operates on the string value of an item; for element nodes this does not include the content of attribute nodes, but for attribute nodes it does. Tokenization is defined more formally in 4.1 Tokenization.

[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]

Not a fast read but a welcome one!

XQuery and XPath increase the value of all XML-encoded documents, at least down to the level of their markup. Beyond nodes, you are on your own.

XQuery and XPath Full Text 3.0 extend XQuery and XPath beyond existing markup in documents. Content that was too expensive or simply not of enough interest to encode, can still be reached in a robust and reliable way.

If you can “see” it with your computer, you can annotate it.

You might have to possess a copy of the copyrighted content, but still, it isn’t a closed box that resists annotation. Enabling you to sell the annotation as a value-add to the copyrighted content.

XQuery and XPath Full Text 3.0 says token and phrase are implementation defined.

Imagine the user (name) commented version of X movie, which is a driver file that has XQuery links into DVD playing on your computer (or rather to the data stream).

I rather like that idea.

PS: Check with a lawyer before you commercialize that annotation idea. I am not familiar with all EULAs and national laws.

XML Prague 2016 – Call for Papers [Looking for a co-author?]

Tuesday, November 3rd, 2015

XML Prague 2016 – Call for Papers

Important Dates:

  • November 30th – End of CFP (full paper or extended abstract)
  • January 4th – Notification of acceptance/rejection of paper to authors
  • January 25th – Final paper
  • February 11-13, XML Prague 2016

From the webpage:

XML Prague 2016 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

Authors should strive to contain original material and belong in the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Accepted presenters must submit their full paper (on time) and give their presentation and answer questions in English, as well as follow the XML Prague 2016 conference guidelines.

I don’t travel but am interested in co-authoring a paper with someone who plans on attending XML Prague 2016. Contact me at patrick@durusau.net.

Turning the MS Battleship

Saturday, March 21st, 2015

Improving interoperability with DOM L3 XPath by Thomas Moore.

From the post:

As part of our ongoing focus on interoperability with the modern Web, we’ve been working on addressing an interoperability gap by writing an implementation of DOM L3 XPath in the Windows 10 Web platform. Today we’d like to share how we are closing this gap in Project Spartan’s new rendering engine with data from the modern Web.

Some History

Prior to IE’s support for DOM L3 Core and native XML documents in IE9, MSXML provided any XML handling and functionality to the Web as an ActiveX object. In addition to XMLHttpRequest, MSXML supported the XPath language through its own APIs, selectSingleNode and selectNodes. For applications based on and XML documents originating from MSXML, this works just fine. However, this doesn’t follow the W3C standards for interacting with XML documents or exposing XPath.

To accommodate a diversity of browsers, sites and libraries wrap XPath calls to switch to the right implementation. If you search for XPath examples or tutorials, you’ll immediately find results that check for IE-specific code to use MSXML for evaluating the query in a non-interoperable way:

It seems like a long time ago that a relatively senior Microsoft staffer told me that turning a battleship like MS takes time. No change, however important, is going to happen quickly. Just the way things are in a large organization.

The important thing to remember is that once change starts, that too takes on a certain momentum and so is more likely to continue, even though it was hard to get started.

Yes, I am sure the present steps towards greater interoperability could have gone further, in another direction, etc. but they didn’t. Rather than complain about the present change for the better, why not use that as a wedge to push for greater support for more recent XML standards?

For my part, I guess I need to get a copy of Windows 10 on a VM so I can volunteer as a beta tester for full XPath (XQuery?/XSLT?) support in a future web browser. MS as a full XML competitor and possible source of open source software would generate some excitement in the XML community!

W3C Invites Implementations of XQuery and XPath Full Text 3.0;…

Friday, March 13th, 2015

W3C Invites Implementations of XQuery and XPath Full Text 3.0; Supporting Requirements and Use Cases Draft Updated

From the post:

The XML Query Working Group and the XSLT Working Group invite implementation of the Candidate Recommendation of XQuery and XPath Full Text 3.0. The Full Text specification extends the XPath and XQuery languages to support fast and efficient full text searches over arbitrarily large collections of documents. This release brings the Full Text specification up to date with XQuery 3.0 and XPath 3.0; the language itself is unchanged.

Both groups also published an updated Working Draft of XQuery and XPath Full Text 3.0 Requirements and Use Cases. This document specifies requirements and use cases for Full-Text Search for use in XQuery 3.0 and XPath 3.0. The goal of XQuery and XPath Full Text 3.0 is to extend XQuery and XPath Full Text 1.0 with additional functionality in response to requests from users and implementors.

If you have comments that arise out of implementation experience, be advised that XQuery and XPath Full Text 3.0 will be a Candidate Recommendation until at least 26 March 2015.

Enjoy!

XPath/XQuery/FO/XDM 3.1 Comments Filed!

Friday, February 13th, 2015

I did manage to file seventeen (17) comments today on the XPath/XQuery/FO/XDM 3.1 drafts!

I haven’t mastered bugzilla well enough to create an HTML list of them to paste in here but no doubt will do so over the weekend.

Remember these are NOT “bugs” until they are accepted by the working group as “bugs.” Think of them as being suggestions on my part where the drafts were unclear or could be made clearer in my view.

Did you remember to post comments?

I will try to get a couple of short things posted tonight but getting the comments in was my priority today.

Balisage: The Markup Conference 2015

Wednesday, January 21st, 2015

Balisage: The Markup Conference 2015 – There is Nothing As Practical As A Good Theory

Key dates:
– 27 March 2015 — Peer review applications due
– 17 April 2015 — Paper submissions due
– 17 April 2015 — Applications for student support awards due
– 22 May 2015 — Speakers notified
– 17 July 2015 — Final papers due
– 10 August 2015 — Symposium on Cultural Heritage Markup
– 11–14 August 2015 — Balisage: The Markup Conference

Bethesda North Marriott Hotel & Conference Center, just outside Washington, DC (I know, no pool with giant head, etc. Do you think if we ask nicely they would put one in? And change the theme of the decorations about every 30 feet in the lobby?)

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

  • Cutting-edge applications of XML and related technologies
  • Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
  • Web application development with XML
  • Performance issues in parsing, XML database retrieval, or XSLT processing
  • Development of angle-bracket-free user interfaces for non-technical users
  • Deployment of XML systems for enterprise data
  • Design and implementation of XML vocabularies
  • Case studies of the use of XML for publishing, interchange, or archiving
  • Alternatives to XML
  • Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages
  • Detailed Call for Participation: http://balisage.net/Call4Participation.html
    About Balisage: http://balisage.net/
    Instructions for authors: http://balisage.net/authorinstructions.html

    For more information: info@balisage.net or +1 301 315 9631

    I wonder if the local authorities realize the danger in putting that many skilled markup people so close the source of so much content? (Washington) With attendees sparking off against each other, who knows?, could see an accountable and auditable legislative and rule making document flow arise. There may not be enough members of Congress in town to smother it.

    The revolution may not be televised but it will be powered by markup and its advocates. Come join the crowd with the tools to make open data transparent.

    pgcli [Inspiration for command line tool for XPath/XQuery?]

    Tuesday, January 20th, 2015

    pgcli

    From the webpage:

    Pgcli is a command line interface for Postgres with auto-completion and syntax highlighting.

    Postgres folks who don’t know about pgcli will be glad to see this post.

    But, having spent several days with XPath/XQuery/FO 3.1 syntax, I can only imagine the joy in XML circles for a similar utility for use with command line XML tools.

    Properly done, the increase in productivity would be substantial.

    The same applies for your favorite NoSQL query language. (Datomic?)

    Will SQL users be the only ones with such a command line tool?

    I first saw this in a tweet by elishowk.

    XPath/XQuery/FO/XDM 3.1 Definitions – Deduped/Sorted/Some Comments! Version 0.1

    Monday, January 19th, 2015

    My first set of the XPath/XQuery/FO/XDM 3.1 Definitions, deduped, sorted, along with some comments is now online!

    XPath, XQuery, XQuery and XPath Functions and Operators, XDM – 3.1 – Sorted Definitions Draft

    Let me emphasize this draft is incomplete and more comments are needed on the varying definitions.

    I have included all definitions, including those that are unique or uniform. This should help with your review of those definitions as well.

    I am continuing to work on this and other work products to assist in your review of these drafts.

    Reminder: Tentative deadline for comments at the W3C is 13 February 2015.

    Draft Sorted Definitions for XPath 3.1

    Thursday, January 15th, 2015

    I have uploaded a draft of sorted definitions for XPath 3.1. See: http://www.durusau.net/publications/xpath-alldefs-sorted.html

    I ran across an issue you may encounter in the future with W3C documents in general and these drafts in particular.

    While attempting to sort on the title attribute of the a elements that mark each definition, I got the following error:

    A sequence of more than one item is not allowed as the @select attribute of xsl:sort

    Really?

    The stylesheet was working with a subset of the items but not when I added more items to it.

    Turns out one of the items I added reads:

    <p>[<a name=”dt-focus” id=”dt-focus” title=”focus” shape=”rect”>Definition</a>: The first three components of the <a title=”dynamic context” href=”#dt-dynamic-context” shape=”rect”>dynamic context</a> (context item, context position, and context size) are called the <b>focus</b> of the expression. ] The focus enables the processor to keep track of which items are being processed by the expression. If any component in the focus is defined, all components of the focus are defined.</p>

    Ouch! The title attribute on the second a element was stepping into my sort select.

    The solution:

    <xsl:sort select=”a[position()=1]/@title” data-type=”text”/>

    As we have seen already, markup in W3C specifications varies from author to author so a fixed set of stylesheets may or may not be helpful. Some XSLT snippets on the other hand are likely to turn out to be quite useful.

    One of the requirements for the master deduped and sorted definitions is that I want to know the origin(s) of all the definitions. That is if the definition only occurs in XQuery, I want to know that as well was if the definition only occurs in XPath and XQuery, and so on.

    Still thinking about the best way to make that easy to replicate. Mostly because you are going to encounter definition issues in any standard you proof.

    Corrected Definitions Lists for XPath/XQuery/etc.

    Thursday, January 15th, 2015

    In my extraction of the definitions yesterday I produced files that had HTML <p> elements embedded in other HTML <p> elements.

    The corrected files are as follows:

    These lists are unsorted and the paragraphs with multiple definitions are repeated for each definition. Helps me spot where I have multiple definitions that may be followed by non-normative prose, applicable to one or more definitions.

    The XSLT code I used yesterday was incorrect:

    <xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
    <p>
    <xsl:copy-of select=”ancestor::p”/>
    </p>
    </xsl:for-each>

    And results in:

    <p>
    <p>[<a name=”dt-expression-context” id=”dt-expression-context” title=”expression context” shape=”rect”>Definition</a>: The <b>expression
    context</b> for a given expression consists of all the information
    that can affect the result of the expression.]
    </p>
    </p>

    Which is both ugly and incorrect.

    When using xsl:copy-of for a p element, the surrounding p elements were unnecessary.

    Thus (correctly):

    &lt:xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
    <xsl:copy-of select=”ancestor::p”/>
    </xsl:for-each>

    I reproduced the corrected definition files above. Apologies for any inconvenience.

    Work continues on the sorting and deduping.

    Building Definitions Lists for XPath/XQuery/etc.

    Wednesday, January 14th, 2015

    I have extracted the definitions from:

    These lists are unsorted and the paragraphs with multiple definitions are repeated for each definition. Helps me spot where I have multiple definitions that may be followed by non-normative prose, applicable to one or more definitions.

    Usual follies trying to extract the definitions.

    My first attempt (never successful in my experience but I have to try it so as to get to the second, third, etc.) resulted in:

    DefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinition

    Which really wasn’t what I meant. Unfortunately it was what I had asked for. 😉

    Just in case you are curious, the guts to extracting the definitions reads:

    <xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
    <p>
    <xsl:copy-of select=”ancestor::p”/>
    </p>
    </xsl:for-each>

    Each of the definitions is contained in a p element where the anchor for the definition is contained in an a element with the attribute name, “dt-(somename).”

    This didn’t work in all four (4) cases because XPath and XQuery Functions and Operators 3.1 records its “[Definition” elements as:

    <p><span class=”termdef”><a name=”character” id=”character” shape=”rect”></a>[Definition] A <b>character</b> is an instance of the <a href=”http://www.w3.org/TR/REC-xml/#NT-Char” shape=”rect”>Char</a><sup><small>XML</small></sup> production of <a href=”#xml” shape=”rect”>[Extensible Markup Language (XML) 1.0 (Fifth Edition)]</a>.</span></p>

    I’m sure there is some complex trickery you could use to account for that case but with four files, this is meatball XSLT, results over elegance.

    Multiple definitions in one paragraph must be broken out so they can be sorted along with the other definitions.

    The one thing I forgot to do in the XSLT that you should do when comparing multiple standards was to insert an identifier at the end of each paragraph for the text it was drawn from. Thus:

    [Definition: Every instance of the data model is a sequence. XDM]

    Where XDM is in a different color for each source.

    Proofing all these definitions across four (4) specifications (XQueryX has no additions definitions, aside from unnecessarily restating RFC 2119) is no trivial matter. Which is why I have extracted them and will be producing a deduped and sorted version.

    When you have long or complicated standards to proof, it helps to break them down in to smaller parts. Especially if the parts are out of their normal reading context. That helps avoid simply nodding along because you have read the material so many times.

    FYI, comments on errors most welcome! Producing the lists was trivial. Proofing the headers, footers, license language, etc. took longer than the lists.

    Enjoy!

    More on Definitions in XPath/XQuery/XDM 3.1

    Tuesday, January 13th, 2015

    I was thinking about the definitions I extracted from XPath 3.1 Definitions Here! Definitions There! Definitions Everywhere! XPath/XQuery 3.1 and since the XPath 3.1 draft says:

    Because these languages are so closely related, their grammars and language descriptions are generated from a common source to ensure consistency, and the editors of these specifications work together closely.

    We are very likely to find that the material contained in definitions and the paragraphs containing definitions are in fact the same.

    To make the best use of your time then, what is needed is a single set of the definitions from XPath 3.1, XQuery 3.1, XQueryX 3.1, XQuery and XPath Data Model 3.1, and XQuery Functions and Operators 3.1.

    I say that, but then on inspecting some of the definitions in XQuery and XPath Data Model 3.1, I read:

    [Definition: An atomic value is a value in
    the value space of an atomic type and is labeled with the name of
    that atomic type.]

    [Definition: An atomic type is a primitive
    simple type
    or a type derived by restriction from another
    atomic type.] (Types derived by list or union are not atomic.)

    But in the file of definitions from XPath 3.1, I read:

    [Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].]

    Not the same are they?

    What happened to:

    and is labeled with the name of that atomic type.

    That seems rather important. Yes?

    The phrase “atomic type” occurs forty-six (46) times in the XPath 3.1 draft, none of which define “atomic type.”

    It does define “generalized atomic type:”

    [Definition: A generalized atomic type is a type which is either (a) an atomic type or (b) a pure union type ].

    Which would make you think it would have to define “atomic type” as well, to declare the intersection with “pure union type.” But it doesn’t.

    In case you are curious, XML Schema 1.1 doesn’t define “atomic type” either. Rather it defines “anyAtomicType.”

    In XML Schema 1.0 Part 1, the phrase “atomic type” is used once and only once in “3.14.1 (non-normative) The Simple Type Definition Schema Component,” saying:

    Each atomic type is ultimately a restriction of exactly one such built-in primitive datatype, which is its {primitive type definition}.

    There is no formal definition nor is there any further discussion of “atomic type” in XML Schema 1.0 Part 1.

    XML Schema Part 2 is completely free of any mention of “atomic type.”

    Summary of the example:

    At this point we have been told that XPath 3.1 relies on XQuery and XPath Data Model 3.1 but also XML Schema 1 and XML Schema 1.1, which have inconsistent definitions of “atomic type,” when it exists at all.

    Moreover, XPath 3.1 relies upon undefined term (atomic value) to define another term (generalized atomic type), which is surely an error in any reading.

    This is a good illustration of what happens when definitions are not referenced from other documents with specific and resolvable references. Anyone checking such a definition would have noticed it missing in the referenced location.

    Summary on next steps:

    I was going to say a deduped set of definitions would serve for proofing all the drafts and now, despite the “production from a common source,” I’m not so sure.

    Probably the best course is to simply extract all the definitions and check them for duplication rather than assuming it.

    The questions of what should be note material and other issues will remain to be explored.

    Definitions Here! Definitions There! Definitions Everywhere! XPath/XQuery 3.1

    Monday, January 12th, 2015

    Would you believe there are one hundred and forty-eight definitions embedded in XPath 3.1?

    What strikes me as odd is that the same one hundred and forty-eight definitions appear in a non-normative glossary, sans what looks like the note material that follows some definitions in the normative prose.

    The first issue is why have definitions in both normative and non-normative prose? Particularly when the versions in non-normative prose lack the note type material found in the main text.

    Speaking of normative, did you now that normatively, document order is defined as:

    Informally, document order is the order in which nodes appear in the XML serialization of a document.

    So we have formal definitions that are giving us informal definitions.

    That may sound like being picky but haven’t we seen definitions of “document order” before?

    Grepping the current XML specifications from the W3C, I found 147 mentions of “document order” outside of the current drafts.

    I really don’t think we have gotten this far with XML without a definition of “document order.”

    Or “node,” “implementation defined,” “implementation dependent,” “type,” “digit,” “literal,” “map,” “item,” “axis step,” in those words or ones very close to them.

    • My first puzzle is why redefine terms that already exist in XML?
    • My second puzzle is the one I mentioned above, why repeat shorter versions of the definitions in an explicitly non-normative appendix to the text?

    For a concrete example of the second puzzle:

    For example:

    [Definition: The built-in functions
    supported by XPath 3.1 are defined in [XQuery and XPath Functions and Operators
    3.1]
    .] Additional functions may be provided
    in the static
    context
    . XPath per se does not provide a way to declare named
    functions, but a host language may provide such a
    mechanism.

    First, you are never told what section of XQuery and XPath Functions and Operators 3.1 has this definition so we are back to the 5,000 x N problem.

    Second, what part of:

    XPath per se does not provide a way to declare named functions, but a host language may provide such a mechanism.

    Does not look like a note to you?

    Does it announce some normative requirement for XPath?

    Proofing is made more difficult because of the overlap of these definitions, verbatim, in XQuery 3.1. Whether it is a complete overlap or not I can’t say because I haven’t extracted all the definitions from XQuery 3.1. The XQuery draft reports one hundred and sixty-five (165) definitions, so it introduces additional definitions. Just spot checking, the overlap looks substantial. Add to that the same repetition of terms as shorter entries in the glossary.

    There is the accomplice XQuery and XPath Data Model 3.1, which is alleged to be the source of many definitions but not well known enough to specify particular sections. In truth, many of the things it defines have no identifiers so precise reference (read hyperlinking to a particular entry) may not even be possible.

    I make that to be at least six sets of definitions, mostly repeated because one draft won’t or can’t refer to prior XML definitions of the same terms or the lack of anchors in these drafts, prevents cross-referencing by section number for the convenience of the reader.

    I can ease your burden to some extent, I have created an HTML file with all the definitions in XPath 3.1, the full definitions, for your use in proofing these drafts.

    I make no warranty about the quality of the text as I am a solo shop so have no one to proof copy other than myself. If you spot errors, please give a shout.


    I will see what I can do about extracting other material for your review.

    What we actually need is a concordance of all these materials, sans the digrams and syntax productions. KWIC concordances don’t do so well with syntax productions. Or tables. Still, it might be worth the effort.

    Structural Issues in XPath/XQuery/XPath-XQuery F&O Drafts

    Friday, January 9th, 2015

    Apologies as I thought I was going to be further along in demonstrating some proofing techniques for XPath 3.1, XQuery 3.1, XPath and XQuery Functions and Operations 3.1 by today.

    Instead, I encountered structural issues that are common to all three drafts that I didn’t anticipate but that need to be noted before going further with proofing. I will be using sample material to illustrate the problems and will not always have a sample from all three drafts or even note every occurrence of the issues. They are too numerous for that treatment and it would be repetition for repetition’s sake.

    First, consider these passages from XPath 3.1, 1 Introduction:

    [Definition: XPath 3.1 operates on the abstract, logical structure of an XML document, rather than its surface syntax. This logical structure, known as the data model, is defined in [XQuery and XPath Data Model (XDM) 3.1].]

    [Definition: An XPath 3.0 Processor processes a query according to the XPath 3.0 specification.] [Definition: An XPath 2.0 Processor processes a query according to the XPath 2.0 specification.] [Definition: An XPath 1.0 Processor processes a query according to the XPath 1.0 specification.]

    1. Unnumbered Definitions – Unidentified Cross-References

    The first structural issue that you will note with the “[Definition…” material is that all such definitions are unnumbered and appear throughout all three texts. The lack of numbering means that it is difficult to refer with any precision to a particular definition. How would I draw your attention to the third definition of the second grouping? Searching for XPath 1.0 turns up 79 occurrences in XPath 3.1 so that doesn’t sound satisfactory. (FYI, “Definition” turns up 193 instances.)

    While the “Definitions” have anchors that allow them to be addressed by cross-references, you should note that the cross-references are text hyperlinks that have no identifier by which a reader can find the definition without using the hyperlink. That is to say when I see:

    A lexical QName with a prefix can be converted into an expanded QName by resolving its namespace prefix to a namespace URI, using the statically known namespaces. [These are fake links to draw your attention to the text in question.]

    The hyperlinks in the original will take me to various parts of the document where these definitions occur, but if I have printed the document, I have no clue where to look for these definitions.

    The better practice is to number all the definitions and since they are all self-contained, to put them in a single location. Additionally, all interlinear references to those definitions (or other internal cross-references) should have a visible reference that enables a reader to find the definition or cross-reference, without use of an internal hyperlink.

    Example:

    A lexical QName Def-21 with a prefix can be converted into an expanded QName Def-19 by resolving its namespace prefix to a namespace URI, using the statically known namespaces. Def-99 [These are fake links to draw your attention to the text in question. The Def numbers are fictitious in this example. Actual references would have the visible definition numbers assigned to the appropriate definition.]

    2. Vague references – $N versus 5000 x $N

    Another problem I encountered was what I call “vague references,” or less generously, $N versus 5,000 x $N.

    For example:

    [Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].] [Definition: A node is an instance of one of the node kinds defined in [XQuery and XPath Data Model (XDM) 3.1].

    Contrary to popular opinion, standards don’t write themselves and every jot and tittle was placed in a draft at the expense of someone’s time and resources. Let’s call that $N.

    In the example, you and I both know somewhere in XML Schema 1.0 and XML Schema 1.1 that the “value space of the atomic type” is defined. The same is true for nodes and XQuery and XPath Data Model (XDM) 3.1. But where? The authors of these specifications could insert that information at a cost of $N.

    What is the cost of not inserting that information in the current drafts? I estimate the number of people interested in reading these drafts to be 5,000. So each of those person will have to find the same information omitted from these specifications, which is a cost of 5,000 x $N. In terms of convenience to readers and reducing their costs of reading these specifications, references to exact locations in other materials are a necessity.

    In full disclosure, I have no more or less reason to think 5,000 people are interested in these drafts than the United States has for positing the existence of approximately 5,000 terrorists in the world. I suspect the number of people interested in XML is actually higher but the number works to make the point. Editors can either convenience themselves or their readers.

    Vague references are also problematic in terms of users finding the correct reference. The citation above, [XML Schema 1.0] for “value space of an atomic type,” refers to all three parts of XML Schema 1.0.

    Part 1, at 3.14.1 (non-normative) The Simple Type Definition Schema Component, has the only reference to “atomic type.”

    Part 2, actually has “0” hits for “atomic type.” True enough, “2.5.1.1 Atomic datatypes” is likely the intended reference but that isn’t what the specification says to look for.

    Bottom line is that any external reference needs to include in the inline citation the precise internal reference in the work being cited. If you want to inconvenience readers by pointing to internal bibliographies rather than online HTML documents, where available, that’s an editorial choice. But in any event, for every external reference, give the internal reference in the work being cited.

    Your readers will appreciate it and it could make your work more accurate as well.

    3. Normative vs. Non-Normative Text

    Another structural issue which is important for proofing is the distinction between normative and non-normative text.

    In XPath 3.1, still in the Introduction we read:

    This document normatively defines the static and dynamic semantics of XPath 3.1. In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative.

    OK, and under 2.2.3.1 Static Analysis Phase (XPath 3.1), we find:

    Examples of inferred static types might be:

    Which is followed by a list so at least we know where the examples end.

    However, there are numerous cases of:

    For example, with the expression substring($a, $b, $c), $a must be of type xs:string (or something that can be converted to xs:string by the function calling rules), while $b and $c must be of type xs:double. [also in 2.2.3.1 Static Analysis Phase (XPath 3.1)]

    So, is that a non-normative example? If so, what is the nature of the “must” that occurs in it? Is that normative?

    Moreover, the examples (XPath 3.1 has 283 occurrences of that term, XQuery has 455 occurrences of that term, XPath and XQuery Functions and Operators have 537 occurrences of that term) are unnumbered, which makes referencing the examples by other materials very imprecise and wordy. For the use of authors creating secondary literature on these materials, to promote adoption, etc., number of all examples should be the default case.

    Oh, before anyone protests that XPath and XQuery Functions and Operators has separated its examples into lists, that is true but only partially. There remain 199 occurrences of “for example” which do not occur in lists. Where lists are used, converting to numbered examples should be trivial. The elimination of “for example” material may be more difficult. Hard to say without a good sampling of the cases.

    Conclusion:

    As I said at the outset, apologies for not reaching more substantive proofing techniques but structural issues are important for the readability and usability of specifications for readers. Being correct and unreadable isn’t a useful goal.

    It may seem like some of the changes I suggest are a big “ask” this late in the processing of these specifications. If this were a hand edited document, I would quickly agree with you. But it’s not. Or at least it shouldn’t be. I don’t know where the source is held but the HTML you read is an generated artifact.

    Gathering and numbering the definitions and inserting those numbers into the internal cross-references are a matter of applying a different style sheet to the source. Fixing the vague references and unnumbered example texts would take more editorial work but readers would greatly benefit from precise references and a clear separation of normative from non-normative text.

    I will try again over the weekend to reach aids for substantive proofing on these drafts. With luck, I will return to these drafts on Monday of next week (12 January 2014).

    MUST in XPath 3.1/XQuery 3.1/XQueryX 3.1

    Wednesday, January 7th, 2015

    I mentioned the problems with redefining may and must in XPath and XQuery Functions and Operators 3.1 in Redefining RFC 2119? Danger! Danger! Will Robinson! last Monday.

    Requirements language is one of the first things to check for any specification so I thought I should round that issue out by looking at the requirement language in XPath 3.1, XQuery 3.1, and, XQueryX 3.1.

    XPath 3.1

    XPath 3.1 includes RFC 2119 as a normative reference but then never cites RFC 2119 in the document or use the uppercase MUST.

    I suspect that is the case because of Appendix F Conformance:

    XPath is intended primarily as a component that can be used by other specifications. Therefore, XPath relies on specifications that use it (such as [XPointer] and [XSL Transformations (XSLT) Version 3.0]) to specify conformance criteria for XPath in their respective environments. Specifications that set conformance criteria for their use of XPath must not change the syntactic or semantic definitions of XPath as given in this specification, except by subsetting and/or compatible extensions.

    The specification of such a language may describe it as an extension of XPath provided that every expression that conforms to the XPath grammar behaves as described in this specification. (Edited on include the actual links to XPointer and XSLT, pointing internally to a bibliography defeats the purpose of hyperlinking.)

    Personally I would simply remove the RFC 2119 reference since XPath 3.1 is a set of definitions to which conformance is mandated or not, by other specifications.

    XQuery 3.1 and XQueryX 3.1

    XQuery 3.1 5 Conformance reads in part:

    This section defines the conformance criteria for an XQuery processor. In this section, the following terms are used to indicate the requirement levels defined in [RFC 2119]. [Definition: MUST means that the item is an absolute requirement of the specification.] [Definition: MUST NOT means that the item isan absolute prohibition of the specification.] [Definition: MAY means that an item is truly optional.] [Definition: SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.] (Emphasis in the original)

    XQueryX 3.1 5 Conformance reads in part:

    This section defines the conformance criteria for an XQueryX
    processor (see Figure 1, “Processing Model Overview”, in [XQuery 3.1: An XML Query Language] , Section 2.2 Processing Model XQ31.

    In this section, the following terms are used to indicate the requirement levels defined in [RFC 2119]. [Definition: MUST means that the item is an absolute requirement of the specification.] [Definition: SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.] [Definition: MAY means that an item is truly optional.]

    First, the better practice is not to repeat definitions found elsewhere (a source of error and misstatement) but to cite RFC 2119 as follows:

    The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].

    Second, the bolding found in XQuery 3.1 of MUST, etc., is unnecessary, particularly when not then followed by bolding in the use of MUST in the conformance clauses. Best practice to simply use UPPERCASE in both cases.

    Third, and really my principal reason for mentioning XQuery 3.1 and XQueryX 3.1 is to call attention to their use of RFC 2119 keywords. That is to say you will find the keywords in the conformance clauses and not any where else in the specification.

    Both use the word “must” in their texts but only as would normally appear in prose and implementers don’t have to pour through a sprinkling of MUST as you see in some drafts, which makes for stilted writing and traps for the unwary.

    The usage of RFC 2119 keywords in XQuery 3.1 and XQueryX 3.1 make the job of writing in declarative prose easier, eliminates the need to distinguish MUST and must in the normative text, and gives clear guidance to implementers as to the requirements to be met for conformance.

    I was quick to point out an error in my last post so it is only proper that I be quick to point out a best practice in XQuery 3.1 and XQueryX 3.1 as well.

    This coming Friday, 9 January 2015, I will have a post on proofing content proper for this bundle of specifications.

    PS: I am encouraging you to take on this venture into proofing specifications because this particular bundle of W3C specification work is important for pointing into data. If we don’t have reliable and consistent pointing, your topic maps will suffer.