Archive for the ‘XPath’ Category

XML Prague 2018 – Apology to Procrastinators

Thursday, October 12th, 2017

Apology to all procrastinators, I just saw the Call for Proposals for XML Prague 2018

You only have 50 days (until November 30, 2017) to submit your proposals for XML Prague 2018.

Efficient people don’t realize that 50 days is hardly enough time to put off thinking about a proposal topic, much less fail to write down anything for a proposal. Completely unreasonable demand but, do try to procrastinate quickly and get a proposal done for XML Prague 2018.

The suggestion of doing a “…short video…” seems rife with potential for humor and/or NSFW images. Perhaps XML Prague will post the best “…short videos…” to YouTube?

From the webpage:

XML Prague 2018 now welcomes submissions for presentations on the following topics:

• Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
• Semantic visions and the reality – micro-formats, semantic data in business, linked data
• Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
• XML databases and Big Data – XML storage, indexing, query languages, …
• State of the XML Union – updates on specs, the XML community news, …
• XML success stories – real-world use cases of successful XML deployments

There are several different types of slots available during the conference and you can indicate your preferred slot during submission:

30 minutes
15 minutes
These slots are suitable for normal conference talks.
90 minutes (unconference)
Ideal for holding users meeting or workshop during the unconference day (Thursday).

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Authors should strive to contain original material and belong in the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Proposals can have several forms:

full paper
In our opinion still ideal and classical way of proposing presentation. Full paper gives reviewers enough information to properly asses your proposal.
extended abstract
Concise 1-4 page long description of your topic. If you do not have time to write full paper proposal this is one possible way to go. Try to make your extended abstract concrete and specific. Too short or vague abstract will not convince reviewers that it is worth including into the conference schedule.
short video (max. 5 minutes)
If you are not writing person but you still have something interesting to present. Simply capture short video (no longer then 5 minutes) containing part of your presentation. Video can capture you or it can be screen cast.

I mentioned XSLT security attacks recently, perhaps you could do something similar on XQuery? Other ways to use XML and related technologies to breach cybersecurity?

Do submit proposals and enjoy XML Prague 2018!

Procrastinators – Dates/Location for Balisage: The Markup Conference 2018

Wednesday, October 4th, 2017

Procrastinators can be caught short, without enough time for proper procrastination on papers and slides.

To insure ample time for procrastination, Balisage: The Markup Conference 2018 has published its dates and location.

31 July 2018–3 August 2018 … Balisage: The Markup Conference
30 July 2018 … Symposium – topic to be announced
CAMBRiA Hotel & Suites
1 Helen Heneghan Way
Rockville, Maryland 20850
USA

For indecisive procrastinators, Balisage offers suggestions for your procrastination:

The 2017 program included papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelinging, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, digitiziging Ethiopian and Eritrean manuscripts, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Innovative procrastinators can procrastinate on other related topics, including any they find on the Master Topic List (ideas procrastinated on for prior Balisage conferences).

Take advantage of this opportunity to procrastinate early and long on your Balisage submissions. You and your audience will be glad you did!

PS: Don’t procrastinate on saying thank you to Tommie Usdin and company for another year of Balisage. Balisage improves XML theory and practice every year it is held.

Balisage: The Markup Conference 2017 Program Now Available

Wednesday, May 17th, 2017

Balisage: The Markup Conference 2017 Program Now Available

An email from Tommie Usdin, Chair, Chief Organizer and herder of markup cats for Balisage advises:

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2017 program includes papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelinging, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, why YOU should use (my favorite XML vocabulary), developing a system to aid in studying manuscripts in the tradition of the Ethiopian and Eritrean Highlands, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Up-Translation and Up-Transformation: A one-day Symposium on the goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

Are you interested in open information, reusable documents, and vendor and application independence? Then you need descriptive markup, and Balisage is your conference. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while other still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2017 Program:
http://www.balisage.net/2017/Program.html

Symposium Program:
https://www.balisage.net/UpTransform

NOTE: Members of the TEI and their employees are eligible for discount Balisage registration.

You need to see the program for yourself but the highlights (for me) include: Ethiopic manuscripts (ok, so I have odd tastes), Earley parsers (of particular interest), English Majors (my wife was an English major), and a number of other high points.

Mark your calendar for July 31 – August 4, 2017 – It’s Balisage!

Pure CSS crossword – CSS Grid

Wednesday, April 19th, 2017

The UI is slick, although creating the puzzle remains on you.

Certainly suitable for string answers, XQuery/XPath/XSLT expressions, etc.

Enjoy!

XQuery 3.1 and Company! (Deriving New Versions?)

Wednesday, March 22nd, 2017

XQuery 3.1: An XML Query Language W3C Recommendation 21 March 2017

Hurray!

XML Path Language (XPath) 3.1

XPath and XQuery Functions and Operators 3.1

XQuery and XPath Data Model 3.1

No right to create modifications or derivatives of W3C documents is granted pursuant to this license, except as follows: To facilitate implementation of the technical specifications set forth in this document, anyone may prepare and distribute derivative works and portions of this document in software, in supporting materials accompanying software, and in documentation of software, PROVIDED that all such works include the notice below. HOWEVER, the publication of derivative works of this document for use as a technical specification is expressly prohibited.

You know I think the organization of XQuery 3.1 and friends could be improved but deriving and distributing “improved” versions is expressly prohibited.

Hmmm, but we are talking about XML and languages to query and transform XML.

Consider the potential of an query that calls XQuery 3.1: An XML Query Language and materials cited in it, then returns a version of XQuery 3.1 that has definitions from other standards off-set in the XQuery 3.1 text.

Or than inserts into the text examples or other materials.

For decades XML enthusiasts have bruited about dynamic texts but have produced damned few of them (as in zero) as their standards.

Let’s use the “no derivatives” language of the W3C as an incentive to not create another static document but a dynamic one that can grow or contract according to the wishes of its reader.

Suggestions for first round features?

Up-Translation and Up-Transformation … [Balisage Rocks!]

Sunday, January 29th, 2017

Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions (a Balisage pre-conference symposium)

When & Where:

Monday July 31, 2017
CAMBRiA Hotel, Rockville, MD USA

Chair: Evan Owens, Cenveo

You need more details than that?

Ok, from the webpage:

Increasing the granularity and/or specificity of markup is an important task in many different content and information workflows. Markup transformations might involve tasks such as high-level structuring, detailed component structuring, or enhancing information by matching or linking to external vocabularies or data. Enhancing markup presents numerous secondary challenges including lack of structure of the inputs or inconsistency of input data down to the level of spelling, punctuation, and vocabulary. Source data for up-translation may be XML, word processing documents, plain text, scanned & OCRed text, or databases; transformation goals may be content suitable for page makeup, search, or repurposing, in XML, JSON, or any other markup language.

The range of approaches to up-transformation is as varied as the variety of specifics of the input and required outputs. Solutions may combine automated processing with human review or could be 100% software implementations. With the potential for requirements to evolve over time, tools may have to be actively maintained and enhanced.

The presentations in this pre-conference symposium will include goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

If I didn’t know better, up-translation and up-transformation sound suspiciously like conferred properties of topic maps fame.

Well, modulo that conferred properties could be predicated on explicit subject identity and not hidden in the personal knowledge of the author.

There are two categories of up-translation and up-transformation:

1. Ones that preserve jobs like spaghetti Cobol code, and
2. Ones that support easy long term maintenance.

While writing your paper for the pre-conference, which category fits yours the best?

XQuery/XSLT Proposals – Comments by 28 February 2017

Tuesday, January 24th, 2017

From the webpage:

The XML Query Working Group and XSLT Working Group have published a Proposed Recommendation for four documents:

• XQuery and XPath Data Model 3.1: This document defines the XQuery and XPath Data Model 3.1, which is the data model of XML Path Language (XPath) 3.1, XSL Transformations (XSLT) Version 3.0, and XQuery 3.1: An XML Query Language. The XQuery and XPath Data Model 3.1 (henceforth “data model”) serves two purposes. First, it defines the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages.
• XPath and XQuery Functions and Operators 3.1: The purpose of this document is to catalog the functions and operators required for XPath 3.1, XQuery 3.1, and XSLT 3.0. It defines constructor functions, operators, and functions on the datatypes defined in XML Schema Part 2: Datatypes Second Edition and the datatypes defined in XQuery and XPath Data Model (XDM) 3.1. It also defines functions and operators on nodes and node sequences as defined in the XQuery and XPath Data Model (XDM) 3.1.
• XML Path Language (XPath) 3.1: XPath 3.1 is an expression language that allows the processing of values conforming to the data model defined in XQuery and XPath Data Model (XDM) 3.1. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic values, function items, and sequences.
• XSLT and XQuery Serialization 3.1: This document defines serialization of an instance of the data model as defined in XQuery and XPath Data Model (XDM) 3.1 into a sequence of octets. Serialization is designed to be a component that can be used by other specifications such as XSL Transformations (XSLT) Version 3.0 or XQuery 3.1: An XML Query Language.

Comments are welcome through 28 February 2017.

Unlike political flame wars on social media, comments on these proposed recommendatons could make a useful difference.

Enjoy!

XML.com Relaunch!

Monday, January 16th, 2017

XML.com

Lauren Wood posted this note about the relaunch of XML.com recently:

I’ve relaunched XML.com (for some background, Tim Bray wrote an article here: https://www.xml.com/articles/2017/01/01/xmlcom-redux/). I’m hoping it will become part of the community again, somewhere for people to post their news (submit your news here: https://www.xml.com/news/submit-news-item/) and articles (see the guidelines at https://www.xml.com/about/contribute/). I added a job board to the site as well (if you’re in Berlin, Germany, or able to
move there, look at the job currently posted; thanks LambdaWerk!); if your employer might want to post XML-related jobs please email me.

The old content should mostly be available but some articles were previously available at two (or more) locations and may now only be at one; try the archive list (https://www.xml.com/pub/a/archive/) if you’re looking for something. Please let me know if something major is missing from the archives.

XML is used in a lot of areas, and there is a wealth of knowledge in this community. If you’d like to write an article, send me your ideas. If you have comments on the site, let me know that as well.

Just in time as President Trump is about to stir, vigorously, that big pot of crazy known as federal data.

Mapping, processing, transformation demands will grow at an exponential rate.

Notice the emphasis on demand.

Taking a two weeks to write custom software to sort files (you know the Weiner/Abedin laptop story, yes?) won’t be acceptable quite soon.

How are your on-demand XML chops?

XQuery/XPath CRs 3.1! [#DisruptJ20 Twitter Game]

Tuesday, December 13th, 2016

Just in time for the holidays, new CRs for XQuery/XPath hit the street! Comments due by 2017-01-10.

XQuery and XPath Data Model 3.1 https://www.w3.org/TR/2016/CR-xpath-datamodel-31-20161213/

XML Path Language (XPath) 3.1 https://www.w3.org/TR/2016/CR-xpath-31-20161213/

XQuery 3.1: An XML Query Language https://www.w3.org/TR/2016/CR-xquery-31-20161213/

XPath and XQuery Functions and Operators 3.1 https://www.w3.org/TR/2016/CR-xpath-functions-31-20161213/

#DisruptJ20 is too late for comments to the W3C but you can break the boredom of indeterminate waiting to protest excitedly for TV cameras and/or to be arrested.

How?

Play the XQuery/XPath 3.1 Twitter Game!

Definitions litter the drafts and appear as:

[Definition: A sequence is an ordered collection of zero or more items.]

You Tweet:

An ordered collection of zero or more items? #xquery

Correct response:

A sequence.

Some definitions are too long to be tweeted in full:

An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. (xpath-functions)

Suggest you tweet:

A triple containing namespace prefix (optional), namespace URI (optional), and local name.

or

A value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1].

In both cases, the correct response:

An expanded-QName.

Use a $10 burner phone and it unlocked at protests. If your phone is searched, imagine the attempts to break the “code.” You could agree on definitions/responses as instructions for direct action. But I digress. 4 Days Left – Submission Alert – XML Prague Sunday, December 11th, 2016 A tweet by Jirka Kosek reminded me there are only 4 days left for XML Prague submissions! • December 15th – End of CFP (full paper or extended abstract) • January 8th – Notification of acceptance/rejection of paper to authors • January 29th – Final paper From the call for papers: XML Prague 2017 now welcomes submissions for presentations on the following topics: • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space • Semantic visions and the reality – micro-formats, semantic data in business, linked data • Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, … • XML databases and Big Data – XML storage, indexing, query languages, … • State of the XML Union – updates on specs, the XML community news, … All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness. Accepted papers will be included in published conference proceedings. I don’t travel but if you need a last-minute co-author or proofer, you know where to find me! Balisage 2016 Program Posted! (Newcomers Welcome!) Monday, May 23rd, 2016 Tommie Usdin wrote today to say: Balisage: The Markup Conference 2016 Program Now Available http://www.balisage.net/2016/Program.html Balisage: where serious markup practitioners and theoreticians meet every August. The 2016 program includes papers discussing reducing ambiguity in linked-open-data annotations, the visualization of XSLT execution patterns, automatic recognition of grant- and funding-related information in scientific papers, construction of an interactive interface to assist cybersecurity analysts, rules for graceful extension and customization of standard vocabularies, case studies of agile schema development, a report on XML encoding of subtitles for video, an extension of XPath to file systems, handling soft hyphens in historical texts, an automated validity checker for formatted pages, one no-angle-brackets editing interface for scholars of German family names and another for scholars of Roman legal history, and a survey of non-XML markup such as Markdown. XML In, Web Out: A one-day Symposium on the sub rosa XML that powers an increasing number of websites will be held on Monday, August 1. http://balisage.net/XML-In-Web-Out/ If you are interested in open information, reusable documents, and vendor and application independence, then you need descriptive markup, and Balisage is the conference you should attend. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while other still use SGML (and know why they do). Discussion is open, candid, and unashamedly technical. Balisage 2016 Program: http://www.balisage.net/2016/Program.html Symposium Program: http://balisage.net/XML-In-Web-Out/symposiumProgram.html Even if you don’t eat RELAX grammars at snack time, put Balisage on your conference schedule. Even if a bit scruffy looking, the long time participants like new document/information problems or new ways of looking at old ones. Not to mention they, on occasion, learn something from newcomers as well. It is a unique opportunity to meet the people who engineered the tools and specs that you use day to day. Be forewarned that most of them have difficulty agreeing what controversial terms mean, like “document,” but that to one side, they are a good a crew as you are likely to meet. Enjoy! Balisage 2016, 2–5 August 2016 [XML That Makes A Difference!] Tuesday, February 2nd, 2016 Call for Participation Dates: • 25 March 2016 — Peer review applications due • 22 April 2016 — Paper submissions due • 21 May 2016 — Speakers notified • 10 June 2016 — Late-breaking News submissions due • 16 June 2016 — Late-breaking News speakers notified • 8 July 2016 — Final papers due from presenters of peer reviewed papers • 8 July 2016 — Short paper or slide summary due from presenters of late-breaking news • 1 August 2016 — Pre-conference Symposium • 2–5 August 2016 — Balisage: The Markup Conference From the call: Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to: • Web application development with XML • Informal data models and consensus-based vocabularies • Integration of XML with other technologies (e.g., content management, XSLT, XQuery) • Performance issues in parsing, XML database retrieval, or XSLT processing • Development of angle-bracket-free user interfaces for non-technical users • Semistructured data and full text search • Deployment of XML systems for enterprise data • Web application development with XML • Design and implementation of XML vocabularies • Case studies of the use of XML for publishing, interchange, or archiving • Alternatives to XML • the role(s) of XML in the application lifecycle • the role(s) of vocabularies in XML environments Full papers should be submitted by the deadline given below. All papers are peer-reviewed — we pride ourselves that you will seldom get a more thorough, skeptical, or helpful review than the one provided by Balisage reviewers. Whether in theory or practice, let’s make Balisage 2016 the one people speak of in hushed tones at future markup and information conferences. Useful semantics continues to flounder about, cf. Vice-President Biden’s interest in “one cancer research language.” Easy enough to say. How hard could it be? Documents are commonly thought of and processed as if from BOM to EOF is the definition of a document. Much to our impoverishment. Silo dissing has gotten popular. What if we could have our silos and eat them too? Let’s set our sights on a Balisage 2016 where non-technicals come away saying “I want that!” Have your first drafts done well before the end of February, 2016! Facets for Christmas! Friday, December 25th, 2015 Facet Module From the introduction: Faceted search has proven to be enormously popular in the real world applications. Faceted search allows user to navigate and access information via a structured facet classification system. Combined with full text search, it provides user with enormous power and flexibility to discover information. This proposal defines a standardized approach to support the Faceted search in XQuery. It has been designed to be compatible with XQuery 3.0, and is intended to be used in conjunction with XQuery and XPath Full Text 3.0. Imagine my surprise when after opening Christmas presents with family to see a tweet by XQuery announcing yet another Christmas present: “Facets”: A new EXPath spec w/extension functions & data models to enable faceted navigation & search in XQuery http://expath.org/spec/facet The EXPath homepage says: XPath is great. XPath-based languages like XQuery, XSLT, and XProc, are great. The XPath recommendation provides a foundation for writing expressions that evaluate the same way in a lot of processors, written in different languages, running in different environments, in XML databases, in in-memory processors, in servers or in clients. Supporting so many different kinds of processor is wonderful thing. But this also contrains which features are feasible at the XPath level and which are not. In the years since the release of XPath 2.0, experience has gradually revealed some missing features. EXPath exists to provide specifications for such missing features in a collaborative- and implementation-independent way. EXPath also provides facilities to help and deliver implementations to as many processors as possible, via extensibility mechanisms from the XPath 2.0 Recommendation itself. Other projects exist to define extensions for XPath-based languages or languages using XPath, as the famous EXSLT, and the more recent EXQuery and EXProc projects. We think that those projects are really useful and fill a gap in the XML core technologies landscape. Nevertheless, working at the XPath level allows common solutions when there is no sense in reinventing the wheel over and over again. This is just following the brilliant idea of the W3C’s XSLT and XQuery working groups, which joined forces to define XPath 2.0 together. EXPath purpose is not to compete with other projects, but collaborate with them. Be sure to visit the resources page. It has a manageable listing of processors that handle extensions. What would you like to see added to XPath? Enjoy! My Bad – You Are Not! 747 Edits Away From Using XML Tools Thursday, December 17th, 2015 The original, unedited post is below but in response to comments, I checked the XQuery, XPath, XSLT and XQuery Serialization 3.1 files in Chrome (CNTR-U) before saving them. All the empty elements were properly closed. I then saved the files and re-opened in Emacs, to discover that Chrome had stripped the “/” from the empty elements, which then caused BaseX to complain. It was an accurate complaint but the files I was tossing against BaseX were not the files as published by the W3C. So now I need to file a bug report on Chrome, Version 47.0.2526.80 (64-bit) on Ubuntu, for mangling closed empty elements. You could tell in XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations! that I was really excited to see the new drafts hit the street. Me and my big mouth. I grabbed copies of all three and tossed the XQuery draft against an xquery to create a list of all the paths in it. Simple enough. The result weren’t. Here is the first error message: [FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 68): The element type “link” must be terminated by the matching end-tag “</link>”. Ouch! I corrected that and running the query a second time I got: [FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 68): The element type “meta” must be terminated by the matching end-tag “</meta>”. The <meta> elements appear on lines three and four. On the third try: [FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 69): The element type “img” must be terminated by the matching end-tag “</img>”. There are 3 <img> elements that are not closed. I’m getting fairly annoyed at this point. Fourth try: [FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 78): The element type “br” must be terminated by the matching end-tag “</br>”. Of course at this point I revert to grep and discover there are 353 elements that are not closed. Sigh, nothing to do but correct and soldier on. Fifth attempt. [FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 17618): The element type “hr” must be terminated by the matching end-tag “</hr>”. There are 2 <hr> elements that are not closed. A total of 361 edits in order to use XML based tools with the most recent XQuery 3.1 Candidate draft. The most recent XPath 3.1 has 238 empty elements that aren’t closed (same elements as XQuery 3.1). The XSLT and XQuery Serialization 3.1 draft has 149 empty elements that aren’t closed, same as the other but with the addition of four <col> elements that weren’t closed. Grand total: 747 edits in order to use XML tools. Not an editorial but a production problem. A rather severe one it seems to me. Anyone who wants to use XML tools on these drafts will have to perform the same edits. XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations! Thursday, December 17th, 2015 As I forecast 😉 earlier this week, new Candidate Recommendations for: XQuery 3.1: An XML Query Language XSLT and XQuery Serialization 3.1 have hit the streets for your review and comments! Comments due by 2016-01-31. That’s forty-five days, minus the ones spent with drugs/sex/rock-n-roll over the holidays and recovering from same. Say something shy of forty-four actual working days (my endurance isn’t what it once was) for the review process. What tools, techniques are you going to use to review this latest set of candidates? BTW, some people review software and check only fixes, for standards I start at the beginning, go to the end, then stop. (Or the reverse for backward proofing.) My estimates on days spent with drugs/sex/rock-n-rock are approximate only and your experience may vary. XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts (soon!) Monday, December 14th, 2015 XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts will be published quite soon so I wanted to give you a heads up on your holiday reading schedule. This is deep enough in the review cycle that a back-to-front reading is probably your best approach. You have read the drafts and corrections often enough by this point that you read the first few words of a paragraph and you “know” what it says so you move on. (At the very least I can report that happens to me.) By back-to-front reading I mean to start at the end of each draft and read the last sentence and then the next to last sentence and so on. The back-to-front process does two things: 1. You are forced to read each sentence on its own. 2. It prevents skimming and filling in errors with silent corrections (unknown to your conscious mind). The back-to-front method is quite time consuming so its fortunate these drafts are due to appear just before a series of holidays in a large number of places. I hesitate to mention it but there is another way to proof these drafts. If you have XML experienced visitors, you could take turns reading the drafts to each other. It was a technique used by copyists many years ago where one person read and two others took down the text. The two versions were then compared to each other and the original. Even with a great reading voice, I’m not certain many people would be up to that sort of exercise. PS: I will post on the new drafts as soon as they are published. XQuery, 2nd Edition, Updated! (A Drawback to XQuery) Tuesday, December 8th, 2015 XQuery, 2nd Edition, Updated! by Priscilla Walmsley. The updated version of XQuery, 2nd Edition has hit the streets! As a plug for the early release program at O’Reilly, yours truly appears in the acknowledgments (page xxii) from having submitted comments on the early release version of XQuery. You can too. Early release participation is yet another way to contribute back to the community. There is one drawback to XQuery which I discuss below. For anyone not fortunate enough to already have a copy of XQuery, 2nd Edition, here is the full description from the O’Reilly site: The W3C XQuery 3.1 standard provides a tool to search, extract, and manipulate content, whether it’s in XML, JSON or plain text. With this fully updated, in-depth tutorial, you’ll learn to program with this highly practical query language. Designed for query writers who have some knowledge of XML basics, but not necessarily advanced knowledge of XML-related technologies, this book is ideal as both a tutorial and a reference. You’ll find background information for namespaces, schemas, built-in types, and regular expressions that are relevant to writing XML queries. This second edition provides: • A high-level overview and quick tour of XQuery • New chapters on higher-order functions, maps, arrays, and JSON • A carefully paced tutorial that teaches XQuery without being bogged down by the details • Advanced concepts for taking advantage of modularity, namespaces, typing, and schemas • Guidelines for working with specific types of data, such as numbers, strings, dates, URIs, maps and arrays • XQuery’s implementation-specific features and its relationship to other standards including SQL and XSLT • A complete alphabetical reference to the built-in functions, types, and error messages Drawback to XQuery: You know I hate to complain, but the brevity of XQuery is a real drawback to billing. For example, I have a post pending on taking 604 lines of XSLT down to 35 lines of XQuery. Granted the XQuery is easier to maintain, modify, extend, but all a client will see is the 35 lines of XQuery. At least 604 lines of XSLT looks like you really worked to produce something. I know about XQueryX but I haven’t seen any automatic way to convert XQuery into XQueryX. Am I missing something obvious? If that’s possible, I could just bulk up the deliverable with an XQueryX expression of the work and keep the XQuery version for production use. As excellent as I think XQuery and Walmsley’s book both are, I did want to warn you about the brevity of your XQuery deliverables. I look forward to finish reading XQuery, 2nd Edition. I started doing so many things based on the first twelve or so chapters that I just read selectively from that point on. It merits a complete read. You won’t be sorry you did. XQuery and XPath Full Text 3.0 (Recommendation) Tuesday, November 24th, 2015 XQuery and XPath Full Text 3.0 From 1.1 Full-Text Search and XML: As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 3.0 and XPath 3.0. XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting. Full-text search is different from substring search in many ways: 1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string “lease” will return a news item that contains “Foobar Corporation releases version 20.9 …”. A full-text search for the token “lease” will not. 2. There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is “find me all the news items that contain a token with the same linguistic stem as ‘mouse'” (finds “mouse” and “mice”). Another example based on token proximity is “find me all the news items that contain the tokens ‘XML’ and ‘Query’ allowing up to 3 intervening tokens”. 3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than$100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for “mouse”, there is only 1 expected result set. When you do a full-text search for all the news items that contain the token “mouse”, you probably expect to find news items containing the token “mice”, and possibly “rodents”, or possibly “computers”. Not all results are equal. Some results are more “mousey” than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.

Note:

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text 3.0.

Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.

Tokenization, including the definition of the term “tokens”, SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization operates on the string value of an item; for element nodes this does not include the content of attribute nodes, but for attribute nodes it does. Tokenization is defined more formally in 4.1 Tokenization.

[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]

Not a fast read but a welcome one!

XQuery and XPath increase the value of all XML-encoded documents, at least down to the level of their markup. Beyond nodes, you are on your own.

XQuery and XPath Full Text 3.0 extend XQuery and XPath beyond existing markup in documents. Content that was too expensive or simply not of enough interest to encode, can still be reached in a robust and reliable way.

If you can “see” it with your computer, you can annotate it.

You might have to possess a copy of the copyrighted content, but still, it isn’t a closed box that resists annotation. Enabling you to sell the annotation as a value-add to the copyrighted content.

XQuery and XPath Full Text 3.0 says token and phrase are implementation defined.

Imagine the user (name) commented version of X movie, which is a driver file that has XQuery links into DVD playing on your computer (or rather to the data stream).

I rather like that idea.

PS: Check with a lawyer before you commercialize that annotation idea. I am not familiar with all EULAs and national laws.

XML Prague 2016 – Call for Papers [Looking for a co-author?]

Tuesday, November 3rd, 2015

XML Prague 2016 – Call for Papers

Important Dates:

• November 30th – End of CFP (full paper or extended abstract)
• January 4th – Notification of acceptance/rejection of paper to authors
• January 25th – Final paper
• February 11-13, XML Prague 2016

From the webpage:

XML Prague 2016 now welcomes submissions for presentations on the following topics:

• Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
• Semantic visions and the reality – micro-formats, semantic data in business, linked data
• Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
• XML databases and Big Data – XML storage, indexing, query languages, …
• State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

Authors should strive to contain original material and belong in the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Accepted presenters must submit their full paper (on time) and give their presentation and answer questions in English, as well as follow the XML Prague 2016 conference guidelines.

I don’t travel but am interested in co-authoring a paper with someone who plans on attending XML Prague 2016. Contact me at patrick@durusau.net.

Turning the MS Battleship

Saturday, March 21st, 2015

Improving interoperability with DOM L3 XPath by Thomas Moore.

From the post:

As part of our ongoing focus on interoperability with the modern Web, we’ve been working on addressing an interoperability gap by writing an implementation of DOM L3 XPath in the Windows 10 Web platform. Today we’d like to share how we are closing this gap in Project Spartan’s new rendering engine with data from the modern Web.

Some History

Prior to IE’s support for DOM L3 Core and native XML documents in IE9, MSXML provided any XML handling and functionality to the Web as an ActiveX object. In addition to XMLHttpRequest, MSXML supported the XPath language through its own APIs, selectSingleNode and selectNodes. For applications based on and XML documents originating from MSXML, this works just fine. However, this doesn’t follow the W3C standards for interacting with XML documents or exposing XPath.

To accommodate a diversity of browsers, sites and libraries wrap XPath calls to switch to the right implementation. If you search for XPath examples or tutorials, you’ll immediately find results that check for IE-specific code to use MSXML for evaluating the query in a non-interoperable way:

It seems like a long time ago that a relatively senior Microsoft staffer told me that turning a battleship like MS takes time. No change, however important, is going to happen quickly. Just the way things are in a large organization.

The important thing to remember is that once change starts, that too takes on a certain momentum and so is more likely to continue, even though it was hard to get started.

Yes, I am sure the present steps towards greater interoperability could have gone further, in another direction, etc. but they didn’t. Rather than complain about the present change for the better, why not use that as a wedge to push for greater support for more recent XML standards?

For my part, I guess I need to get a copy of Windows 10 on a VM so I can volunteer as a beta tester for full XPath (XQuery?/XSLT?) support in a future web browser. MS as a full XML competitor and possible source of open source software would generate some excitement in the XML community!

W3C Invites Implementations of XQuery and XPath Full Text 3.0;…

Friday, March 13th, 2015

W3C Invites Implementations of XQuery and XPath Full Text 3.0; Supporting Requirements and Use Cases Draft Updated

From the post:

The XML Query Working Group and the XSLT Working Group invite implementation of the Candidate Recommendation of XQuery and XPath Full Text 3.0. The Full Text specification extends the XPath and XQuery languages to support fast and efficient full text searches over arbitrarily large collections of documents. This release brings the Full Text specification up to date with XQuery 3.0 and XPath 3.0; the language itself is unchanged.

Both groups also published an updated Working Draft of XQuery and XPath Full Text 3.0 Requirements and Use Cases. This document specifies requirements and use cases for Full-Text Search for use in XQuery 3.0 and XPath 3.0. The goal of XQuery and XPath Full Text 3.0 is to extend XQuery and XPath Full Text 1.0 with additional functionality in response to requests from users and implementors.

If you have comments that arise out of implementation experience, be advised that XQuery and XPath Full Text 3.0 will be a Candidate Recommendation until at least 26 March 2015.

Enjoy!

Friday, February 13th, 2015

I did manage to file seventeen (17) comments today on the XPath/XQuery/FO/XDM 3.1 drafts!

I haven’t mastered bugzilla well enough to create an HTML list of them to paste in here but no doubt will do so over the weekend.

Remember these are NOT “bugs” until they are accepted by the working group as “bugs.” Think of them as being suggestions on my part where the drafts were unclear or could be made clearer in my view.

Did you remember to post comments?

I will try to get a couple of short things posted tonight but getting the comments in was my priority today.

Balisage: The Markup Conference 2015

Wednesday, January 21st, 2015

Balisage: The Markup Conference 2015 – There is Nothing As Practical As A Good Theory

Key dates:
– 27 March 2015 — Peer review applications due
– 17 April 2015 — Paper submissions due
– 17 April 2015 — Applications for student support awards due
– 22 May 2015 — Speakers notified
– 17 July 2015 — Final papers due
– 10 August 2015 — Symposium on Cultural Heritage Markup
– 11–14 August 2015 — Balisage: The Markup Conference

Bethesda North Marriott Hotel & Conference Center, just outside Washington, DC (I know, no pool with giant head, etc. Do you think if we ask nicely they would put one in? And change the theme of the decorations about every 30 feet in the lobby?)

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

• Cutting-edge applications of XML and related technologies
• Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
• Web application development with XML
• Performance issues in parsing, XML database retrieval, or XSLT processing
• Development of angle-bracket-free user interfaces for non-technical users
• Deployment of XML systems for enterprise data
• Design and implementation of XML vocabularies
• Case studies of the use of XML for publishing, interchange, or archiving
• Alternatives to XML
• Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages
• Detailed Call for Participation: http://balisage.net/Call4Participation.html
Instructions for authors: http://balisage.net/authorinstructions.html

I wonder if the local authorities realize the danger in putting that many skilled markup people so close the source of so much content? (Washington) With attendees sparking off against each other, who knows?, could see an accountable and auditable legislative and rule making document flow arise. There may not be enough members of Congress in town to smother it.

The revolution may not be televised but it will be powered by markup and its advocates. Come join the crowd with the tools to make open data transparent.

pgcli [Inspiration for command line tool for XPath/XQuery?]

Tuesday, January 20th, 2015

pgcli

From the webpage:

Pgcli is a command line interface for Postgres with auto-completion and syntax highlighting.

Postgres folks who don’t know about pgcli will be glad to see this post.

But, having spent several days with XPath/XQuery/FO 3.1 syntax, I can only imagine the joy in XML circles for a similar utility for use with command line XML tools.

Properly done, the increase in productivity would be substantial.

The same applies for your favorite NoSQL query language. (Datomic?)

Will SQL users be the only ones with such a command line tool?

I first saw this in a tweet by elishowk.

XPath/XQuery/FO/XDM 3.1 Definitions – Deduped/Sorted/Some Comments! Version 0.1

Monday, January 19th, 2015

My first set of the XPath/XQuery/FO/XDM 3.1 Definitions, deduped, sorted, along with some comments is now online!

XPath, XQuery, XQuery and XPath Functions and Operators, XDM – 3.1 – Sorted Definitions Draft

Let me emphasize this draft is incomplete and more comments are needed on the varying definitions.

I have included all definitions, including those that are unique or uniform. This should help with your review of those definitions as well.

I am continuing to work on this and other work products to assist in your review of these drafts.

Reminder: Tentative deadline for comments at the W3C is 13 February 2015.

Draft Sorted Definitions for XPath 3.1

Thursday, January 15th, 2015

I have uploaded a draft of sorted definitions for XPath 3.1. See: http://www.durusau.net/publications/xpath-alldefs-sorted.html

I ran across an issue you may encounter in the future with W3C documents in general and these drafts in particular.

While attempting to sort on the title attribute of the a elements that mark each definition, I got the following error:

A sequence of more than one item is not allowed as the @select attribute of xsl:sort

Really?

The stylesheet was working with a subset of the items but not when I added more items to it.

<p>[<a name=”dt-focus” id=”dt-focus” title=”focus” shape=”rect”>Definition</a>: The first three components of the <a title=”dynamic context” href=”#dt-dynamic-context” shape=”rect”>dynamic context</a> (context item, context position, and context size) are called the <b>focus</b> of the expression. ] The focus enables the processor to keep track of which items are being processed by the expression. If any component in the focus is defined, all components of the focus are defined.</p>

Ouch! The title attribute on the second a element was stepping into my sort select.

The solution:

<xsl:sort select=”a[position()=1]/@title” data-type=”text”/>

As we have seen already, markup in W3C specifications varies from author to author so a fixed set of stylesheets may or may not be helpful. Some XSLT snippets on the other hand are likely to turn out to be quite useful.

One of the requirements for the master deduped and sorted definitions is that I want to know the origin(s) of all the definitions. That is if the definition only occurs in XQuery, I want to know that as well was if the definition only occurs in XPath and XQuery, and so on.

Still thinking about the best way to make that easy to replicate. Mostly because you are going to encounter definition issues in any standard you proof.

Corrected Definitions Lists for XPath/XQuery/etc.

Thursday, January 15th, 2015

In my extraction of the definitions yesterday I produced files that had HTML <p> elements embedded in other HTML <p> elements.

The corrected files are as follows:

These lists are unsorted and the paragraphs with multiple definitions are repeated for each definition. Helps me spot where I have multiple definitions that may be followed by non-normative prose, applicable to one or more definitions.

The XSLT code I used yesterday was incorrect:

<xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
<p>
<xsl:copy-of select=”ancestor::p”/>
</p>
</xsl:for-each>

And results in:

<p>
<p>[<a name=”dt-expression-context” id=”dt-expression-context” title=”expression context” shape=”rect”>Definition</a>: The <b>expression
context</b> for a given expression consists of all the information
that can affect the result of the expression.]
</p>
</p>

Which is both ugly and incorrect.

When using xsl:copy-of for a p element, the surrounding p elements were unnecessary.

Thus (correctly):

&lt:xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
<xsl:copy-of select=”ancestor::p”/>
</xsl:for-each>

I reproduced the corrected definition files above. Apologies for any inconvenience.

Work continues on the sorting and deduping.

Building Definitions Lists for XPath/XQuery/etc.

Wednesday, January 14th, 2015

I have extracted the definitions from:

These lists are unsorted and the paragraphs with multiple definitions are repeated for each definition. Helps me spot where I have multiple definitions that may be followed by non-normative prose, applicable to one or more definitions.

Usual follies trying to extract the definitions.

My first attempt (never successful in my experience but I have to try it so as to get to the second, third, etc.) resulted in:

DefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinitionDefinition

Which really wasn’t what I meant. Unfortunately it was what I had asked for. 😉

Just in case you are curious, the guts to extracting the definitions reads:

<xsl:for-each select=”//p/a[contains(@name, ‘dt’)]”>
<p>
<xsl:copy-of select=”ancestor::p”/>
</p>
</xsl:for-each>

Each of the definitions is contained in a p element where the anchor for the definition is contained in an a element with the attribute name, “dt-(somename).”

This didn’t work in all four (4) cases because XPath and XQuery Functions and Operators 3.1 records its “[Definition” elements as:

<p><span class=”termdef”><a name=”character” id=”character” shape=”rect”></a>[Definition] A <b>character</b> is an instance of the <a href=”http://www.w3.org/TR/REC-xml/#NT-Char” shape=”rect”>Char</a><sup><small>XML</small></sup> production of <a href=”#xml” shape=”rect”>[Extensible Markup Language (XML) 1.0 (Fifth Edition)]</a>.</span></p>

I’m sure there is some complex trickery you could use to account for that case but with four files, this is meatball XSLT, results over elegance.

Multiple definitions in one paragraph must be broken out so they can be sorted along with the other definitions.

The one thing I forgot to do in the XSLT that you should do when comparing multiple standards was to insert an identifier at the end of each paragraph for the text it was drawn from. Thus:

[Definition: Every instance of the data model is a sequence. XDM]

Where XDM is in a different color for each source.

Proofing all these definitions across four (4) specifications (XQueryX has no additions definitions, aside from unnecessarily restating RFC 2119) is no trivial matter. Which is why I have extracted them and will be producing a deduped and sorted version.

When you have long or complicated standards to proof, it helps to break them down in to smaller parts. Especially if the parts are out of their normal reading context. That helps avoid simply nodding along because you have read the material so many times.

FYI, comments on errors most welcome! Producing the lists was trivial. Proofing the headers, footers, license language, etc. took longer than the lists.

Enjoy!

More on Definitions in XPath/XQuery/XDM 3.1

Tuesday, January 13th, 2015

I was thinking about the definitions I extracted from XPath 3.1 Definitions Here! Definitions There! Definitions Everywhere! XPath/XQuery 3.1 and since the XPath 3.1 draft says:

Because these languages are so closely related, their grammars and language descriptions are generated from a common source to ensure consistency, and the editors of these specifications work together closely.

We are very likely to find that the material contained in definitions and the paragraphs containing definitions are in fact the same.

To make the best use of your time then, what is needed is a single set of the definitions from XPath 3.1, XQuery 3.1, XQueryX 3.1, XQuery and XPath Data Model 3.1, and XQuery Functions and Operators 3.1.

I say that, but then on inspecting some of the definitions in XQuery and XPath Data Model 3.1, I read:

[Definition: An atomic value is a value in
the value space of an atomic type and is labeled with the name of
that atomic type.]

[Definition: An atomic type is a primitive
simple type
or a type derived by restriction from another
atomic type.] (Types derived by list or union are not atomic.)

But in the file of definitions from XPath 3.1, I read:

[Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].]

Not the same are they?

What happened to:

and is labeled with the name of that atomic type.

That seems rather important. Yes?

The phrase “atomic type” occurs forty-six (46) times in the XPath 3.1 draft, none of which define “atomic type.”

It does define “generalized atomic type:”

[Definition: A generalized atomic type is a type which is either (a) an atomic type or (b) a pure union type ].

Which would make you think it would have to define “atomic type” as well, to declare the intersection with “pure union type.” But it doesn’t.

In case you are curious, XML Schema 1.1 doesn’t define “atomic type” either. Rather it defines “anyAtomicType.”

In XML Schema 1.0 Part 1, the phrase “atomic type” is used once and only once in “3.14.1 (non-normative) The Simple Type Definition Schema Component,” saying:

Each atomic type is ultimately a restriction of exactly one such built-in primitive datatype, which is its {primitive type definition}.

There is no formal definition nor is there any further discussion of “atomic type” in XML Schema 1.0 Part 1.

XML Schema Part 2 is completely free of any mention of “atomic type.”

Summary of the example:

At this point we have been told that XPath 3.1 relies on XQuery and XPath Data Model 3.1 but also XML Schema 1 and XML Schema 1.1, which have inconsistent definitions of “atomic type,” when it exists at all.

Moreover, XPath 3.1 relies upon undefined term (atomic value) to define another term (generalized atomic type), which is surely an error in any reading.

This is a good illustration of what happens when definitions are not referenced from other documents with specific and resolvable references. Anyone checking such a definition would have noticed it missing in the referenced location.

Summary on next steps:

I was going to say a deduped set of definitions would serve for proofing all the drafts and now, despite the “production from a common source,” I’m not so sure.

Probably the best course is to simply extract all the definitions and check them for duplication rather than assuming it.

The questions of what should be note material and other issues will remain to be explored.

Definitions Here! Definitions There! Definitions Everywhere! XPath/XQuery 3.1

Monday, January 12th, 2015

Would you believe there are one hundred and forty-eight definitions embedded in XPath 3.1?

What strikes me as odd is that the same one hundred and forty-eight definitions appear in a non-normative glossary, sans what looks like the note material that follows some definitions in the normative prose.

The first issue is why have definitions in both normative and non-normative prose? Particularly when the versions in non-normative prose lack the note type material found in the main text.

Speaking of normative, did you now that normatively, document order is defined as:

Informally, document order is the order in which nodes appear in the XML serialization of a document.

So we have formal definitions that are giving us informal definitions.

That may sound like being picky but haven’t we seen definitions of “document order” before?

Grepping the current XML specifications from the W3C, I found 147 mentions of “document order” outside of the current drafts.

I really don’t think we have gotten this far with XML without a definition of “document order.”

Or “node,” “implementation defined,” “implementation dependent,” “type,” “digit,” “literal,” “map,” “item,” “axis step,” in those words or ones very close to them.

• My first puzzle is why redefine terms that already exist in XML?
• My second puzzle is the one I mentioned above, why repeat shorter versions of the definitions in an explicitly non-normative appendix to the text?

For a concrete example of the second puzzle:

For example:

[Definition: The built-in functions
supported by XPath 3.1 are defined in [XQuery and XPath Functions and Operators
3.1]
.] Additional functions may be provided
in the static
context
. XPath per se does not provide a way to declare named
functions, but a host language may provide such a
mechanism.

First, you are never told what section of XQuery and XPath Functions and Operators 3.1 has this definition so we are back to the 5,000 x N problem.

Second, what part of:

XPath per se does not provide a way to declare named functions, but a host language may provide such a mechanism.

Does not look like a note to you?

Does it announce some normative requirement for XPath?

Proofing is made more difficult because of the overlap of these definitions, verbatim, in XQuery 3.1. Whether it is a complete overlap or not I can’t say because I haven’t extracted all the definitions from XQuery 3.1. The XQuery draft reports one hundred and sixty-five (165) definitions, so it introduces additional definitions. Just spot checking, the overlap looks substantial. Add to that the same repetition of terms as shorter entries in the glossary.

There is the accomplice XQuery and XPath Data Model 3.1, which is alleged to be the source of many definitions but not well known enough to specify particular sections. In truth, many of the things it defines have no identifiers so precise reference (read hyperlinking to a particular entry) may not even be possible.

I make that to be at least six sets of definitions, mostly repeated because one draft won’t or can’t refer to prior XML definitions of the same terms or the lack of anchors in these drafts, prevents cross-referencing by section number for the convenience of the reader.

I can ease your burden to some extent, I have created an HTML file with all the definitions in XPath 3.1, the full definitions, for your use in proofing these drafts.

I make no warranty about the quality of the text as I am a solo shop so have no one to proof copy other than myself. If you spot errors, please give a shout.

I will see what I can do about extracting other material for your review.

What we actually need is a concordance of all these materials, sans the digrams and syntax productions. KWIC concordances don’t do so well with syntax productions. Or tables. Still, it might be worth the effort.