Saxon « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 13, 2016

XQuery/XPath CRs 3.1! [#DisruptJ20 Twitter Game]

Filed under: Saxon,XML,XPath,XQuery — Patrick Durusau @ 1:57 pm

Just in time for the holidays, new CRs for XQuery/XPath hit the street! Comments due by 2017-01-10.

XQuery and XPath Data Model 3.1 https://www.w3.org/TR/2016/CR-xpath-datamodel-31-20161213/

XML Path Language (XPath) 3.1 https://www.w3.org/TR/2016/CR-xpath-31-20161213/

XQuery 3.1: An XML Query Language https://www.w3.org/TR/2016/CR-xquery-31-20161213/

XPath and XQuery Functions and Operators 3.1 https://www.w3.org/TR/2016/CR-xpath-functions-31-20161213/

XQueryX 3.1 https://www.w3.org/TR/2016/CR-xqueryx-31-20161213/

#DisruptJ20 is too late for comments to the W3C but you can break the boredom of indeterminate waiting to protest excitedly for TV cameras and/or to be arrested.

How?

Play the XQuery/XPath 3.1 Twitter Game!

Definitions litter the drafts and appear as:

[Definition: A sequence is an ordered collection of zero or more items.]

You Tweet:

An ordered collection of zero or more items? #xquery

Correct response:

A sequence.

Some definitions are too long to be tweeted in full:

An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. (xpath-functions)

Suggest you tweet:

A triple containing namespace prefix (optional), namespace URI (optional), and local name.

A value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1].

In both cases, the correct response:

An expanded-QName.

Use a $10 burner phone and it unlocked at protests. If your phone is searched, imagine the attempts to break the “code.”

You could agree on definitions/responses as instructions for direct action. But I digress.

Comments Off

December 14, 2015

35 Lines XQuery versus 604 of XSLT: A List of W3C Recommendations

Filed under: BaseX,Saxon,XML,XQuery,XSLT — Patrick Durusau @ 10:16 pm

Use Case

You should be familiar with the W3C Bibliography Generator. You can insert one or more URLs and the generator produces correctly formatted citations for W3C work products.

It’s quite handy but requires a URL to produce a useful response. I need authors to use correctly formatted W3C citations and asking them to find URLs and to generate correct citations was a bridge too far. Simply didn’t happen.

My current attempt is to produce a list of correctly W3C citations in HTML. Authors can use CTRL-F in their browsers to find citations. (Time will tell if this is a successful approach or not.)

Goal: An HTML page of correctly formatted W3C Recommendations, sorted by title (ignoring case because W3C Recommendations are not consistent in their use of case in titles). “Correctly formatted” meaning that it matches the output from the W3C Bibliography Generator.

Resources

As a starting point, I viewed the source of http://www.w3.org/2002/01/tr-automation/tr-biblio.xsl, the XSLT script that generates the XHTML page with its responses.

The first XSLT script imports two more XSLT scripts, http://www.w3.org/2001/08/date-util.xslt and http://www.w3.org/2001/10/str-util.xsl.

I’m not going to reproduce the XSLT here, but can say that starting with <stylesheet> and ending with </stylesheet>, inclusive, I came up with 604 lines.

You will need to download the file used by the W3C Bibliography Generator, tr.rdf.

XQuery Script

I have used the XQuery script successfully with: BaseX 8.3, eXide 2.1.3 and SaxonHE-6-07J.

Here’s the prolog:

declare default element namespace "http://www.w3.org/2001/02pd/rec54#";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace dc = "http://purl.org/dc/elements/1.1/"; 
declare namespace doc = "http://www.w3.org/2000/10/swap/pim/doc#";
declare namespace contact = "http://www.w3.org/2000/10/swap/pim/contact#";
declare namespace functx = "http://www.functx.com";
declare function functx:substring-after-last
($string as xs:string?, $delim as xs:string) as xs:string?
{
if (contains ($string, $delim))
then functx:substring-after-last(substring-after($string, $delim), $delim)
else $string
};

Declaring the namespaces and functx:substring-after-last from Patricia Walmsley’s excellent FunctX XQuery Functions site and in particular, functx:substring-after-last.

<html>
<head>XQuery Generated W3C Recommendation List</head>
<body>
<ul class="ul">

Start the HTML page and the unordered list that will contain the list items.

{
for $rec in doc("tr.rdf")//REC
    order by upper-case($rec/dc:title)

If you sort W3C Recommendations by dc:title and don’t specify upper-case, rdf:PlainLiteral: A Datatype for RDF Plain Literals,
rdf:PlainLiteral: A Datatype for RDF Plain Literals (Second Edition), and xml:id Version 1.0, appear at the end of the list sorted by title. Dirty data isn’t limited to databases.

return <li class="li">
  <a href="{string($rec/@rdf:about)}"> {string($rec/dc:title)} </a>, 
   { for $auth in $rec/editor
   return
   if (contains(string($auth/contact:fullName), "."))
   then (concat(string($auth/contact:fullName), ","))
   else (concat(concat(concat(substring(substring-before(string($auth/\
   contact:fullName), ' '), 0, 2), ". "), (substring-after(string\
   ($auth/contact:fullName), ' '))), ","))}

Watch for the line continuation marker “\”.

We begin by grabbing the URL and title for an entry and then confront dirty author data. The standard author listing by the W3C creates an initial plus a period for the author’s first name and then concatenates the rest of the author’s name to that initial plus period.

Problem: There is one entry for authors that already has initials, T.V. Raman, so I had to account for that one entry (as does the XSLT).

{if (count ($rec/editor) >= 2) then " Editors," else " Editor,"}
W3C Recommendation, 
{fn:format-date(xs:date(string($rec/dc:date)), "[MNn] [D], [Y]") }, 
{string($rec/@rdf:about)}. <a href="{string($rec/doc:versionOf/\
@rdf:resource)}">Latest version</a> \
available at {string($rec/doc:versionOf/@rdf:resource)}.
<br/>[Suggested label: <strong>{functx:substring-after-last(uppercase\
(replace(string($rec/doc:versionOf/@rdf:resource), '/$', '')), "/")}\
</strong>]<br/></li>} </ul></body></html>

Nothing remarkable here, except that I snipped the concluding “/” off of the values from doc:versionOf/@rdf:resource so I could use functx:substring-after-last to create the token for a suggested label.

Comments / Omissions

I depart from the XSLT in one case. It calls http://www.w3.org/2002/01/tr-automation/known-tr-editors.rdf here:

<!-- Special casing for when we have the name in Original Script (e.g. in \
Japanese); currently assume that the order is inversed in this case... -->

<:xsl:when test="document('http://www.w3.org/2002/01/tr-automation/\
known-tr-editors.rdf')/rdf:RDF/*[contact:lastNameInOriginalScript=\
substring-before(current(),' ')]">

But that refers to only one case:

<REC rdf:about="http://www.w3.org/TR/2003/REC-SVG11-20030114/">
<dc:date>2003-01-14</dc:date>
<dc:title>Scalable Vector Graphics (SVG) 1.1 Specification</dc:title>

Where Jun Fujisawa appears as an editor.

Recalling my criteria for “correctness” being the output of the W3C Bibliography Generator:

Preparing for this post made me discover at least one bug in the XSLT that was supposed to report the name in original script:

&lt:xsl:when test=”document(‘http://www.w3.org/2002/01/tr-automation/\
known-tr-editors.rdf’)/rdf:RDF/*[contact:lastNameInOriginalScript=\
substring-before(current(),’ ‘)]”>

Whereas the entry in http://www.w3.org/2002/01/tr-automation/known-tr-editors.rdf reads:

<rdf:Description>
<rdf:type rdf:resource=”http://www.w3.org/2000/10/swap/pim/contact#Person”/>
<firstName>Jun</firstName>
<firstNameInOriginalScript>藤沢淳</firstNameInOriginalScript>
<lastName>Fujisawa</lastName>
<sortName>Fujisawa</sortName>
</rdf:Description>

Since the W3C Bibliography Generator doesn’t produce the name in original script, neither do I. When the W3C fixes its output, I will have to amend this script to pick up that entry.

String

While writing this query I found text(), fn:string() and fn:data() by Dave Cassels. Recommended reading. The weakness of text() is that if markup is inserted inside your target element after you write the query, you will get unexpected results. The use of fn:string() avoids that sort of surprise.

Recommendations Only

Unlike the W3C Bibliography Generator, my script as written only generates entries for Recommendations. It would be trivial to modify the script to include drafts, notes, etc., but I chose to not include material that should not be used as normative citations.

I can see the usefulness of the bibliography generator for works in progress but external to the W3C, citing Recommendations is the better course.

Contra Search

The SpecRef project has a searchable interface to all the W3C documents. If you search for XQuery, the interface returns 385 “hits.”

Contrast that with using CNTR-F with the list of recommendations generated from the XQuery script, controlling for case, XQuery produced only 23 “hits.”

There are reasons for using search, but users repeatedly mining results of searches that could be captured (it was called curation once upon a time) is wasteful.

Reading

I can’t recommend Patricia Walmsley’s XQuery 2nd Edition strongly enough.

There is one danger to Walmsley’s book. You will be so ready to start using XQuery after the first ten chapters it’s hard to find the time to read the remaining ones. Great stuff!

You can download the XQuery file, tr.rdf and the resulting html file at: 35LinesOfXQuery.zip.

Comments Off

November 28, 2015

Saxon 9.7 Release!

Filed under: Saxon,XQuery,XSLT — Patrick Durusau @ 4:29 pm

I saw a tweet from Michael Kay announcing that Saxon 9.7 has been released!

Saxon 9.7 is up to date with the XSLT 3.0 Candidate Recommendation released from the W3C November 19, 2015.

From the details page:

…

XSLT 3.0 implementation largely complete (requires Saxon-PE or Saxon-EE): The new XSLT 3.0 Candidate Recommendation was published on 19 November 2015, and Saxon 9.7 is a complete implementation with a very small number of exceptions. Apart from general updating of Saxon as the spec has developed, the main area of new functionality is in packaging, which allows stylesheet modules to be independently compiled and distributed, and provides much more “software engineering” control over public and private interfaces, and the like. The ability to save packages in compiled form gives much faster loading of frequently used stylesheets.

Schema validation: improved error reporting. The schema validator now offers customised error reporting, with an option to create an XML report detailing all validation errors found. This has structured information about each error so the report can readily be customised; it has been developed in conjunction with some of our IDE partners who can use this information to provide an improved interactive presentation of the validation report.

Arrays, Maps, and JSON: Arrays are implemented as a new data type (defined in XPath 3.1). Along with maps, which were already available in Saxon 9.6, this provides the infrastructure for full support of JSON, including functions such as parse-json() which converts JSON to a structure of maps and arrays, and the JSON serialization method which does the inverse.

Miscellaneous new functions: two of the most interesting are random-number-generator(), and parse-ietf-date().

Streaming: further improvements to the set of constructs that can be streamed, and the diagnostics when constructs cannot be streamed.

Collections: In line with XPath 3.1 changes, a major overhaul of the way collections work. They can now contain any kind of item, and new abstractions are provided to give better control over asynchronous and parallel resource fetching, parsing, and validation.

Concurrency improvements: Saxon 9.6 already offered various options for executing stylesheets in parallel to take advantage of multi-code processors. These facilities have now been tuned for performance and made more robust, by taking advantage of more advanced concurrency features in the JDK platform. The Saxon NamePool, which could be a performance bottleneck in high throughput workloads, has been completely redesigned to allow much higher concurrency.

Cost-based optimization: Saxon’s optimizer now makes cost estimates in order to decide the best execution strategy. Although the estimates are crude, they have been found to make a vast difference to the execution speed of some stylesheets. Saxon 9.7 particularly addresses the performance of XSLT pattern matching.

…

There was no indication of when these features will appear in the free “home edition.”

In the meantime, you can go the Online Shop for Saxonica.

Currency conversion rates vary but as of today, Saxon-PE (Professional Edition) is about $75 U.S. and some change.

I’m considering treating myself to Saxon-PE as a Christmas present to myself.

And you?

Comments Off

October 2, 2014

XSLT 3.0 Draft – Saxon 9.6

Filed under: Saxon,XSLT — Patrick Durusau @ 7:24 pm

I saw a tweet by Michael Kay announcing:

XSL Transformations (XSLT) Version 3.0 – W3C Last Call Working Draft 2 October 2014,

and,

Saxon 9.6 released!

Now that is a great Thursday!

PS: Deadline for comments on the working draft is 26 November 2014.

Comments Off

June 4, 2014

Sharing key indexes

Filed under: Saxon,XPath,XSLT — Patrick Durusau @ 7:15 pm

Sharing key indexes by Michael Kay.

From the post:

For ever and a day, Saxon has tried to ensure that when several transformations are run using the same stylesheet and the same source document(s), any indexes built for those documents are reused across transformations. This has always required some careful juggling of Java weak references to ensure that the indexes are dropped from memory as soon as either the executable stylesheet or the source document are no longer needed.

I’ve now spotted a flaw in this design. It wasn’t reported by a user, and it didn’t arise from a test case, it simply occurred to me as a theoretical possibility, and I have now written a test case that shows it actually happens. The flaw is this: if the definition of the key includes a reference to a global variable or a stylesheet parameter, then the content of the index depends on the values of global variables, and these are potentially different in different transformations using the same stylesheet.
…

Michael discovers a very obscure bug entirely on his own and yet resolves to fix it.

That is so unusual that I thought it merited mentioning.

It should give you great confidence in Saxon.

How that impacts your confidence on other software I cannot say.

Comments Off

March 15, 2014

Saxon/C

Filed under: Saxon,XML,XQuery,XSLT — Patrick Durusau @ 9:41 pm

Saxon/C by Michael Kay.

From the webpage:

Saxon/C is an alpha release of Saxon-HE on the C/C++ programming platform. APIs are offered currently to run XSLT 2.0 and XQuery 1.0 from C/C++ or PHP applications.

BTW, Micheal’s Why XSLT and XQuery? page reports that Saxon is passing more than 25,000 tests for XQuery 3.0.

If you are unfamiliar with either Saxon or Michael Kay, you need to change to a department that is using XML.

Comments Off

March 17, 2013

XMLQuire Web Edition

Filed under: HTML5,Saxon,XML,XSLT — Patrick Durusau @ 5:50 am

XMLQuire Web Edition: A Free XSLT 2.0 Editor for the Web

From the webpage:

XSLT 2.0 processing within the browser is now a reality with the introduction of the open source Saxon-CE from Saxonica. This processor runs as a JavaScript app and supports JavaScript interoperability and user-event handling for the era of HTML5 and the dynamic web.

This Windows product, XMLQuire, is an XSLT edtior specially extended to integrate with Saxon-CE and support the Saxon-CE language extensions that make interactive XSLT possible. Saxon-CE is not included with this product, but is available from Saxonica here.

*nix folks will have to install Windows 7 or 8 on a VM to take advantage of this software.

Worth the effort if for no other reason than to see how the market majority lives.

I first saw this in a tweet by Michael Kay.

Comments (2)

March 16, 2013

Lux

Filed under: Lucene,Saxon,Solr,XQuery,XSLT — Patrick Durusau @ 7:51 pm

Lux

From the readme:

Lux is an open source XML search engine formed by fusing two excellent technologies: the Apache Lucene/Solr search index and the Saxon XQuery/XSLT processor.

At its core, Lux provides XML-aware indexing, an XQuery 1.0 optimizer that rewrites queries to use the indexes, and a function library for interacting with Lucene via XQuery. These capabilities are tightly integrated with Solr, and leverage its application framework in order to deliver a REST service and application server.

The REST service is accessible to applications written in almost any language, but it will be especially convenient for developers already using Solr, for whom Lux operates as a Solr plugin that provides query services using the same REST APIs as other Solr search plugins, but using a different query language (XQuery). XML documents may be inserted (and updated) using standard Solr REST calls: XML-aware indexing is triggered by the presence of an XML-aware field in a document. This means that existing application frameworks written in many different languages are positioned to use Lux as a drop-in capability for indexing and querying semi-structured content.

The application server is a great way to get started with Lux: it provides the ability to write a complete application in XQuery and XSLT with data storage backed by Lucene.

If you are looking for experience with XQuery and Lucene/Solr, look no further!

May be a good excuse for me to look at defining equivalence statements using XQuery.

I first saw this in a tweet by Michael Kay.

Comments Off