Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 2, 2016

Sorting Slightly Soiled Data (Or The Danger of Perfect Example Data) – XQuery (Part 2)

Filed under: XML,XQuery — Patrick Durusau @ 7:30 pm

Despite heavy carousing during the holidays, you may still remember Great R packages for data import, wrangling & visualization [+ XQuery], where I re-sorted the table by Sharon Machlis, to present the R packages in package name order.

I followed that up with: Sorting Slightly Soiled Data (Or The Danger of Perfect Example Data) – XQuery, where I detailed the travails of trying to sort the software packages by their short descriptions, again in alphabetical order. My assumption in that post was that either the spaces or the “,” commas in the descriptions were fouling the sort by.

That wasn’t the case, which I should have known because the string operator always returns a string. That is the spaces and “,” inside are just parts of a string, nothing more.

The up-side of the problem was that I spent more than a little while with Walmsley’s XQuery book, searching for ever more esoteric answers.

Here’s the failing XQuery:

<html>
<body>
<table>{
for $row in doc("/home/patrick/working/favorite-R-packages.xml")/table/tr
order by lower-case(string($row/td[2]/a))
return <tr>{$row/td[2]} {$row/td[1]}</tr>
}</table>
</body>
</html>

And here is the working XQuery:

<html>
<body>
<table>{
for $row in doc("/home/patrick/working/favorite-R-packages.xml")/table/tr
order by lower-case(string($row/td[2]))
return <tr>{$row/td[2]} {$row/td[1]}</tr>
}</table>
</body>
</html>

Here is the mistake highlighted:

order by lower-case"(string($row/td[2]/a))"

My first mistake was the inclusion of “/a” in the path. Using string on ($row/td[1]), that is without having /a at the end of the path, gives the original script the same result. (Run that for yourself on favorite-R-packages.xml).

Make any path as long as required and no longer!

My second mistake was not checking the XPath immediately upon the failure of the sort. (The simplest answer is usually the correct one.)

Enjoy!


Update: Removed the quotes marks around (string($row/td[2])) in both queries, they were part of an explanation that did not make the cut. Thanks to XQuery for the catch!

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress