Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 16, 2019

Brzozowski derivatives – Invisible XML – Thinking, Wishing, Saying – Must be … Balisage 2019!

Filed under: Conferences,XML,XQuery,XSLT — Patrick Durusau @ 1:20 pm

Balisage 2019 Program Announced!

An awesome lineup of topics and speakers awaits Balisage 2019 goers. From the expected, standoff markup in browsers (yes, that's the usual fare at Balisage), to the re-invention of markup “seen” when looking at a file with no markup (HyTime) and beyond, you are in for a real treat.

I saw several slots for late-breaking news so if you have something really profound and coherent to say, you’d best be polishing it now. Just looking at the current program gives you an idea of the competition for slots.

Why attend? General Eric Shinseki said it best:

If you dislike change, you’re going to dislike irrelevance even more.

Don’t risk irrelevance! Attend Balisage 2019!

February 24, 2019

eXist-db 5.0.0 RC6

Filed under: eXist,XML,XML Database,XPath,XQuery — Patrick Durusau @ 4:35 pm

eXist-db 5.0.0 RC6

RC5 was released on November 21, 2018, so there are a number of new features and bug fixes to grab your interest in RC6.

Features:

  • New De-duplicating BLOB store for binary documents – see https://blog.adamretter.org.uk/blob-deduplication/
  • More elaborate XPath expressions in the Lucene index config of collection.xconf are now supported
  • New non-blocking lock-free implementation of the Transaction Manager
  • CData serialization now respects the output:cdata-section-elements option
  • New XQuery function util:eval-and-serialize for dynamic XQuery evaluation and serialization.
  • New XQuery function util:binary-doc-content-digest to retrieve a digest of a Binary Document
  • … and others.

Bug fixes:

  • Fixed Lucene term range queries
  • Copying an XML Resource now correctly removes any nodes that it replaces
  • Fixed a memory leak with XQuery serializers
  • Fixed Garbage Collection churn issue with serialization
  • Fixed Backup/Restore progress reporting
  • XQuery Library Modules on the Java Classpath are now correctly resolved from the importing XQuery module
  • … and others.

Although not ready for production, these new features and bug fixes should have you scurrying to download eXist-db 5.0.0 RC6!

PS: Remember there are only 48 days left for paper submissions to Balisage 2019! Are you going to be using the latest RC for eXist?

January 9, 2019

Summer is Coming! Balisage is Coming! Papers Due April 12, 2019!

Filed under: Conferences,XML,XML Database,XML Query Rewriting,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 7:52 pm

From a recent email about Balisage 2019:

Some “Balisage: The Markup Conference 2019” dates are coming soon:

March 29, 2019 — Peer-review applications due
April 12, 2019 — Paper submissions due
July 30 — August 2, 2019 — Balisage: The Markup Conference
July 29, 2019 — Pre-conference Symposium – Topic to be announced https://www.balisage.net/

Balisage: where serious markup practitioners and theoreticians meet every August.

A colleague recently asked me to share the program for Balisage 2019 to help support a request to attend. What, I was asked, will we talk about at Balisage 2019. I replied “It will be a variety of topics relating to markup, but we won’t know the specifics until May.” “Why? It seems like you should know that now.” was the response. “Why don’t you just decide who you want to talk about what and assign topics?” “Because that would not be a contributed paper conference, it would be some other sort of event!”

Balisage *is* a contributed paper conference, and the submissions from people who want to speak drive the program, the hallway conversations, and the whole tone of Balisage!

If you want to speak at Balisage 2019, if you want to help shape the conversation, if you have an idea, experience, opinion, or question relating to markup, please submit a paper to Balisage 2019!

We solicit papers on any aspect of markup and its uses; topics include but ARE NOT LIMITED TO:

• Cutting-edge applications of XML and related technologies
• Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
• Performance issues in parsing, XML database retrieval, or XSLT processing
• Development of angle-bracket-free user interfaces for non-technical users
• Deployment of XML systems for enterprise data
• Design and implementation of XML vocabularies
• Case studies of the use of XML for publishing, interchange, or archiving
• Alternatives to XML/JSON/whatever
• Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages
• Invisible XML

Detailed Call for Participation: https://www.balisage.net/Call4Participation.html
Call for Peer Reviewers: https://www.balisage.net/peer/ReviewAppForm.html
About Balisage: https://www.balisage.net/

For more information: info@balisage.net or +1 301 315 9631

Papers are due for Balisage in a little more than 90 days.

Anyone doing a topic map paper this year?

“If you can point to it, we can identify it. If we can identify it, we can map it. If we can map it, …,” well, you know how the rest of it goes.

Data silos continue to exist because they are armor. Armor that protects some stakeholders from prying eyes. Up for a little peeping?

December 4, 2018

Bulk US Congress Bills, Laws in XML

Filed under: Government,Government Data,Law,Legal Informatics,XML — Patrick Durusau @ 8:47 am

GPO Makes Documents Easy To Download and Repurpose in New XML Format

From the news release:

The U.S. Government Publishing Office (GPO) makes available a subset of enrolled bills, public and private laws, and the Statutes at Large in Beta United States Legislative Markup (USLM) XML, a format that makes documents easier to download and repurpose. The documents available in the Beta USLM XML format include enrolled bills and public laws beginning with the 113th Congress (2013) and the Statutes at Large beginning with the 108th Congress (2003). They are available on govinfo, GPO’s one-stop site to authentic, published Government information. https://www.govinfo.gov/bulkdata.

The conversion of legacy formats into Beta USLM XML will provide a uniform set of laws for the public to download. This new format maximizes the number of ways the information can be used or repurposed for mobile apps or other digital or print projects. The public will now be able to download large sets of data in one click rather than downloading each file individually, saving significant time for developers and others who seek to repurpose the data.

GPO is collaborating with various legislative and executive branch organizations on this project, including the Office of the Clerk of the House, the Office of the Secretary of the Senate, and the Office of the Federal Register. The project is being done in support of the Legislative Branch Bulk Data Task Force which was established to examine the increased dissemination of Congressional information via bulk data download by non-Governmental groups for the purpose of supporting openness and transparency in the legislative process.

“Making these documents available in Beta USLM XML is another example of how GPO is meeting the technological needs of Congress and the public,” said GPO Acting Deputy Director Herbert H. Jackson, Jr. “GPO is committed to working with Congress on new formats that provide the American people easy access to legislative information.”

GPO is the Federal Government’s official, digital, secure resource for producing, procuring, cataloging, indexing, authenticating, disseminating, and preserving the official information products of the U.S. Government. The GPO is responsible for the production and distribution of information products and services for all three branches of the Federal Government, including U.S. passports for the Department of State as well as the official publications of Congress, the White House, and other Federal agencies in digital and print formats. GPO provides for permanent public access to Federal Government information at no charge through www.govinfo.gov and partnerships with approximately 1,140 libraries nationwide participating in the Federal Depository Library Program. For more information, please visit www.gpo.gov.

Not that I have lost any of my disdain and distrust for government, but when any government does something good, they should be praised.

Making “enrolled bills, public and private laws, and the Statutes at Large in Beta United States Legislative Markup (USLM) XML” available is a step towards tracing and integrating legislation with those it benefits.

I’m not convinced that if you could trace specific legislation to a set of donations that the outcomes on legislation would be any different. It’s like tracing payments made to a sex worker. That’s their trade, why should they be ashamed of it?

The same holds true for most members of Congress, save that the latest election has swept non-sex worker types into office. It remains to be seen how many will resist the temptation to sell their offices and how many will not.

In either case, kudos to the GPO and Lauren Wood, who I understand has been a major driver in this project!
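
If you plan to repurpose one of the bulk downloads, a first pass that simply counts which elements appear is a cheap way to get your bearings. What follows is a minimal sketch in Python using only the standard library. The sample document and its namespace are made up for illustration and are NOT real USLM markup; point the function at a file you have actually downloaded to see the real element names.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Stream through an XML document and count elements by local name."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Namespaced tags arrive as "{namespace}local"; keep the local part.
        local = elem.tag.rsplit("}", 1)[-1]
        counts[local] += 1
        # Drop the element's text and children once counted to limit memory use.
        elem.clear()
    return counts


# Hypothetical stand-in document; the element names are invented, not USLM.
SAMPLE = b"""<?xml version="1.0"?>
<law xmlns="http://example.org/hypothetical">
  <section><heading>One</heading><para>Text.</para></section>
  <section><heading>Two</heading><para>More.</para><para>Still more.</para></section>
</law>"""

counts = count_elements(io.BytesIO(SAMPLE))
for name, n in sorted(counts.items()):
    print(name, n)
```

The function accepts a filename as well as a file object, so the same call should work on a file unpacked from a bulk archive.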

November 21, 2018

pugixml 1.9 quick start guide

Filed under: Parsers,XML,XPath — Patrick Durusau @ 4:20 pm

pugixml 1.9 quick start guide

From the webpage:

pugixml is a light-weight C++ XML processing library. It consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with two Unicode interface variants and conversions between different Unicode encodings (which happen automatically during parsing/saving). The library is extremely portable and easy to integrate and use. pugixml is developed and maintained since 2006 and has many users. All code is distributed under the MIT license, making it completely free to use in both open-source and proprietary applications.

pugixml enables very fast, convenient and memory-efficient XML document processing. However, since pugixml has a DOM parser, it can’t process XML documents that do not fit in memory; also the parser is a non-validating one, so if you need DTD/Schema validation, the library is not for you.

This is the quick start guide for pugixml, which purpose is to enable you to start using the library quickly. Many important library features are either not described at all or only mentioned briefly; for more complete information you should read the complete manual.

Despite the disappointing lack of document/email leaks during the 2018 mid-terms, I am hopeful the same will not be true in 2020. The 2020 elections will include a presidential race.

I encountered pugixml today in another context and thought I should mention it as a possible addition to your toolkit.

The repository: http://github.com/zeux/pugixml.

Enjoy!

October 31, 2018

BaseX 9.1: The Autumn Edition [No Weaponized Email Leaks for Mid-Term Elections to Report]

Filed under: BaseX,XML,XQuery — Patrick Durusau @ 3:41 pm

Christian Grün writes in an email:

Dear XML and XQuery aficionados,

It’s been exactly 5 months ago when BaseX 9 was released, and we are happy to announce version 9.1 of our XML framework, database system and XQuery 3.1 processor! The latest release is online:

http://basex.org

The most exciting addition is support for WebSockets, which enable you to do bidirectional (full-duplex) client/server communication with XQuery web applications:

http://docs.basex.org/wiki/WebSockets

Moreover, we have added convenient syntax extensions (ternary if, Elvis operator, if without else) to XQuery. Some of them may be made available in other implementations of XQuery as well (we’ll keep you updated):

http://docs.basex.org/wiki/XQuery_Extensions#Expressions

Other new features are as follows:

XQuery:
– set local locks via pragmas and function annotations
– Database Module: faster processing of value index functions
– Jobs Module: record and return registration times
– ENFORCEINDEX option: support for predicates with dynamic values
– Update Module, update:output: support for caching maps and arrays

GUI:
– Mac, Windows: Improved rendering support for latest Java versions
– XQuery editor: choose and display current query context

Visit http://docs.basex.org to get more information on the added features.

Your feedback is welcome! Have fun,

Christian
BaseX Team

I know of no examples of weaponized email leaks using BaseX for the mid-term elections, which are less than a week away.

That absence is more than a little disappointing because industrial strength weapons are available, such as BaseX, and computer security remains on a Hooterville level of robustness.

Despite this missed opportunity, there are elections scheduled (still) for 2020.

September 21, 2018

Senate GMail Attack – eXist-db 5.0.0 RC 4 Release – Coincidence?

Filed under: Cybersecurity,eXist,Government,XML,XML Database,XQuery — Patrick Durusau @ 6:16 pm

First I see “Senators’ Gmail accounts targeted by foreign hackers,” published today, which reads in part:

The personal Gmail accounts of an unspecified number of US senators and Senate staff have been targeted by foreign government hackers, a Google spokesperson confirmed to CNN on Thursday.

then I see in my Twitter feed:

[eXist-db] v5.0.0-RC4 – September 21, 2018.

The campaign season has been devoid of any Clinton-like email leaks, which is both disappointing and a little surprising.

It worked so well last time: taking no-news office gossip and, by timed release, making back-biting chatter into widely reported news.

You should grab a copy of eXist-db v5.0.0-RC4 or the current stable version. Practicing now will keep you in shape for any flood of congressional emails.

eXist-db is NOT in league with any hackers anywhere.

I like feeding the paranoid delusions of the IC with groundless gossip. They will write it down, talk about it, and do research, and all the while they are not out harming US citizens or, hopefully, citizens of any other country.

August 2, 2018

eXist-db 5.0.0 RC 3 [Prepping for Assange Data Tsunami]

Filed under: .Net,eXist,XML,XML Database,XQuery — Patrick Durusau @ 10:40 am

eXist-db 5.0.0 RC 3

One new feature and several bug fixes over RC 2, but I thought I should mention it for Assange Data Tsunami preppers.

I have deliberately avoided contact with any such preppers but you can read my advice at: username: 4julian password: $etJulianFree!2Day.

The gist is that sysadmins should, with appropriate cautions, create accounts with “username: 4julian password: $etJulianFree!2Day,” in the event that Julian Assange is taken into custody (a likely event).

If one truth teller (no Wikileaks release has ever been proven false or modified) disturbs the world, creating a tsunami of secret, classified, restricted, proprietary data may shock it to its senses.

Start prepping for the Assange Data Tsunami today!

PS: Yes, there are a variety of social media events, broadcasts, etc. being planned. Wish them all well but governments respond to bleeding more than pleading. In this case, bleeding data seems appropriate.

August 1, 2018

Developing SGML DTDs From Text To Model To Markup

Filed under: XML,XPath — Patrick Durusau @ 8:06 pm

Developing SGML DTDs: From Text To Model To Markup by Eve Maler and Jeanne El Andaloussi.

Maler and El Andaloussi summarize (1.2.4) the benefits of SGML this way:

To summarize, SGML markup is unique in that it combines several design strengths:

  • It is declarative, which helps document producers “write once, use many”—putting the same document data to multiple uses, such as delivery of documents in a variety of online and paper formats and interchange with others who wish to use the documents in different ways.
  • It is generic across systems and has a nonproprietary design, which helps make documents vendor and platform independent and “future-proof”—protecting them against changes in computer hardware and software.
  • It is contextual, which heightens the quality and completeness of processing by allowing documents to be structurally validated and by enabling logical collections of data to be manipulated intelligently.

The characteristics of being declarative, generic, nonproprietary, and contextual make the Standard Generalized Markup Language “standard” and “generalized.”

A truly remarkable work that is as relevant today as it was twenty-three years ago.

Most important lesson: Understanding your document comes before designing markup. Every time.

May 29, 2018

Balisage Late-Breaking News Deadline – 6 July 2018 – Attract/Spot a Fed!

Filed under: Conferences,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 7:10 pm

Balisage 2018 Call for Late-breaking News

From the post:


Proposals for late-breaking slots must be received at info@balisage.net by July 6, 2018. Selection of late-breaking proposals will be made by the Balisage conference committee, instead of being made in the course of the regular peer-review process. (emphasis in original)

The Def Con conference attendees play spot the fed.

But spot the fed requires some feds in order to play.

Feds show up at hacker conferences. For content or the company of people with poor personal hygiene.

Let’s assume it’s the content.

What content for a markup paper will attract undercover federal agents?

Success means playing spot the fed at Balisage 2018.

Topics anyone?

May 23, 2018

Balisage 2018 Program!

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 12:40 pm

The Balisage 2018 program has hit the Web!

Among the goodies on the agenda:

  • Implementing and using concurrent document structures
  • White-hat web crawling: Industrial strength web crawling for serious content acquisition
  • Easing the road to declarative programming in XSLT for imperative programmers
  • Fractal information is
  • Scaling XML using a Beowulf cluster

That’s a random sampling from the talks already scheduled!

Even more intriguing are the open spots left for “late-breaking” news.

Perhaps you have some “late-breaking” XML related news to share?

I haven’t seen the 2018 Call for Late-Breaking papers but if the 2017 Call for Late-Breaking papers is any guide, time is running out!

Enjoy!

April 24, 2018

BaseX 9.0.1 (tool maintenance)

Filed under: BaseX,XML,XQuery — Patrick Durusau @ 3:17 pm

BaseX 9.0.1 (Maintenance Release):

Welcome to our BaseX 9.0.1 maintenance release. An update is highly recommended: The major release had a critical bug, regarding the storage of short non-ASCII Unicode strings.

This is the changelog:

Critical Bug Fixes

  • Storage: Short strings with extended Unicode characters fixed
  • XQuery: Nested path optimizations reenabled (e.g. in functions)
  • XQuery: map:merge, size computation fixed
  • XQuery: node ordering across multiple database instances fixed

Improvements

  • GUI: Better Java 9 support (DPI scaling, font rendering)
  • XQuery, collections: faster document root tests
  • New R client. Thanks Ben Engbers!
  • Linux: exec command used in startup scripts

Minor Bug Fixes

  • XQuery: Allow interruption of tail-call function calls
  • XQuery, HTTP parsing of content-type parameters
  • XQuery, restrict rewriting of filter to path expression
  • GUI: progress feedback when creating databases via double-click

If you want to interfere with, influence, change the outcome of, any of the US 2018 mid-term elections and/or the 2020 Presidential election, you need the latest and greatest in tools, as well as skill at using them.

Upgrade to BaseX 9.0.1 today!

March 23, 2018

BaseX 9.0 – The Spring Edition – 229 Days to US Mid-Term Elections

Filed under: BaseX,Politics,XML,XQuery — Patrick Durusau @ 7:32 pm

Christian Grün writes:

We are very happy to announce the release of BaseX 9.0!

The new version of our XML database system and XQuery 3.1 processor includes some great new features and a vast number of minor improvements and optimizations. It’s both the usage of BaseX in productive environments as well as the valuable feedback of our open source users that make BaseX better and better, and that allow and motivate us to keep going. Thanks to all of you!

Along with the new release, we invite you to visit our relaunched homepage: http://basex.org/.

Java 8 is now required to run BaseX. The most prominent features of Version 9.0 are:

Sorry! No spoilers here! Grab a copy of BaseX 9.0 and read Christian’s post for the details.

Take 229 days until the US mid-term elections (November 6, 2018) as fair warning that email leaks are possible (likely?) between now and election day.

The better your skills with BaseX, the better chance you have to interfere with, sorry, participate in the 2018 election cycle.

Good luck to us all!

February 10, 2018

XML periodic table

Filed under: XML — Patrick Durusau @ 8:09 pm

XML periodic table

It’s a visual thing and my small blog format style won’t do it justice. Follow the link.

XML grouped by Business language, QA, Document format, Internet format, Graphic format, Metadata standard, Transformation.

What a cool listing!

Lots of old friends but some potential new ones as well!

Enjoy!

February 9, 2018

XML Prague 2018 Conference Proceedings – Weekend Reading!

Filed under: Conferences,XML,XML Database,XPath,XQuery,XSLT — Patrick Durusau @ 9:13 pm

XML Prague 2018 Conference Proceedings

Two Hundred and Sixty (260) pages of high quality content on XML!

From the table of contents:

  • Assisted Structured Authoring using Conditional Random Fields – Bert Willems
  • XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process – Steven Higgs
  • xqerl: XQuery 3.1 Implementation in Erlang – Zachary N. Dean
  • XML Tree Models for Efficient Copy Operations – Michael Kay
  • Using Maven with XML development projects – Christophe Marchand and Matthieu Ricaud-Dussarget
  • Varieties of XML Merge: Concurrent versus Sequential – Tejas Pradip Barhate and Nigel Whitaker
  • Including XML Markup in the Automated Collation of Literary Text – Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, and Astrid Kulsdom
  • Multi-Layer Content Modelling to the Rescue – Erik Siegel
  • Combining graph and tree – Hans-Juergen Rennau
  • SML – A simpler and shorter representation of XML – Jean-François Larvoire
  • Can we create a real world rich Internet application using Saxon-JS? – Pieter Masereeuw
  • Implementing XForms using interactive XSLT 3.0 – O’Neil Delpratt and Debbie Lockett
  • Life, the Universe, and CSS Tests – Tony Graham
  • Form, and Content – Steven Pemberton
  • tokenized-to-tree – Gerrit Imsieke

I just got a refurbished laptop for reading in bed. Now I have to load XML parsers, etc. on it to use along with reading these proceedings!

Enjoy!

PS: Be sure to thank Jirka Kosek for his tireless efforts promoting XML and XML Prague!

February 5, 2018

Balisage: The Markup Conference 2018 – 77 Days To Paper Submission Deadline!

Filed under: Conferences,XML,XML Schema,XPath,XQuery — Patrick Durusau @ 8:46 pm

Call for Participation

Submission dates/instructions have dropped!

When:
Dates:

  • 22 March 2018 — Peer review applications due
  • 22 April 2018 — Paper submissions due
  • 21 May 2018 — Speakers notified
  • 8 June 2018 — Late-breaking News submissions due
  • 15 June 2018 — Late-breaking News speakers notified
  • 6 July 2018 — Final papers due from presenters of peer reviewed papers
  • 6 July 2018 — Short paper or slide summary due from presenters of late-breaking news
  • 30 July 2018 — Pre-conference Symposium
  • 31 July – 3 August 2018 — Balisage: The Markup Conference
How:
Submit full papers in XML to info@balisage.net
See the pages Instructions for Authors and
Tag Set and Submission Guidelines for details.
Apply to the Peer Review panel

I’ve heard inability to submit valid markup counts in the judging of papers. That may just be rumor or it may be true. I suggest validating your submission.
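
Before you worry about validity, make sure the paper is at least well-formed. Here is a minimal sketch in Python, standard library only. It checks well-formedness and nothing more; checking a paper against the Balisage tag set needs a validating parser, which this sketch does not attempt. The element names in the two test strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses, else a short description of the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError carries the (line, column) where the parser gave up.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


# Hypothetical snippets, not the real Balisage vocabulary.
good = "<article><title>Markup</title></article>"
bad = "<article><title>Markup</article>"  # <title> is never closed

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

A clean run prints None for the first string and an error location for the second. Passing this check is necessary but not sufficient for a valid submission.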

You should be on the fourth or fifth draft of your paper by now, but be aware the paper submission deadline is April 22, 2018, or 77 days from today!

Looking forward to seeing exceptionally strong papers in the review process and being presented at Balisage!

January 10, 2018

eXist-db – First Upgrade for 2018

Filed under: eXist,XML,XML Database,XQuery — Patrick Durusau @ 2:06 pm

I usually update from notices of a new version and so rarely visit the eXist-db homepage. My loss.

There’s a cool homepage image. With links to documentation, community, references, but not overwhelmingly so.

Kudos! Oh, the upgrade:

eXist-db v3.6.1 – January 03, 2018

From the release notes:

eXist-db v3.6.1 has just been released. This is a hotfix release, which contains bug fixes for several important issues discovered since eXist-db v3.6.0.

We recommend that all users of eXist 3.6.0 should upgrade to eXist 3.6.1.

Bug fixes

  • Fixed issue where the package manager wrote non-well-formed XML that caused problems during backup/restore. #1620
  • Fixed namespace prefix for attributes and namespace nodes.
  • Made sure the localName of a in memory element is correctly obtained under various namespace declaration conditions
  • Fix for NPE in org.exist.xquery.functions.fn.FunId #1642
  • Several atomic comparisons raise wrong error code #1638
  • General comparison to empty sequence sometimes raises an error #1639
  • Warn if no <target> is found in an EXPath packages’s repo.xml

Backwards Compatibility

  • eXist-db v3.6.1 is backwards binary-compatible as far as v3.0, but not with earlier versions. Users upgrading from previous versions should perform a full backup and restore to migrate their data.

Downloading This Version

eXist-db v3.6.1 is available for download from Bintray. Maven artifacts for eXist-db v3.6.1 are available from our mvn-repo. Mac users of the Homebrew package repository may acquire eXist 3.6.1 directly from there.

When 2018 congressional candidate (U.S.) inboxes start dropping, will eXist-db be your tool of choice?

Enjoy!

January 9, 2018

Sessions for XML Prague 2018 – January 10th, Early Bird Deadline!

Filed under: Conferences,XML,XQuery,XSLT — Patrick Durusau @ 8:03 pm

List of sessions for XML Prague 2018

The range of great presentations is no surprise.

That early registration is still open, with this list of presentations, well, that is a surprise!

January 10, 2018 is the deadline for early birds!

From the post:

Unconference day

Schematron Users Meetup
XSL-FO, CSS and Paged Output – hosted by Antenna House
Introduction to CSS for Paged Media
XSpec Users Meetup
oXygen Users Meetup
Creating beautiful documents with the speedata Publisher
eXist-db Community Meetup
XML with Emacs workshop

Friday and Saturday sessions

Bert Willems: Assisted Structured Authoring using Conditional Random Fields
Christophe Marchand and Matthieu Ricaud-Dussarget: Using Maven with XML Projects
Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker and Astrid Kulsdom: Including XML Markup in the Automated Collation of Literary Texts
Erik Siegel: Multi-layered content modelling to the rescue
Francis Cave: Does the world need more XML standards?
Gerrit Imsieke: tokenized-to-tree – An XProc/XSLT Library For Patching Back Tokenization/Analysis Results Into Marked-up Text
Hans-Juergen Rennau: Combining graph and tree: writing SHAX, obtaining SHACL, XSD and more
James Fuller: Diff with XQuery
Jean-François Larvoire: SML – A simpler and shorter representation of XML
Johannes Kolbe and Manuel Montero: XML periodic table, XML repository and XSLT checker
Michael Kay: XML Tree Models for Efficient Copy Operations
O’Neil Delpratt and Debbie Lockett: Implementing XForms using interactive XSLT 3.0
Pieter Masereeuw: Can we create a real world rich Internet application using Saxon-JS?
Radu Coravu: A short story about XML encoding and opening very large documents in an XML editing application
Steven Higgs: XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process
Steven Pemberton: Form, and Content
Tejas Barhate and Nigel Whitaker: Varieties of XML Merge: Concurrent versus Sequential
Tony Graham: Life, the Universe, and CSS Tests
Vasu Chakkera: Effective XSLT Documentation and its separation from XSLT code
Zachary Dean: xqerl: XQuery 3.1 Implementation in Erlang

I’m expecting lots of tweets and posts about these presentations!

December 26, 2017

xsd2json – XML Schema to JSON Schema Transform

Filed under: JSON,XML,XML Schema — Patrick Durusau @ 9:12 pm

xsd2json by Loren Cahlander.

From the webpage:

XML Schema to JSON Schema Transform – Development and Test Environment

The options that are supported are:

‘keepNamespaces’ – set to true if keeping prefixes in the property names is required; otherwise prefixes are eliminated.

‘schemaId’ – the name of the schema

xs:short

{ "type": "integer", "xsdType": "xs:short", "minimum": -32768, "maximum": 32767, "exclusiveMinimum": false, "exclusiveMaximum": false }

To be honest, I can’t imagine straying from Relax-NG, much less converting an XSD schema into a JSON schema.

But, it’s not possible to predict all needs and futures (hint to AI fearests). It will be easier to find xsd2json here than with adware burdened “modern” search engines, should the need arise.
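
The bounds in the xs:short example quoted above are not magic numbers: they follow from xs:short being a 16-bit signed integer. As a rough illustration only, and not how xsd2json itself is written, here is a Python sketch that derives the same fragment for the signed XSD integer types. The table and function names are my own invention; only the output shape is copied from the xs:short example.

```python
import json

# Bit widths of the signed XSD integer types, per XML Schema Part 2: Datatypes.
SIGNED_BITS = {"xs:byte": 8, "xs:short": 16, "xs:int": 32, "xs:long": 64}


def integer_schema(xsd_type):
    """Build a JSON Schema fragment shaped like the quoted xs:short example."""
    bits = SIGNED_BITS[xsd_type]
    return {
        "type": "integer",
        "xsdType": xsd_type,
        # Two's-complement range: -2^(bits-1) through 2^(bits-1) - 1.
        "minimum": -(2 ** (bits - 1)),
        "maximum": 2 ** (bits - 1) - 1,
        "exclusiveMinimum": False,
        "exclusiveMaximum": False,
    }


print(json.dumps(integer_schema("xs:short"), indent=2))
```

Swapping in "xs:byte", "xs:int" or "xs:long" gives the corresponding ranges from the same formula.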

November 27, 2017

eXist-db v3.6.0 [Prediction for 2018: Multiple data/document leak tsunamis. Are You Ready?]

Filed under: eXist,Government,Government Data,XML,XPath,XQuery — Patrick Durusau @ 9:28 pm

eXist-db v3.6.0

From the post:

Features

  • Switched Collation support to use ICU4j.
  • Implemented XQuery 3.1 UCA (Unicode Collation Algorithm).
  • Implemented map type parameters for XQuery F&O 3.1 fn:serialize.
  • Implemented declare context item for XQuery 3.0.
  • Implemented XQuery 3.0 Regular Expression’s support for non-capturing groups.
  • Implemented a type-safe DSL for describing and testing transactional operations upon the database.
  • Implemented missing node kind tests in the XQuery parser when using @ on an AbbrevForwardStep.
  • Added AspectJ support to the IntelliJ project files (IntelliJ Ultimate only).
  • Repaired the dependencies in the NetBeans project files.
  • Added support for Travis macOS CI.
  • Added support for AppVeyor Windows CI.
  • Updated third-party dependencies:
    • Apache Commons Codec 1.11
    • Apache Commons Compress 1.15
    • Apache Commons Lang 3.7
    • Eclipse AspectJ 1.9.0.RC1
    • Eclipse Jetty 9.4.7.v20170914
    • EXPath HTTP Client 20171116
    • Java 8 Functional Utilities 1.11
    • JCTools 2.1.1
    • XML Unit 2.4.0

Performance Improvements

  • Compiled XQuery cache is now multi-threaded; concurrency is now per-source.
  • RESTXQ compiled XQuery cache is now multi-threaded; concurrency is now per-query URI.
  • STX Templates Cache is now multithreaded.
  • XML-RPC Server will now use Streaming and GZip compression if supported by the client; enabled in eXist’s Java Admin Client.
  • Reduced object creation overhead in the XML-RPC Server.

Apps

The bundled applications of the Documentation, eXide, and Monex have all been updated to the latest versions.

Prediction for 2018: Multiple data/document leak tsunamis.

Are you prepared?

How are your XQuery skills and tools?

Or do you plan on regurgitating news wire summaries?

November 13, 2017

XML Prague 2017 – 21 Reasons to Attend 2018 – Offensive Use of XQuery

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 8:41 pm

XML Prague 2017 Videos

Need reasons for your attending XML Prague 2018?

The XML Prague 2017 YouTube playlist has twenty-one (21) very good reasons (videos). (You may have to hold the hands of c-suite types if you share the videos with them.)

Two things I see missing from the presentations: security and offensive use of XQuery.

XML Security

You may have noticed that corporations, governments and others have been hemorrhaging data in 2017 (and before). While legislators wail ineffectually and wish for an 18th-century world, the outlook for cybersecurity looks grim for 2018.

XML and XML applications exist in a law-of-the-jungle security context. But there weren’t any presentations on security-related issues at XML Prague in 2017. Are you going to break the ice in 2018?

Offensive use of XQuery

XQuery has the power to extract, enhance and transform data to serve your interests, not those of its authors.

I’ve heard the gospel that technologists should disarm themselves and righteously await a better day. Meanwhile, governments, military forces, banks, and their allies loot and rape the Earth and its peoples.

Are data scientists at the NSA, FSB, MSS, MI6, Mossad, CIA, etc., constrained by your “do no evil” creeds?

Present governments, or their successors, can move towards more planet- and people-friendly policies, but they require, ahem, encouragement.

XQuery, which doesn’t depend upon melting data centers, supercomputers, global data vacuuming, etc., can help supply that encouragement.

How would you use XQuery to transform government data to turn it against its originator?

November 11, 2017

eXist-db Docker Image Builder

Filed under: eXist,XML,XQuery — Patrick Durusau @ 9:15 pm

eXist-db Docker Image Builder

From the webpage:

Pre-built eXist-db Docker images have been published on Docker Hub. You can skip to Running an eXist-db Docker Image if you just want to use the provided Docker images.

Whether you want to ease your use of eXist-db or create a customized distribution of eXist-db, complete with additional resources, this rocks.

October 12, 2017

XML Prague 2018 – Apology to Procrastinators

Filed under: Conferences,Cybersecurity,Security,XML,XPath,XQuery,XSLT — Patrick Durusau @ 10:49 am

Apologies to all procrastinators: I just saw the Call for Proposals for XML Prague 2018.

You only have 50 days (until November 30, 2017) to submit your proposals for XML Prague 2018.

Efficient people don’t realize that 50 days is hardly enough time to put off thinking about a proposal topic, much less fail to write down anything for a proposal. A completely unreasonable demand, but do try to procrastinate quickly and get a proposal done for XML Prague 2018.

The suggestion of doing a “…short video…” seems rife with potential for humor and/or NSFW images. Perhaps XML Prague will post the best “…short videos…” to YouTube?

From the webpage:

XML Prague 2018 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21st century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …
  • XML success stories – real-world use cases of successful XML deployments

There are several different types of slots available during the conference and you can indicate your preferred slot during submission:

  • 30 minutes or 15 minutes – These slots are suitable for normal conference talks.
  • 90 minutes (unconference) – Ideal for holding a users’ meeting or workshop during the unconference day (Thursday).

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Submissions should strive to contain original material and belong to the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Proposals can have several forms:

  • full paper – In our opinion still the ideal and classical way of proposing a presentation. A full paper gives reviewers enough information to properly assess your proposal.
  • extended abstract – A concise 1-4 page long description of your topic. If you do not have time to write a full paper proposal this is one possible way to go. Try to make your extended abstract concrete and specific. A too short or vague abstract will not convince reviewers that it is worth including in the conference schedule.
  • short video (max. 5 minutes) – If you are not a writing person but you still have something interesting to present. Simply capture a short video (no longer than 5 minutes) containing part of your presentation. The video can capture you or it can be a screencast.

I mentioned XSLT security attacks recently; perhaps you could do something similar on XQuery? Or on other ways to use XML and related technologies to breach cybersecurity?

Do submit proposals and enjoy XML Prague 2018!

October 6, 2017

XSLT Server Side Injection Attacks

Filed under: Cybersecurity,Security,XML,XSLT — Patrick Durusau @ 12:02 pm

XSLT Server Side Injection Attacks by David Turco.

From the post:

Extensible Stylesheet Language Transformations (XSLT) vulnerabilities can have serious consequences for the affected applications, often resulting in remote code execution. Examples of XSLT remote code execution vulnerabilities with public exploits are CVE-2012-5357 affecting the .Net Ektron CMS; CVE-2012-1592 affecting Apache Struts 2.0; and CVE-2005-3757 which affected the Google Search Appliance.

From the examples above it is clear that XSLT vulnerabilities have been around for a long time and, although they are less common than other similar vulnerabilities such as XML Injection, we regularly find them in our security assessments. Nonetheless the vulnerability and the exploitation techniques are not widely known.

In this blog post we present a selection of attacks against XSLT to show the risks of using this technology in an insecure way.

We demonstrate how it is possible to execute arbitrary code remotely; exfiltrate data from remote systems; perform network scans; and access resources on the victim’s internal network.

We also make available a simple .NET application vulnerable to the described attacks and provide recommendations on how to mitigate them.

A great post for introducing XML and XSLT to potential hackers!

Equally great potential for a workshop at a markup conference.

Enjoy!

October 4, 2017

Procrastinators – Dates/Location for Balisage: The Markup Conference 2018

Filed under: Conferences,JSON,XML,XPath,XQuery,XSLT — Patrick Durusau @ 12:48 pm

Procrastinators can be caught short, without enough time for proper procrastination on papers and slides.

To ensure ample time for procrastination, Balisage: The Markup Conference 2018 has published its dates and location.

31 July 2018–3 August 2018 … Balisage: The Markup Conference
30 July 2018 … Symposium – topic to be announced
CAMBRiA Hotel & Suites
1 Helen Heneghan Way
Rockville, Maryland 20850
USA

For indecisive procrastinators, Balisage offers suggestions for your procrastination:

The 2017 program included papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelining, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, digitizing Ethiopian and Eritrean manuscripts, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Innovative procrastinators can procrastinate on other related topics, including any they find on the Master Topic List (ideas procrastinated on for prior Balisage conferences).

Take advantage of this opportunity to procrastinate early and long on your Balisage submissions. You and your audience will be glad you did!

PS: Don’t procrastinate on saying thank you to Tommie Usdin and company for another year of Balisage. Balisage improves XML theory and practice every year it is held.

September 19, 2017

XQuery (Walmsley – Updated 15 Sept. 2017) – Pagination Differences

Filed under: XML,XQuery — Patrick Durusau @ 6:20 pm

For those of you smart enough to own a copy of XQuery by Priscilla Walmsley, it was updated as of 15 September 2017.

There’s a four (4) page difference in length between the original edition (758 pages) and the updated version (762 pages).

One two (2) page addition is the new section “Specifying Serialization Parameters by Using a Map” plus an unnecessary page break following the introduction to example 13-4 (of the updated version).

Chapter 13, Inputs and Outputs, now ends on page 228 instead of 226.

The other two pages arise from the insertion of array:put following prefix-from-QName and before map:put, in Appendix A. Built-in Function Reference.

I haven’t found any mention of the pagination difference, which will be confusing for students consulting Walmsley.

Since the edition is not being updated, putting the added four pages in an Appendix D or even in preface material numbered i, ii, …, would have preserved references across the first and second versions.

XQuery should be widely used. Creating unnecessary friction for using XQuery resources doesn’t advance that goal.

August 5, 2017

Overlap – Attacks on Machine Learning Models

Filed under: Machine Learning,XML — Patrick Durusau @ 4:48 pm

Robust Physical-World Attacks on Machine Learning Models by Ivan Evtimov, et al.

Abstract:

Deep neural network-based classifiers are known to be vulnerable to adversarial examples that can fool them into misclassifying their input through the addition of small-magnitude perturbations. However, recent studies have demonstrated that such adversarial examples are not very effective in the physical world–they either completely fail to cause misclassification or only work in restricted cases where a relatively complex image is perturbed and printed on paper. In this paper we propose a new attack algorithm–Robust Physical Perturbations (RP2)– that generates perturbations by taking images under different conditions into account. Our algorithm can create spatially-constrained perturbations that mimic vandalism or art to reduce the likelihood of detection by a casual observer. We show that adversarial examples generated by RP2 achieve high success rates under various conditions for real road sign recognition by using an evaluation methodology that captures physical world conditions. We physically realized and evaluated two attacks, one that causes a Stop sign to be misclassified as a Speed Limit sign in 100% of the testing conditions, and one that causes a Right Turn sign to be misclassified as either a Stop or Added Lane sign in 100% of the testing conditions.

I was struck by the image used for this paper in a tweet:

I recognized this as an “overlapping” markup problem before discovering the authors were attacking machine learning models. On overlapping markup, see: Towards the unification of formats for overlapping markup by Paolo Marinelli, Fabio Vitali, Stefano Zacchiroli, or more recently, It’s more than just overlap: Text As Graph – Refining our notion of what text really is—this time for sure! by Ronald Haentjens Dekker and David J. Birnbaum.

From the conclusion:


In this paper, we introduced Robust Physical Perturbations (RP2), an algorithm that generates robust, physically realizable adversarial perturbations. Previous algorithms assume that the inputs of DNNs can be modified digitally to achieve misclassification, but such an assumption is infeasible, as an attacker with control over DNN inputs can simply replace it with an input of his choice. Therefore, adversarial attack algorithms must apply perturbations physically, and in doing so, need to account for new challenges such as a changing viewpoint due to distances, camera angles, different lighting conditions, and occlusion of the sign. Furthermore, fabrication of a perturbation introduces a new source of error due to a limited color gamut in printers.

We use RP2 to create two types of perturbations: subtle perturbations, which are small, undetectable changes to the entire sign, and camouflage perturbations, which are visible perturbations in the shape of graffiti or art. When the Stop sign was overlayed with a print out, subtle perturbations fooled the classifier 100% of the time under different physical conditions. When only the perturbations were added to the sign, the classifier was fooled by camouflage graffiti and art perturbations 66.7% and 100% of the time respectively under different physical conditions. Finally, when an untargeted poster-printed camouflage perturbation was overlayed on a Right Turn sign, the classifier was fooled 100% of the time. In future work, we plan to test our algorithm further by varying some of the other conditions we did not consider in this paper, such as sign occlusion.

Excellent work, but my question is: Is the inability of the classifier to recognize overlapping images similar to the issues encountered with overlapping markup?

To be sure, overlapping markup is in part an artifice of unimaginative XML rules, since overlapping texts are far more common than non-overlapping texts, especially when talking about critical editions or even differing analyses of the same text.

But beyond syntax, there is the subtlety of treating separate “layers” or stacks of a text as separate and yet tracking the relationship between two or more such stacks, when arbitrary additions or deletions can occur in any of them. Additions and deletions that must be accounted for across all layers/stacks.
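
To make the layers/stacks problem concrete, here is a rough sketch in Python of standoff annotation over a shared text. It is an illustration only, not a solution and not anything taken from the papers cited above: the Span class, the layer names (“metrical”, “syntactic”), the sample sentence and both helper functions are all my own hypothetical choices. It shows two things: detecting spans that overlap without nesting (the case a single XML tree cannot express directly), and the bookkeeping needed so that one insertion is accounted for in every layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def properly_overlaps(a: Span, b: Span) -> bool:
    """True when the spans intersect but neither contains the other."""
    intersects = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersects and not (a_in_b or b_in_a)


def insert_text(text: str, layers: dict, pos: int, new: str):
    """Insert `new` at `pos` and shift the spans in every layer to match."""
    def shift(span: Span) -> Span:
        # A span starting at or after the insertion point moves right;
        # a span that merely contains the insertion point grows.
        start = span.start + len(new) if span.start >= pos else span.start
        end = span.end + len(new) if span.end > pos else span.end
        return Span(start, end, span.label)

    new_text = text[:pos] + new + text[pos:]
    new_layers = {name: [shift(s) for s in spans] for name, spans in layers.items()}
    return new_text, new_layers


text = "the quick brown fox"
layers = {
    "metrical": [Span(0, 9, "line")],      # covers "the quick"
    "syntactic": [Span(4, 15, "phrase")],  # covers "quick brown"
}

line, phrase = layers["metrical"][0], layers["syntactic"][0]
overlap_found = properly_overlaps(line, phrase)

# One edit to the text must be reflected in both layers at once.
new_text, new_layers = insert_text(text, layers, 4, "very ")
new_line, new_phrase = new_layers["metrical"][0], new_layers["syntactic"][0]

print(overlap_found)
print(new_text[new_line.start:new_line.end])
print(new_text[new_phrase.start:new_phrase.end])
```

Treating an insertion exactly at a span’s start as falling before the span is an arbitrary choice on my part; a real system would need an explicit policy for boundary cases, and for deletions that swallow part of a span.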

I don’t have a solution to offer but pose the question of layers of recognition in hopes that machine learning models can capitalize on the lessons learned about a very similar problem with overlapping markup.

August 2, 2017

It’s more than just overlap: Text As Graph

Filed under: Graphs,Humanities,Hyperedges,Hypergraphs,Texts,XML — Patrick Durusau @ 12:57 pm

It’s more than just overlap: Text As Graph – Refining our notion of what text really is—this time for sure! by Ronald Haentjens Dekker and David J. Birnbaum.

Abstract:

The XML tree paradigm has several well-known limitations for document modeling and processing. Some of these have received a lot of attention (especially overlap), and some have received less (e.g., discontinuity, simultaneity, transposition, white space as crypto-overlap). Many of these have work-arounds, also well known, but—as is implicit in the term “work-around”—these work-arounds have disadvantages. Because they get the job done, however, and because XML has a large user community with diverse levels of technological expertise, it is difficult to overcome inertia and move to a technology that might offer a more comprehensive fit with the full range of document structures with which researchers need to interact both intellectually and programmatically. A high-level analysis of why XML has the limitations it has can enable us to explore how an alternative model of Text as Graph (TAG) might address these types of structures and tasks in a more natural and idiomatic way than is available within an XML paradigm.

Hyperedges, texts and XML, what more could you need? 😉

This paper merits a deep read and testing by everyone interested in serious text modeling.

You can’t read the text but here is a hypergraph visualization of an excerpt from Lewis Carroll’s “The Hunting of the Snark”:

The New Testament, the Hebrew Bible, to say nothing of the Rabbinic commentaries on the Hebrew Bible and centuries of commentary on other texts, could profit from this approach.

Put your text to the test and share how to advance this technique!

June 8, 2017

XSL Transformations (XSLT) Version 3.0 (That’s a Wrap!)

Filed under: XML,XSLT — Patrick Durusau @ 10:02 am

XSL Transformations (XSLT) Version 3.0 W3C Recommendation 8 June 2017

Abstract:

This specification defines the syntax and semantics of XSLT 3.0, a language designed primarily for transforming XML documents into other XML documents.

XSLT 3.0 is a revised version of the XSLT 2.0 Recommendation [XSLT 2.0] published on 23 January 2007.

The primary purpose of the changes in this version of the language is to enable transformations to be performed in streaming mode, where neither the source document nor the result document is ever held in memory in its entirety. Another important aim is to improve the modularity of large stylesheets, allowing stylesheets to be developed from independently-developed components with a high level of software engineering robustness.

XSLT 3.0 is designed to be used in conjunction with XPath 3.0, which is defined in [XPath 3.0]. XSLT shares the same data model as XPath 3.0, which is defined in [XDM 3.0], and it uses the library of functions and operators defined in [Functions and Operators 3.0]. XPath 3.0 and the underlying function library introduce a number of enhancements, for example the availability of higher-order functions.

As an implementer option, XSLT 3.0 can also be used with XPath 3.1. All XSLT 3.0 processors provide maps, an addition to the data model which is specified (identically) in both XSLT 3.0 and XPath 3.1. Other features from XPath 3.1, such as arrays, and new functions such as random-number-generator (FO31) and sort (FO31), are available in XSLT 3.0 stylesheets only if the implementer chooses to support XPath 3.1.

Some of the functions that were previously defined in the XSLT 2.0 specification, such as the format-date (FO30) and format-number (FO30) functions, are now defined in the standard function library to make them available to other host languages.

XSLT 3.0 also includes optional facilities to serialize the results of a transformation, by means of an interface to the serialization component described in [XSLT and XQuery Serialization]. Again, the new serialization capabilities of [XSLT and XQuery Serialization 3.1] are available at the implementer’s option.

This document contains hyperlinks to specific sections or definitions within other documents in this family of specifications. These links are indicated visually by a superscript identifying the target specification: for example XP30 for XPath 3.0, DM30 for the XDM data model version 3.0, FO30 for Functions and Operators version 3.0.

A special shout out to Michael Kay for, in his words, “Done and dusted: ten years’ work.”

Thanks from an appreciative audience!

May 17, 2017

Balisage: The Markup Conference 2017 Program Now Available

Filed under: Conferences,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 3:42 pm

Balisage: The Markup Conference 2017 Program Now Available

An email from Tommie Usdin, Chair, Chief Organizer and herder of markup cats for Balisage advises:

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2017 program includes papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelining, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, why YOU should use (my favorite XML vocabulary), developing a system to aid in studying manuscripts in the tradition of the Ethiopian and Eritrean Highlands, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Up-Translation and Up-Transformation: A one-day Symposium on the goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

Are you interested in open information, reusable documents, and vendor and application independence? Then you need descriptive markup, and Balisage is your conference. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while others still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2017 Program:
http://www.balisage.net/2017/Program.html

Symposium Program:
https://www.balisage.net/UpTransform

NOTE: Members of the TEI and their employees are eligible for discount Balisage registration.

You need to see the program for yourself but the highlights (for me) include: Ethiopic manuscripts (ok, so I have odd tastes), Earley parsers (of particular interest), English Majors (my wife was an English major), and a number of other high points.

Mark your calendar for July 31 – August 4, 2017 – It’s Balisage!
