Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 24, 2019

eXist-db 5.0.0 RC6

Filed under: eXist,XML,XML Database,XPath,XQuery — Patrick Durusau @ 4:35 pm

eXist-db 5.0.0 RC6

RC5 was released on November 21, 2018, so there are a number of new features and bug fixes to grab your interest in RC6.

Features:

  • New De-duplicating BLOB store for binary documents – see https://blog.adamretter.org.uk/blob-deduplication/
  • More elaborate XPath expressions in the Lucene index config of collection.xconf are now supported
  • New non-blocking lock-free implementation of the Transaction Manager
  • CData serialization now respects the output:cdata-section-elements option
  • New XQuery function util:eval-and-serialize for dynamic XQuery evaluation and serialization.
  • New XQuery function util:binary-doc-content-digest to retrieve a digest of a Binary Document
  • … and others.

Bug fixes:

  • Fixed Lucene term range queries
  • Copying an XML Resource now correctly removes any nodes that it replaces
  • Fixed a memory leak with XQuery serializers
  • Fixed Garbage Collection churn issue with serialization
  • Fixed Backup/Restore progress reporting
  • XQuery Library Modules on the Java Classpath are now correctly resolved from the importing XQuery module
  • … and others.

Although not ready for production, these new features and bug fixes should have you scurrying to download eXist-db 5.0.0 RC6!

PS: Remember there are only 48 days left for paper submissions to Balisage 2019! Are you going to be using the latest RC for eXist?

January 9, 2019

Summer is Coming! Balisage is Coming! Papers Due April 12, 2019!

Filed under: Conferences,XML,XML Database,XML Query Rewriting,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 7:52 pm

From a recent email about Balisage 2019:

Some “Balisage: The Markup Conference 2019” dates are coming soon:

March 29, 2019 — Peer-review applications due
April 12, 2019 — Paper submissions due
July 30 — August 2, 2019 — Balisage: The Markup Conference
July 29, 2019 — Pre-conference Symposium – Topic to be announced https://www.balisage.net/

Balisage: where serious markup practitioners and theoreticians meet every August.

A colleague recently asked me to share the program for Balisage 2019 to help support a request to attend. What, I was asked, will we talk about at Balisage 2019? I replied “It will be a variety of topics relating to markup, but we won’t know the specifics until May.” “Why? It seems like you should know that now.” was the response. “Why don’t you just decide who you want to talk about what and assign topics?” “Because that would not be a contributed paper conference, it would be some other sort of event!”

Balisage *is* a contributed paper conference, and the submissions from people who want to speak drive the program, the hallway conversations, and the whole tone of Balisage!

If you want to speak at Balisage 2019, if you want to help shape the conversation, if you have an idea, experience, opinion, or question relating to markup, please submit a paper to Balisage 2019!

We solicit papers on any aspect of markup and its uses; topics include but ARE NOT LIMITED TO:

• Cutting-edge applications of XML and related technologies
• Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
• Performance issues in parsing, XML database retrieval, or XSLT processing
• Development of angle-bracket-free user interfaces for non-technical users
• Deployment of XML systems for enterprise data
• Design and implementation of XML vocabularies
• Case studies of the use of XML for publishing, interchange, or archiving
• Alternatives to XML/JSON/whatever
• Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages
• Invisible XML

Detailed Call for Participation: https://www.balisage.net/Call4Participation.html
Call for Peer Reviewers: https://www.balisage.net/peer/ReviewAppForm.html
About Balisage: https://www.balisage.net/

For more information: info@balisage.net or +1 301 315 9631

Papers are due for Balisage in a little more than 90 days.

Anyone doing a topic map paper this year?

“If you can point to it, we can identify it. If we can identify it, we can map it. If we can map it, …,” well, you know how the rest of it goes.

Data silos continue to exist because they are armor. Armor that protects some stakeholders from prying eyes. Up for a little peeping?

November 21, 2018

pugixml 1.9 quick start guide

Filed under: Parsers,XML,XPath — Patrick Durusau @ 4:20 pm

pugixml 1.9 quick start guide

From the webpage:

pugixml is a light-weight C++ XML processing library. It consists of a DOM-like interface with rich traversal/modification capabilities, an extremely fast XML parser which constructs the DOM tree from an XML file/buffer, and an XPath 1.0 implementation for complex data-driven tree queries. Full Unicode support is also available, with two Unicode interface variants and conversions between different Unicode encodings (which happen automatically during parsing/saving). The library is extremely portable and easy to integrate and use. pugixml is developed and maintained since 2006 and has many users. All code is distributed under the MIT license, making it completely free to use in both open-source and proprietary applications.

pugixml enables very fast, convenient and memory-efficient XML document processing. However, since pugixml has a DOM parser, it can’t process XML documents that do not fit in memory; also the parser is a non-validating one, so if you need DTD/Schema validation, the library is not for you.

This is the quick start guide for pugixml, whose purpose is to enable you to start using the library quickly. Many important library features are either not described at all or only mentioned briefly; for more complete information you should read the complete manual.

Despite the disappointing lack of document/email leaks during the 2018 mid-terms, I am hopeful the same will not be true in 2020. The 2020 elections will include a presidential race.

I encountered pugixml today in another context and thought I should mention it as a possible addition to your toolkit.

The repository: http://github.com/zeux/pugixml.

Enjoy!

August 1, 2018

Developing SGML DTDs From Text To Model To Markup

Filed under: XML,XPath — Patrick Durusau @ 8:06 pm

Developing SGML DTDs: From Text To Model To Markup by Eve Maler and Jeanne El Andaloussi.

Maler and El Andaloussi summarize (1.2.4) the benefits of SGML this way:

To summarize, SGML markup is unique in that it combines several design strengths:

  • It is declarative, which helps document producers “write once, use many”—putting the same document data to multiple uses, such as delivery of documents in a variety of online and paper formats and interchange with others who wish to use the documents in different ways.
  • It is generic across systems and has a nonproprietary design, which helps make documents vendor and platform independent and “future-proof”—protecting them against changes in computer hardware and software.
  • It is contextual, which heightens the quality and completeness of processing by allowing documents to be structurally validated and by enabling logical collections of data to be manipulated intelligently.

The characteristics of being declarative, generic, nonproprietary, and contextual make the Standard Generalized Markup Language “standard” and “generalized.”

A truly remarkable work that is as relevant today as it was twenty-three years ago.

Most important lesson: Understanding your document comes before designing markup. Every time.

May 29, 2018

Balisage Late-Breaking News Deadline – 6 July 2018 – Attract/Spot a Fed!

Filed under: Conferences,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 7:10 pm

Balisage 2018 Call for Late-breaking News

From the post:


Proposals for late-breaking slots must be received at info@balisage.net by July 6, 2018. Selection of late-breaking proposals will be made by the Balisage conference committee, instead of being made in the course of the regular peer-review process. (emphasis in original)

The Def Con conference attendees play spot the fed.

But spot the fed requires some feds in order to play.

Feds show up at hacker conferences. For content or the company of people with poor personal hygiene.

Let’s assume it’s the content.

What content for a markup paper will attract undercover federal agents?

Success means playing spot the fed at Balisage 2018.

Topics anyone?

May 23, 2018

Balisage 2018 Program!

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 12:40 pm

The Balisage 2018 program has hit the Web!

Among the goodies on the agenda:

  • Implementing and using concurrent document structures
  • White-hat web crawling: Industrial strength web crawling for serious content acquisition
  • Easing the road to declarative programming in XSLT for imperative programmers
  • Fractal information is
  • Scaling XML using a Beowulf cluster

That’s a random sampling from the talks already scheduled!

Even more intriguing are the open spots left for “late-breaking” news.

Perhaps you have some “late-breaking” XML related news to share?

I haven’t seen the 2018 Call for Late-Breaking papers but if the 2017 Call for Late-Breaking papers is any guide, time is running out!

Enjoy!

February 9, 2018

XML Prague 2018 Conference Proceedings – Weekend Reading!

Filed under: Conferences,XML,XML Database,XPath,XQuery,XSLT — Patrick Durusau @ 9:13 pm

XML Prague 2018 Conference Proceedings

Two Hundred and Sixty (260) pages of high quality content on XML!

From the table of contents:

  • Assisted Structured Authoring using Conditional Random Fields – Bert Willems
  • XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process – Steven Higgs
  • xqerl: XQuery 3.1 Implementation in Erlang – Zachary N. Dean
  • XML Tree Models for Efficient Copy Operations – Michael Kay
  • Using Maven with XML development projects – Christophe Marchand and Matthieu Ricaud-Dussarget
  • Varieties of XML Merge: Concurrent versus Sequential – Tejas Pradip Barhate and Nigel Whitaker
  • Including XML Markup in the Automated Collation of Literary Text – Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, and Astrid Kulsdom
  • Multi-Layer Content Modelling to the Rescue – Erik Siegel
  • Combining graph and tree – Hans-Juergen Rennau
  • SML – A simpler and shorter representation of XML – Jean-François Larvoire
  • Can we create a real world rich Internet application using Saxon-JS? – Pieter Masereeuw
  • Implementing XForms using interactive XSLT 3.0 – O’Neil Delpratt and Debbie Lockett
  • Life, the Universe, and CSS Tests – Tony Graham
  • Form, and Content – Steven Pemberton
  • tokenized-to-tree – Gerrit Imsieke

I just got a refurbished laptop for reading in bed. Now I have to load XML parsers, etc. on it to use along with reading these proceedings!

Enjoy!

PS: Be sure to thank Jirka Kosek for his tireless efforts promoting XML and XML Prague!

February 5, 2018

Balisage: The Markup Conference 2018 – 77 Days To Paper Submission Deadline!

Filed under: Conferences,XML,XML Schema,XPath,XQuery — Patrick Durusau @ 8:46 pm

Call for Participation

Submission dates/instructions have dropped!

When:
Dates:

  • 22 March 2018 — Peer review applications due
  • 22 April 2018 — Paper submissions due
  • 21 May 2018 — Speakers notified
  • 8 June 2018 — Late-breaking News submissions due
  • 15 June 2018 — Late-breaking News speakers notified
  • 6 July 2018 — Final papers due from presenters of peer reviewed papers
  • 6 July 2018 — Short paper or slide summary due from presenters of late-breaking news
  • 30 July 2018 — Pre-conference Symposium
  • 31 July – 3 August 2018 — Balisage: The Markup Conference
How:
Submit full papers in XML to info@balisage.net
See the pages Instructions for Authors and
Tag Set and Submission Guidelines for details.
Apply to the Peer Review panel

I’ve heard inability to submit valid markup counts in the judging of papers. That may just be rumor or it may be true. I suggest validating your submission.
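A first-pass check is easy to automate. The sketch below uses only Python's standard library and tests well-formedness, not validity against the Balisage tag set; real validation needs a validating parser plus the conference's own schema files. The element names in the sample strings are hypothetical, not taken from the Balisage tag set.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Hypothetical sample documents: the second has a mismatched end tag.
good = "<article><title>My Balisage paper</title><para>Text.</para></article>"
bad = "<article><title>My Balisage paper</para></article>"

for label, text in (("good", good), ("bad", bad)):
    ok, message = check_well_formed(text)
    print(label, "well-formed" if ok else "NOT well-formed: " + message)
```

A paper that fails this check will certainly fail validation, so it is worth running before anything more elaborate.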

You should be on the fourth or fifth draft of your paper by now, but be aware the paper submission deadline is April 22, 2018, or 77 days from today!

Looking forward to seeing exceptionally strong papers in the review process and being presented at Balisage!

November 27, 2017

eXist-db v3.6.0 [Prediction for 2018: Multiple data/document leak tsunamis. Are You Ready?]

Filed under: eXist,Government,Government Data,XML,XPath,XQuery — Patrick Durusau @ 9:28 pm

eXist-db v3.6.0

From the post:

Features

  • Switched Collation support to use ICU4j.
  • Implemented XQuery 3.1 UCA (Unicode Collation Algorithm).
  • Implemented map type parameters for XQuery F&O 3.1 fn:serialize.
  • Implemented declare context item for XQuery 3.0.
  • Implemented XQuery 3.0 Regular Expression’s support for non-capturing groups.
  • Implemented a type-safe DSL for describing and testing transactional operations upon the database.
  • Implemented missing node kind tests in the XQuery parser when using @ on an AbbrevForwardStep.
  • Added AspectJ support to the IntelliJ project files (IntelliJ Ultimate only).
  • Repaired the dependencies in the NetBeans project files.
  • Added support for Travis macOS CI.
  • Added support for AppVeyor Windows CI.
  • Updated third-party dependencies:
    • Apache Commons Codec 1.11
    • Apache Commons Compress 1.15
    • Apache Commons Lang 3.7
    • Eclipse AspectJ 1.9.0.RC1
    • Eclipse Jetty 9.4.7.v20170914
    • EXPath HTTP Client 20171116
    • Java 8 Functional Utilities 1.11
    • JCTools 2.1.1
    • XML Unit 2.4.0

Performance Improvements

  • Compiled XQuery cache is now multi-threaded; concurrency is now per-source.
  • RESTXQ compiled XQuery cache is now multi-threaded; concurrency is now per-query URI.
  • STX Templates Cache is now multithreaded.
  • XML-RPC Server will now use Streaming and GZip compression if supported by the client; enabled in eXist’s Java Admin Client.
  • Reduced object creation overhead in the XML-RPC Server.

Apps

The bundled applications of the Documentation, eXide, and Monex have all been updated to the latest versions.

Prediction for 2018: Multiple data/document leak tsunamis.

Are you prepared?

How are your XQuery skills and tools?

Or do you plan on regurgitating news wire summaries?

November 13, 2017

XML Prague 2017 – 21 Reasons to Attend 2018 – Offensive Use of XQuery

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 8:41 pm

XML Prague 2017 Videos

Need reasons for your attending XML Prague 2018?

The XML Prague 2017 YouTube playlist has twenty-one (21) very good reasons (videos). (You may have to hold the hands of c-suite types if you share the videos with them.)

Two things that I see missing from the presentations: security and offensive use of XQuery.

XML Security

You may have noticed that corporations, governments and others have been hemorrhaging data in 2017 (and before). While legislators wail ineffectually and wish for an 18th-century world, the outlook for cybersecurity looks grim for 2018.

XML and XML applications exist in a law-of-the-jungle security context. But there weren’t any presentations on security-related issues at XML Prague in 2017. Are you going to break the ice in 2018?

Offensive use of XQuery

XQuery has the power to extract, enhance and transform data to serve your interests, not those of its authors.

I’ve heard the gospel that technologists should disarm themselves and righteously await a better day. Meanwhile, governments, military forces, banks, and their allies loot and rape the Earth and its peoples.

Are data scientists at the NSA, FSB, MSS, MI6, Mossad, CIA, etc., constrained by your “do no evil” creeds?

Present governments, or their successors, can move towards more planet- and people-friendly policies, but they require, ahem, encouragement.

XQuery, which doesn’t depend upon melting data centers, supercomputers, global data vacuuming, etc., can help supply that encouragement.

How would you use XQuery to transform government data to turn it against its originator?

November 3, 2017

XPath and XQuery Assertions in SoapUI

Filed under: XPath,XQuery — Patrick Durusau @ 11:06 am

The video, XPath and XQuery assertions in SoapUI in depth, drew my attention to SoapUI, but be forewarned the sound quality was so bad I could not follow it. Still, I can now mention SoapUI and that’s not a bad thing.

The SoapUI documentation has extended examples for Validating XML Messages, Getting started with Assertions, and Transferring Property Values.

SoapUI has the usual hand-waving about security, but since critical airport security plans can be found as USB litter, I’m not sure anyone bothers. Your Amazon account root password is probably on a sticky note on someone’s monitor. Go check.

October 12, 2017

XML Prague 2018 – Apology to Procrastinators

Filed under: Conferences,Cybersecurity,Security,XML,XPath,XQuery,XSLT — Patrick Durusau @ 10:49 am

Apologies to all procrastinators: I just saw the Call for Proposals for XML Prague 2018.

You only have 50 days (until November 30, 2017) to submit your proposals for XML Prague 2018.

Efficient people don’t realize that 50 days is hardly enough time to put off thinking about a proposal topic, much less fail to write down anything for a proposal. It is a completely unreasonable demand, but do try to procrastinate quickly and get a proposal done for XML Prague 2018.

The suggestion of doing a “…short video…” seems rife with potential for humor and/or NSFW images. Perhaps XML Prague will post the best “…short videos…” to YouTube?

From the webpage:

XML Prague 2018 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21st century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …
  • XML success stories – real-world use cases of successful XML deployments

There are several different types of slots available during the conference and you can indicate your preferred slot during submission:

30 minutes
15 minutes
These slots are suitable for normal conference talks.
90 minutes (unconference)
Ideal for holding users meeting or workshop during the unconference day (Thursday).

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Authors should strive to contain original material and belong in the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Proposals can have several forms:

full paper
In our opinion still ideal and classical way of proposing presentation. Full paper gives reviewers enough information to properly assess your proposal.
extended abstract
Concise 1-4 page long description of your topic. If you do not have time to write full paper proposal this is one possible way to go. Try to make your extended abstract concrete and specific. Too short or vague abstract will not convince reviewers that it is worth including into the conference schedule.
short video (max. 5 minutes)
If you are not writing person but you still have something interesting to present. Simply capture short video (no longer than 5 minutes) containing part of your presentation. Video can capture you or it can be screen cast.

I mentioned XSLT security attacks recently; perhaps you could do something similar on XQuery? Other ways to use XML and related technologies to breach cybersecurity?

Do submit proposals and enjoy XML Prague 2018!

October 4, 2017

Procrastinators – Dates/Location for Balisage: The Markup Conference 2018

Filed under: Conferences,JSON,XML,XPath,XQuery,XSLT — Patrick Durusau @ 12:48 pm

Procrastinators can be caught short, without enough time for proper procrastination on papers and slides.

To ensure ample time for procrastination, Balisage: The Markup Conference 2018 has published its dates and location.

31 July 2018–3 August 2018 … Balisage: The Markup Conference
30 July 2018 … Symposium – topic to be announced
CAMBRiA Hotel & Suites
1 Helen Heneghan Way
Rockville, Maryland 20850
USA

For indecisive procrastinators, Balisage offers suggestions for your procrastination:

The 2017 program included papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelining, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, digitizing Ethiopian and Eritrean manuscripts, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Innovative procrastinators can procrastinate on other related topics, including any they find on the Master Topic List (ideas procrastinated on for prior Balisage conferences).

Take advantage of this opportunity to procrastinate early and long on your Balisage submissions. You and your audience will be glad you did!

PS: Don’t procrastinate on saying thank you to Tommie Usdin and company for another year of Balisage. Balisage improves XML theory and practice every year it is held.

May 17, 2017

Balisage: The Markup Conference 2017 Program Now Available

Filed under: Conferences,XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 3:42 pm

Balisage: The Markup Conference 2017 Program Now Available

An email from Tommie Usdin, Chair, Chief Organizer and herder of markup cats for Balisage advises:

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2017 program includes papers discussing XML vocabularies, cutting-edge digital humanities, lossless JSON/XML roundtripping, reflections on concrete syntax and abstract syntax, parsing and generation, web app development using the XML stack, managing test cases, pipelining and micropipelining, electronic health records, rethinking imperative algorithms for XSLT and XQuery, markup and intellectual property, why YOU should use (my favorite XML vocabulary), developing a system to aid in studying manuscripts in the tradition of the Ethiopian and Eritrean Highlands, exploring “shapes” in RDF and their relationship to schema validation, exposing XML data to users of varying technical skill, test-suite management, and use case studies about large conversion applications, DITA, and SaxonJS.

Up-Translation and Up-Transformation: A one-day Symposium on the goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

Are you interested in open information, reusable documents, and vendor and application independence? Then you need descriptive markup, and Balisage is your conference. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while others still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2017 Program:
http://www.balisage.net/2017/Program.html

Symposium Program:
https://www.balisage.net/UpTransform

NOTE: Members of the TEI and their employees are eligible for discount Balisage registration.

You need to see the program for yourself but the highlights (for me) include: Ethiopic manuscripts (ok, so I have odd tastes), Earley parsers (of particular interest), English Majors (my wife was an English major), and a number of other high points.

Mark your calendar for July 31 – August 4, 2017 – It’s Balisage!

April 19, 2017

Pure CSS crossword – CSS Grid

Filed under: Crossword Puzzle,Education,XPath,XQuery,XSLT — Patrick Durusau @ 4:22 pm

Pure CSS crossword – CSS Grid by Adrian Roworth.

The UI is slick, although creating the puzzle remains on you.

Certainly suitable for string answers, XQuery/XPath/XSLT expressions, etc.

Enjoy!

March 22, 2017

XQuery 3.1 and Company! (Deriving New Versions?)

Filed under: XML,XPath,XQuery — Patrick Durusau @ 9:09 am

XQuery 3.1: An XML Query Language W3C Recommendation 21 March 2017

Hurray!

Related reading of interest:

XML Path Language (XPath) 3.1

XPath and XQuery Functions and Operators 3.1

XQuery and XPath Data Model 3.1

These recommendations are subject to licenses that read in part:

No right to create modifications or derivatives of W3C documents is granted pursuant to this license, except as follows: To facilitate implementation of the technical specifications set forth in this document, anyone may prepare and distribute derivative works and portions of this document in software, in supporting materials accompanying software, and in documentation of software, PROVIDED that all such works include the notice below. HOWEVER, the publication of derivative works of this document for use as a technical specification is expressly prohibited.

You know I think the organization of XQuery 3.1 and friends could be improved but deriving and distributing “improved” versions is expressly prohibited.

Hmmm, but we are talking about XML and languages to query and transform XML.

Consider the potential of a query that calls XQuery 3.1: An XML Query Language and materials cited in it, then returns a version of XQuery 3.1 that has definitions from other standards off-set in the XQuery 3.1 text.

Or that inserts into the text examples or other materials.
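To make the idea concrete, here is a minimal sketch in Python rather than XQuery, using only the standard library. Everything in it is an assumption for illustration: the `spec`, `termref`, `glossary`, `def`, and `definition` element names are hypothetical, and the two tiny documents are toy stand-ins, not excerpts from any W3C text. It builds a reader's view on demand in which each term reference is followed by its definition drawn from a second document.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for a specification and a separately maintained glossary.
SPEC = """<spec>
  <p>An expression returns a <termref def="sequence">sequence</termref>.</p>
  <p>Each member is an <termref def="item">item</termref>.</p>
</spec>"""

GLOSSARY = """<glossary>
  <def id="sequence">An ordered collection of zero or more items.</def>
  <def id="item">A node, function, or atomic value.</def>
</glossary>"""


def inline_definitions(spec_xml, glossary_xml):
    """Return a tree in which each termref is followed by its definition."""
    view = ET.fromstring(spec_xml)
    definitions = {d.get("id"): d.text
                   for d in ET.fromstring(glossary_xml).iter("def")}
    # Snapshot the elements first so inserting nodes does not disturb iteration.
    for parent in list(view.iter()):
        # Walk children in reverse so earlier indexes stay valid after inserts.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == "termref" and child.get("def") in definitions:
                note = ET.Element("definition")
                note.text = definitions[child.get("def")]
                # Move trailing text so the note sits directly after the term.
                note.tail, child.tail = child.tail, None
                parent.insert(index + 1, note)
    return view


print(ET.tostring(inline_definitions(SPEC, GLOSSARY), encoding="unicode"))
```

The source documents are left untouched; the expanded text exists only in the view the reader asked for, which is the sense of "dynamic" I have in mind.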

For decades XML enthusiasts have bruited about dynamic texts but have produced damned few of them (as in zero) as their standards.

Let’s use the “no derivatives” language of the W3C as an incentive to not create another static document but a dynamic one that can grow or contract according to the wishes of its reader.

Suggestions for first round features?

January 29, 2017

Up-Translation and Up-Transformation … [Balisage Rocks!]

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 8:48 pm

Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions (a Balisage pre-conference symposium)

When & Where:

Monday July 31, 2017
CAMBRiA Hotel, Rockville, MD USA

Chair: Evan Owens, Cenveo

You need more details than that?

Ok, from the webpage:

Increasing the granularity and/or specificity of markup is an important task in many different content and information workflows. Markup transformations might involve tasks such as high-level structuring, detailed component structuring, or enhancing information by matching or linking to external vocabularies or data. Enhancing markup presents numerous secondary challenges including lack of structure of the inputs or inconsistency of input data down to the level of spelling, punctuation, and vocabulary. Source data for up-translation may be XML, word processing documents, plain text, scanned & OCRed text, or databases; transformation goals may be content suitable for page makeup, search, or repurposing, in XML, JSON, or any other markup language.

The range of approaches to up-transformation is as varied as the variety of specifics of the input and required outputs. Solutions may combine automated processing with human review or could be 100% software implementations. With the potential for requirements to evolve over time, tools may have to be actively maintained and enhanced.

The presentations in this pre-conference symposium will include goals, challenges, solutions, and workflows for significant XML enhancements, including approaches, tools, and techniques that may potentially be used for a variety of other tasks. The symposium will be of value not only to those facing up-translation and transformation but also to general XML practitioners seeking to get the most out of their data.

If I didn’t know better, up-translation and up-transformation sound suspiciously like conferred properties of topic maps fame.

Well, modulo that conferred properties could be predicated on explicit subject identity and not hidden in the personal knowledge of the author.

There are two categories of up-translation and up-transformation:

  1. Ones that preserve jobs like spaghetti Cobol code, and
  2. Ones that support easy long term maintenance.

While writing your paper for the pre-conference, which category fits yours the best?

January 24, 2017

XQuery/XSLT Proposals – Comments by 28 February 2017

Filed under: XML,XPath,XQuery,XSLT — Patrick Durusau @ 4:09 pm

Proposed Recommendations Published for XQuery WG and XSLT WG.

From the webpage:

The XML Query Working Group and XSLT Working Group have published a Proposed Recommendation for four documents:

  • XQuery and XPath Data Model 3.1: This document defines the XQuery and XPath Data Model 3.1, which is the data model of XML Path Language (XPath) 3.1, XSL Transformations (XSLT) Version 3.0, and XQuery 3.1: An XML Query Language. The XQuery and XPath Data Model 3.1 (henceforth “data model”) serves two purposes. First, it defines the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages.
  • XPath and XQuery Functions and Operators 3.1: The purpose of this document is to catalog the functions and operators required for XPath 3.1, XQuery 3.1, and XSLT 3.0. It defines constructor functions, operators, and functions on the datatypes defined in XML Schema Part 2: Datatypes Second Edition and the datatypes defined in XQuery and XPath Data Model (XDM) 3.1. It also defines functions and operators on nodes and node sequences as defined in the XQuery and XPath Data Model (XDM) 3.1.
  • XML Path Language (XPath) 3.1: XPath 3.1 is an expression language that allows the processing of values conforming to the data model defined in XQuery and XPath Data Model (XDM) 3.1. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic values, function items, and sequences.
  • XSLT and XQuery Serialization 3.1: This document defines serialization of an instance of the data model as defined in XQuery and XPath Data Model (XDM) 3.1 into a sequence of octets. Serialization is designed to be a component that can be used by other specifications such as XSL Transformations (XSLT) Version 3.0 or XQuery 3.1: An XML Query Language.

Comments are welcome through 28 February 2017.

Get your red pen out!

Unlike political flame wars on social media, comments on these proposed recommendations could make a useful difference.

Enjoy!

January 16, 2017

XML.com Relaunch!

Filed under: XML,XML Schema,XPath,XQuery,XSLT — Patrick Durusau @ 4:11 pm

XML.com

Lauren Wood posted this note about the relaunch of XML.com recently:

I’ve relaunched XML.com (for some background, Tim Bray wrote an article here: https://www.xml.com/articles/2017/01/01/xmlcom-redux/). I’m hoping it will become part of the community again, somewhere for people to post their news (submit your news here: https://www.xml.com/news/submit-news-item/) and articles (see the guidelines at https://www.xml.com/about/contribute/). I added a job board to the site as well (if you’re in Berlin, Germany, or able to move there, look at the job currently posted; thanks LambdaWerk!); if your employer might want to post XML-related jobs please email me.

The old content should mostly be available but some articles were previously available at two (or more) locations and may now only be at one; try the archive list (https://www.xml.com/pub/a/archive/) if you’re looking for something. Please let me know if something major is missing from the archives.

XML is used in a lot of areas, and there is a wealth of knowledge in this community. If you’d like to write an article, send me your ideas. If you have comments on the site, let me know that as well.

Just in time as President Trump is about to stir, vigorously, that big pot of crazy known as federal data.

Mapping, processing, transformation demands will grow at an exponential rate.

Notice the emphasis on demand.

Taking two weeks to write custom software to sort files (you know the Weiner/Abedin laptop story, yes?) won’t be acceptable quite soon.

How are your on-demand XML chops?

December 13, 2016

XQuery/XPath CRs 3.1! [#DisruptJ20 Twitter Game]

Filed under: Saxon,XML,XPath,XQuery — Patrick Durusau @ 1:57 pm

Just in time for the holidays, new CRs for XQuery/XPath hit the street! Comments due by 2017-01-10.

XQuery and XPath Data Model 3.1 https://www.w3.org/TR/2016/CR-xpath-datamodel-31-20161213/

XML Path Language (XPath) 3.1 https://www.w3.org/TR/2016/CR-xpath-31-20161213/

XQuery 3.1: An XML Query Language https://www.w3.org/TR/2016/CR-xquery-31-20161213/

XPath and XQuery Functions and Operators 3.1 https://www.w3.org/TR/2016/CR-xpath-functions-31-20161213/

XQueryX 3.1 https://www.w3.org/TR/2016/CR-xqueryx-31-20161213/

#DisruptJ20 is too late for comments to the W3C but you can break the boredom of indeterminate waiting to protest excitedly for TV cameras and/or to be arrested.

How?

Play the XQuery/XPath 3.1 Twitter Game!

Definitions litter the drafts and appear as:

[Definition: A sequence is an ordered collection of zero or more items.]

You Tweet:

An ordered collection of zero or more items? #xquery

Correct response:

A sequence.

Some definitions are too long to be tweeted in full:

An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. (xpath-functions)

Suggest you tweet:

A triple containing namespace prefix (optional), namespace URI (optional), and local name.

or

A value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]).

In both cases, the correct response:

An expanded-QName.
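If you would rather not hunt for definitions by hand, the game can be seeded with a few lines of Python. This is only a sketch: the regular expression assumes definitions of the simple form "[Definition: A/An/The TERM is BODY.]" with no nested square brackets, and the 140-character limit and the function name `tweet_pairs` are my assumptions, not anything from the drafts.

```python
import re

# Assumes definitions shaped like "[Definition: A TERM is BODY.]" with no
# nested square brackets inside the definition text.
DEFINITION = re.compile(
    r"\[Definition:\s*(An?|The)\s+([^\]]+?)\s+is\s+([^\]]+?)\.?\]"
)

def tweet_pairs(spec_text, limit=140):
    """Return (question, answer) pairs for definitions short enough to tweet."""
    pairs = []
    for article, term, body in DEFINITION.findall(spec_text):
        # Turn the definition body into the question and the term into the answer.
        question = body[0].upper() + body[1:] + "? #xquery"
        answer = f"{article} {term}."
        if len(question) <= limit:  # hypothetical length limit
            pairs.append((question, answer))
    return pairs

text = "[Definition: A sequence is an ordered collection of zero or more items.]"
for question, answer in tweet_pairs(text):
    print(question)
    print(answer)
```

Definitions that are too long for the limit are simply skipped, so the long ones still need trimming by hand as described above.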

Use a $10 burner phone and leave it unlocked at protests. If your phone is searched, imagine the attempts to break the “code.”

You could agree on definitions/responses as instructions for direct action. But I digress.

December 11, 2016

4 Days Left – Submission Alert – XML Prague

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 7:53 pm

A tweet by Jirka Kosek reminded me there are only 4 days left for XML Prague submissions!

  • December 15th – End of CFP (full paper or extended abstract)
  • January 8th – Notification of acceptance/rejection of paper to authors
  • January 29th – Final paper

From the call for papers:

XML Prague 2017 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

I don’t travel but if you need a last-minute co-author or proofer, you know where to find me!

May 23, 2016

Balisage 2016 Program Posted! (Newcomers Welcome!)

Filed under: Conferences,Topic Maps,XML,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 8:03 pm

Tommie Usdin wrote today to say:

Balisage: The Markup Conference
2016 Program Now Available
http://www.balisage.net/2016/Program.html

Balisage: where serious markup practitioners and theoreticians meet every August.

The 2016 program includes papers discussing reducing ambiguity in linked-open-data annotations, the visualization of XSLT execution patterns, automatic recognition of grant- and funding-related information in scientific papers, construction of an interactive interface to assist cybersecurity analysts, rules for graceful extension and customization of standard vocabularies, case studies of agile schema development, a report on XML encoding of subtitles for video, an extension of XPath to file systems, handling soft hyphens in historical texts, an automated validity checker for formatted pages, one no-angle-brackets editing interface for scholars of German family names and another for scholars of Roman legal history, and a survey of non-XML markup such as Markdown.

XML In, Web Out: A one-day Symposium on the sub rosa XML that powers an increasing number of websites will be held on Monday, August 1. http://balisage.net/XML-In-Web-Out/

If you are interested in open information, reusable documents, and vendor and application independence, then you need descriptive markup, and Balisage is the conference you should attend. Balisage brings together document architects, librarians, archivists, computer scientists, XML practitioners, XSLT and XQuery programmers, implementers of XSLT and XQuery engines and other markup-related software, Topic-Map enthusiasts, semantic-Web evangelists, standards developers, academics, industrial researchers, government and NGO staff, industrial developers, practitioners, consultants, and the world’s greatest concentration of markup theorists. Some participants are busy designing replacements for XML while other still use SGML (and know why they do).

Discussion is open, candid, and unashamedly technical.

Balisage 2016 Program: http://www.balisage.net/2016/Program.html

Symposium Program: http://balisage.net/XML-In-Web-Out/symposiumProgram.html

Even if you don’t eat RELAX grammars at snack time, put Balisage on your conference schedule. Even if a bit scruffy looking, the longtime participants like new document/information problems or new ways of looking at old ones. Not to mention they, on occasion, learn something from newcomers as well.

It is a unique opportunity to meet the people who engineered the tools and specs that you use day to day.

Be forewarned that most of them have difficulty agreeing what controversial terms mean, like “document,” but that to one side, they are as good a crew as you are likely to meet.

Enjoy!

February 2, 2016

Balisage 2016, 2–5 August 2016 [XML That Makes A Difference!]

Filed under: Conferences,XLink,XML,XML Data Clustering,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 9:47 pm

Call for Participation

Dates:

  • 25 March 2016 — Peer review applications due
  • 22 April 2016 — Paper submissions due
  • 21 May 2016 — Speakers notified
  • 10 June 2016 — Late-breaking News submissions due
  • 16 June 2016 — Late-breaking News speakers notified
  • 8 July 2016 — Final papers due from presenters of peer reviewed papers
  • 8 July 2016 — Short paper or slide summary due from presenters of late-breaking news
  • 1 August 2016 — Pre-conference Symposium
  • 2–5 August 2016 — Balisage: The Markup Conference

From the call:

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

  • Web application development with XML
  • Informal data models and consensus-based vocabularies
  • Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
  • Performance issues in parsing, XML database retrieval, or XSLT processing
  • Development of angle-bracket-free user interfaces for non-technical users
  • Semistructured data and full text search
  • Deployment of XML systems for enterprise data
  • Design and implementation of XML vocabularies
  • Case studies of the use of XML for publishing, interchange, or archiving
  • Alternatives to XML
  • the role(s) of XML in the application lifecycle
  • the role(s) of vocabularies in XML environments

Full papers should be submitted by the deadline given below. All papers are peer-reviewed — we pride ourselves that you will seldom get a more thorough, skeptical, or helpful review than the one provided by Balisage reviewers.

Whether in theory or practice, let’s make Balisage 2016 the one people speak of in hushed tones at future markup and information conferences.

Useful semantics continues to flounder about, cf. Vice-President Biden’s interest in “one cancer research language.” Easy enough to say. How hard could it be?

Documents are commonly thought of and processed as if from BOM to EOF is the definition of a document. Much to our impoverishment.

Silo dissing has gotten popular. What if we could have our silos and eat them too?

Let’s set our sights on a Balisage 2016 where non-technicals come away saying “I want that!”

Have your first drafts done well before the end of February, 2016!

December 25, 2015

Facets for Christmas!

Filed under: XML,XPath,XQuery — Patrick Durusau @ 11:47 am

Facet Module

From the introduction:

Faceted search has proven to be enormously popular in the real world applications. Faceted search allows user to navigate and access information via a structured facet classification system. Combined with full text search, it provides user with enormous power and flexibility to discover information.

This proposal defines a standardized approach to support the Faceted search in XQuery. It has been designed to be compatible with XQuery 3.0, and is intended to be used in conjunction with XQuery and XPath Full Text 3.0.

Imagine my surprise, after opening Christmas presents with family, to see a tweet by XQuery announcing yet another Christmas present:

“Facets”: A new EXPath spec w/extension functions & data models to enable faceted navigation & search in XQuery http://expath.org/spec/facet

The EXPath homepage says:

XPath is great. XPath-based languages like XQuery, XSLT, and XProc, are great. The XPath recommendation provides a foundation for writing expressions that evaluate the same way in a lot of processors, written in different languages, running in different environments, in XML databases, in in-memory processors, in servers or in clients.

Supporting so many different kinds of processor is wonderful thing. But this also contrains which features are feasible at the XPath level and which are not. In the years since the release of XPath 2.0, experience has gradually revealed some missing features.

EXPath exists to provide specifications for such missing features in a collaborative- and implementation-independent way. EXPath also provides facilities to help and deliver implementations to as many processors as possible, via extensibility mechanisms from the XPath 2.0 Recommendation itself.

Other projects exist to define extensions for XPath-based languages or languages using XPath, as the famous EXSLT, and the more recent EXQuery and EXProc projects. We think that those projects are really useful and fill a gap in the XML core technologies landscape. Nevertheless, working at the XPath level allows common solutions when there is no sense in reinventing the wheel over and over again. This is just following the brilliant idea of the W3C’s XSLT and XQuery working groups, which joined forces to define XPath 2.0 together. EXPath purpose is not to compete with other projects, but collaborate with them.

Be sure to visit the resources page. It has a manageable listing of processors that handle extensions.

What would you like to see added to XPath?

Enjoy!

December 17, 2015

My Bad – You Are Not! 747 Edits Away From Using XML Tools

Filed under: XPath,XQuery,XSLT — Patrick Durusau @ 4:11 pm

The original, unedited post is below but in response to comments, I checked the XQuery, XPath, XSLT and XQuery Serialization 3.1 files in Chrome (Ctrl-U) before saving them.

All the empty elements were properly closed.

I then saved the files and re-opened in Emacs, to discover that Chrome had stripped the “/” from the empty elements, which then caused BaseX to complain. It was an accurate complaint but the files I was tossing against BaseX were not the files as published by the W3C.

So now I need to file a bug report on Chrome, Version 47.0.2526.80 (64-bit) on Ubuntu, for mangling closed empty elements.


You could tell in XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations! that I was really excited to see the new drafts hit the street.

Me and my big mouth.

I grabbed copies of all three and tossed the XQuery draft against an xquery to create a list of all the paths in it. Simple enough.

The results weren’t.

Here is the first error message:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 68): The element type “link” must be terminated by the matching end-tag “</link>”.

Ouch!

I corrected that and running the query a second time I got:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 68): The element type “meta” must be terminated by the matching end-tag “</meta>”.

The <meta> elements appear on lines three and four.

On the third try:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 69): The element type “img” must be terminated by the matching end-tag “</img>”.

There are 3 <img> elements that are not closed.

I’m getting fairly annoyed at this point.

Fourth try:

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 78): The element type “br” must be terminated by the matching end-tag “</br>”.

Of course at this point I revert to grep and discover there are 353 <br> elements that are not closed.

Sigh, nothing to do but correct and soldier on.

Fifth attempt.

[FODC0002] “file:/home/patrick/working/w3c/XQuery3.1.html” (Line 17618): The element type “hr” must be terminated by the matching end-tag “</hr>”.

There are 2 <hr> elements that are not closed.

A total of 361 edits in order to use XML based tools with the most recent XQuery 3.1 Candidate draft.

The most recent XPath 3.1 has 238 empty elements that aren’t closed (same elements as XQuery 3.1).

The XSLT and XQuery Serialization 3.1 draft has 149 empty elements that aren’t closed, same as the others but with the addition of four <col> elements that weren’t closed.

Grand total: 747 edits in order to use XML tools.
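For anyone who wants to repeat the count without grep, here is a rough Python sketch. It is not what I ran for this post. It assumes a simple regular expression is good enough (an attribute value containing ">" would fool it), and the tag list and the name `count_unclosed` are mine.

```python
import re

# Empty (void) elements mentioned in this post.
VOID_TAGS = ("link", "meta", "img", "br", "hr", "col")

def count_unclosed(html_text, tags=VOID_TAGS):
    """Count start tags for the given elements that do not end in '/>'."""
    counts = {}
    for tag in tags:
        # The lookahead keeps <col> from matching <colgroup>, and so on.
        pattern = re.compile(r"<%s(?=[\s/>])[^>]*>" % tag, re.IGNORECASE)
        unclosed = [m.group(0) for m in pattern.finditer(html_text)
                    if not m.group(0).endswith("/>")]
        if unclosed:
            counts[tag] = len(unclosed)
    return counts

sample = '<p>one<br>two<br/>three<br ></p><hr><img src="a.png"/>'
print(count_unclosed(sample))  # {'br': 2, 'hr': 1}
```

Run it over the file as saved to disk, since that is the copy your XML tools will actually see.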

Not an editorial but a production problem. A rather severe one it seems to me.

Anyone who wants to use XML tools on these drafts will have to perform the same edits.
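For readers without an XQuery processor to hand, the path listing I was after can be approximated in Python. This is a sketch using the standard library's ElementTree, not the XQuery I used, and the function name `element_paths` is made up for the illustration. Like BaseX, the parser refuses input that is not well-formed, so an unclosed empty element raises an error instead of producing paths.

```python
import xml.etree.ElementTree as ET

def element_paths(xml_text):
    """Return the sorted, distinct element paths found in an XML string."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    paths = set()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return sorted(paths)

sample = "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>"
for p in element_paths(sample):
    print(p)

# An unclosed <br> stops the parser, much like the FODC0002 errors above.
try:
    element_paths("<html><body>one<br>two</body></html>")
except ET.ParseError as err:
    print("not well-formed:", err)
```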

XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations!

Filed under: W3C,XPath,XQuery,XSLT — Patrick Durusau @ 11:10 am

As I forecast 😉 earlier this week, new Candidate Recommendations for:

XQuery 3.1: An XML Query Language

XML Path Language (XPath) 3.1

XSLT and XQuery Serialization 3.1

have hit the streets for your review and comments!

Comments due by 2016-01-31.

That’s forty-five days, minus the ones spent with drugs/sex/rock-n-roll over the holidays and recovering from same.

Say something shy of forty-four actual working days (my endurance isn’t what it once was) for the review process.

What tools, techniques are you going to use to review this latest set of candidates?

BTW, some people review software and check only fixes; for standards I start at the beginning, go to the end, then stop. (Or the reverse for backward proofing.)

My estimates on days spent with drugs/sex/rock-n-roll are approximate only and your experience may vary.

December 14, 2015

XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts (soon!)

Filed under: W3C,XPath,XQuery,XSLT — Patrick Durusau @ 4:04 pm

XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts will be published quite soon so I wanted to give you a heads up on your holiday reading schedule.

This is deep enough in the review cycle that a back-to-front reading is probably your best approach.

You have read the drafts and corrections often enough by this point that you read the first few words of a paragraph and you “know” what it says so you move on. (At the very least I can report that happens to me.)

By back-to-front reading I mean to start at the end of each draft and read the last sentence and then the next to last sentence and so on.

The back-to-front process does two things:

  1. You are forced to read each sentence on its own.
  2. It prevents skimming and filling in errors with silent corrections (unknown to your conscious mind).
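If you want a machine to hand you the sentences in reverse order, something like the following Python sketch would do. It assumes a naive sentence split (after ".", "?" or "!" followed by whitespace), so abbreviations such as "e.g." will be split in the wrong place, and the function name `back_to_front` is mine.

```python
import re

def back_to_front(text):
    """Split text into sentences and return them last-to-first."""
    # Naive split: break after '.', '?' or '!' when followed by whitespace.
    parts = re.split(r"(?<=[.?!])\s+", text)
    sentences = [s.strip() for s in parts if s.strip()]
    return list(reversed(sentences))

draft = "First sentence. Second sentence? Third sentence!"
for n, sentence in enumerate(back_to_front(draft), start=1):
    print(n, sentence)
```

Printing one sentence at a time, with its number, makes it harder to drift back into reading for flow.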

The back-to-front method is quite time consuming so it’s fortunate these drafts are due to appear just before a series of holidays in a large number of places.

I hesitate to mention it but there is another way to proof these drafts.

If you have XML-experienced visitors, you could take turns reading the drafts to each other. It was a technique used by copyists many years ago where one person read and two others took down the text. The two versions were then compared to each other and the original.

Even with a great reading voice, I’m not certain many people would be up to that sort of exercise.

PS: I will post on the new drafts as soon as they are published.

December 8, 2015

XQuery, 2nd Edition, Updated! (A Drawback to XQuery)

Filed under: XML,XPath,XQuery — Patrick Durusau @ 3:57 pm

XQuery, 2nd Edition, Updated! by Priscilla Walmsley.

The updated version of XQuery, 2nd Edition has hit the streets!

As a plug for the early release program at O’Reilly, yours truly appears in the acknowledgments (page xxii) from having submitted comments on the early release version of XQuery. You can too. Early release participation is yet another way to contribute back to the community.

There is one drawback to XQuery which I discuss below.

For anyone not fortunate enough to already have a copy of XQuery, 2nd Edition, here is the full description from the O’Reilly site:

The W3C XQuery 3.1 standard provides a tool to search, extract, and manipulate content, whether it’s in XML, JSON or plain text. With this fully updated, in-depth tutorial, you’ll learn to program with this highly practical query language.

Designed for query writers who have some knowledge of XML basics, but not necessarily advanced knowledge of XML-related technologies, this book is ideal as both a tutorial and a reference. You’ll find background information for namespaces, schemas, built-in types, and regular expressions that are relevant to writing XML queries.

This second edition provides:

  • A high-level overview and quick tour of XQuery
  • New chapters on higher-order functions, maps, arrays, and JSON
  • A carefully paced tutorial that teaches XQuery without being bogged down by the details
  • Advanced concepts for taking advantage of modularity, namespaces, typing, and schemas
  • Guidelines for working with specific types of data, such as numbers, strings, dates, URIs, maps and arrays
  • XQuery’s implementation-specific features and its relationship to other standards including SQL and XSLT
  • A complete alphabetical reference to the built-in functions, types, and error messages

Drawback to XQuery:

You know I hate to complain, but the brevity of XQuery is a real drawback to billing.

For example, I have a post pending on taking 604 lines of XSLT down to 35 lines of XQuery.

Granted the XQuery is easier to maintain, modify, extend, but all a client will see is the 35 lines of XQuery. At least 604 lines of XSLT looks like you really worked to produce something.

I know about XQueryX but I haven’t seen any automatic way to convert XQuery into XQueryX. Am I missing something obvious? If that’s possible, I could just bulk up the deliverable with an XQueryX expression of the work and keep the XQuery version for production use.

As excellent as I think XQuery and Walmsley’s book both are, I did want to warn you about the brevity of your XQuery deliverables.

I look forward to finishing XQuery, 2nd Edition. I started doing so many things based on the first twelve or so chapters that I just read selectively from that point on. It merits a complete read. You won’t be sorry you did.

November 24, 2015

XQuery and XPath Full Text 3.0 (Recommendation)

Filed under: Searching,XPath,XQuery — Patrick Durusau @ 4:29 pm

XQuery and XPath Full Text 3.0

From 1.1 Full-Text Search and XML:

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 3.0 and XPath 3.0.

XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

  1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string “lease” will return a news item that contains “Foobar Corporation releases version 20.9 …”. A full-text search for the token “lease” will not.
  2. There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is “find me all the news items that contain a token with the same linguistic stem as ‘mouse’” (finds “mouse” and “mice”). Another example based on token proximity is “find me all the news items that contain the tokens ‘XML’ and ‘Query’ allowing up to 3 intervening tokens”.
  3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for “mouse”, there is only 1 expected result set. When you do a full-text search for all the news items that contain the token “mouse”, you probably expect to find news items containing the token “mice”, and possibly “rodents”, or possibly “computers”. Not all results are equal. Some results are more “mousey” than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
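The first two differences are easy to see in a few lines of Python. This is a sketch, not XQuery Full Text: the tokenizer is a crude stand-in (lowercased runs of word characters), the helper names are hypothetical, the news item is extended from the "lease" example above, and stemming ("mouse"/"mice") and scoring are not attempted.

```python
import re

def tokens(text):
    """Lowercased word tokens; a stand-in for an implementation-defined tokenizer."""
    return re.findall(r"\w+", text.lower())

def substring_match(text, term):
    """Substring search: the term may occur inside a longer word."""
    return term.lower() in text.lower()

def token_match(text, term):
    """Token search: the term must be a whole token."""
    return term.lower() in tokens(text)

def within(text, first, second, max_gap):
    """True if both tokens occur with at most max_gap tokens between them."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == first.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == second.lower()]
    return any(abs(a - b) - 1 <= max_gap for a in pos_a for b in pos_b)

news = "Foobar Corporation releases version 20.9 of its XML and JSON Query engine"
print(substring_match(news, "lease"))   # True: "lease" hides inside "releases"
print(token_match(news, "lease"))       # False: no token is exactly "lease"
print(within(news, "XML", "Query", 3))  # True: two tokens intervene
```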

Note:

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text 3.0.

[Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.

Tokenization, including the definition of the term “tokens”, SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization operates on the string value of an item; for element nodes this does not include the content of attribute nodes, but for attribute nodes it does. Tokenization is defined more formally in 4.1 Tokenization.

[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]
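To make the idea of tokens carrying positions concrete, here is a minimal Python sketch of one possible tokenizer. Everything about it is an assumption on my part, since the specification leaves tokenization implementation-defined: paragraphs are split on blank lines, sentences on ".", "?" or "!" followed by whitespace, tokens are runs of word characters, and each position is a single ordinal, not a start and end pair.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    text: str
    position: int   # 1-based position of the token in the whole string
    sentence: int   # 1-based position of the containing sentence
    paragraph: int  # 1-based position of the containing paragraph

def tokenize(text):
    """Return tokens annotated with token, sentence and paragraph positions."""
    out = []
    pos = 0
    sent = 0
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for p_idx, para in enumerate(paragraphs, start=1):
        for sentence in re.split(r"(?<=[.?!])\s+", para.strip()):
            if not sentence:
                continue
            sent += 1
            for word in re.findall(r"\w+", sentence):
                pos += 1
                out.append(Token(word, pos, sent, p_idx))
    return out

sample = "XML is text. Query it.\n\nTokens have positions."
for tok in tokenize(sample):
    print(tok)
```

A different implementation could reasonably draw every one of those boundaries somewhere else, which is why the specification asks implementations to expose their tokenization rules.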

Not a fast read but a welcome one!

XQuery and XPath increase the value of all XML-encoded documents, at least down to the level of their markup. Beyond nodes, you are on your own.

XQuery and XPath Full Text 3.0 extend XQuery and XPath beyond existing markup in documents. Content that was too expensive or simply not of enough interest to encode, can still be reached in a robust and reliable way.

If you can “see” it with your computer, you can annotate it.

You might have to possess a copy of the copyrighted content, but still, it isn’t a closed box that resists annotation. That enables you to sell the annotation as a value-add to the copyrighted content.

XQuery and XPath Full Text 3.0 says token and phrase are implementation-defined.

Imagine the user (name) commented version of X movie, which is a driver file that has XQuery links into a DVD playing on your computer (or rather to the data stream).

I rather like that idea.

PS: Check with a lawyer before you commercialize that annotation idea. I am not familiar with all EULAs and national laws.

November 3, 2015

XML Prague 2016 – Call for Papers [Looking for a co-author?]

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 6:54 pm

XML Prague 2016 – Call for Papers

Important Dates:

  • November 30th – End of CFP (full paper or extended abstract)
  • January 4th – Notification of acceptance/rejection of paper to authors
  • January 25th – Final paper
  • February 11-13, XML Prague 2016

From the webpage:

XML Prague 2016 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

Authors should strive to contain original material and belong in the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Accepted presenters must submit their full paper (on time) and give their presentation and answer questions in English, as well as follow the XML Prague 2016 conference guidelines.

I don’t travel but am interested in co-authoring a paper with someone who plans on attending XML Prague 2016. Contact me at patrick@durusau.net.
