Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 3, 2015

XML Prague 2016 – Call for Papers [Looking for a co-author?]

Filed under: Conferences,XML,XPath,XQuery,XSLT — Patrick Durusau @ 6:54 pm

XML Prague 2016 – Call for Papers

Important Dates:

  • November 30th – End of CFP (full paper or extended abstract)
  • January 4th – Notification of acceptance/rejection of paper to authors
  • January 25th – Final paper
  • February 11-13, XML Prague 2016

From the webpage:

XML Prague 2016 now welcomes submissions for presentations on the following topics:

  • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
  • Semantic visions and the reality – micro-formats, semantic data in business, linked data
  • Publishing for the 21st century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
  • XML databases and Big Data – XML storage, indexing, query languages, …
  • State of the XML Union – updates on specs, the XML community news, …

All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

Accepted papers will be included in published conference proceedings.

Submissions should contain original material and belong to the topics previously listed. Submissions which can be construed as product or service descriptions (adverts) will likely be deemed inappropriate. Other approaches such as use case studies are welcome but must be clearly related to conference topics.

Accepted presenters must submit their full paper (on time) and give their presentation and answer questions in English, as well as follow the XML Prague 2016 conference guidelines.

I don’t travel but am interested in co-authoring a paper with someone who plans on attending XML Prague 2016. Contact me at patrick@durusau.net.

September 28, 2015

Balisage 2016!

Filed under: Conferences,XML — Patrick Durusau @ 10:52 am

From my inbox this morning:

Mark Your Calendars: Balisage 2016
 - pre-conference symposium 1 August 2016
 - Balisage: The Markup Conference 2-5 August 2016

Bethesda North Marriott Hotel & Conference Center
5701 Marinelli Road  
North Bethesda, Maryland  20852
USA 

This much advance notice makes me think someone had a toe-curling good time at Balisage 2015.

Was Bill Clinton there? 😉

Attend Balisage 2016, and look for Bill Clinton or someone else with a silly grin on their face!

September 4, 2015

Apache VXQuery: A Scalable XQuery Implementation

Filed under: XML,XQuery — Patrick Durusau @ 1:34 pm

Apache VXQuery: A Scalable XQuery Implementation by E. Preston Carman Jr., Till Westmann, Vinayak R. Borkar, Michael J. Carey, Vassilis J. Tsotras.

Abstract:

The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data collections and take advantage of parallelism, we have implemented Apache VXQuery, an open-source scalable XQuery processor. The system builds upon two other open-source frameworks — Hyracks, a parallel execution engine, and Algebricks, a language agnostic compiler toolbox. Apache VXQuery extends these two frameworks and provides an implementation of the XQuery specifics (data model, data-model dependent functions and optimizations, and a parser). We describe the architecture of Apache VXQuery, its integration with Hyracks and Algebricks, and the XQuery optimization rules applied to the query plan to improve path expression efficiency and to enable query parallelism. An experimental evaluation using a real 500GB dataset with various selection, aggregation and join XML queries shows that Apache VXQuery performs well both in terms of scale-up and speed-up. Our experiments show that it is about 3x faster than Saxon (an open-source and commercial XQuery processor) on a 4-core, single node implementation, and around 2.5x faster than Apache MRQL (a MapReduce-based parallel query processor) on an eight (4-core) node cluster.

Are you looking for more “pop” in your XQueries? Apache VXQuery may be the answer.

Suitable for the Edgar dataset and the OpenStreetMap dataset.
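
For a sense of the general shape of query the paper benchmarks, here is a minimal XQuery sketch of a selection plus aggregation over sensor-style records. The collection URI and element names are invented for illustration; they are not taken from the paper.

(: Selection plus aggregation, the general shape of query the paper
   benchmarks. Collection URI and element names are illustrative only. :)
let $readings := collection("sensors")//reading[station/@country = "US"]
return
  <summary>
    <count>{ count($readings) }</count>
    <max-temperature>{ max($readings/temperature ! xs:double(.)) }</max-temperature>
  </summary>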

This may be what finally pushes me over the edge in choosing between a local cluster and one of the online options. It just looks too interesting not to play with.

June 30, 2015

XML Inclusions (XInclude) Version 1.1

Filed under: XInclude,XML — Patrick Durusau @ 2:30 pm

XML Inclusions (XInclude) Version 1.1 W3C Candidate Recommendation 30 June 2015.

Will not exit CR before 25 August 2015.

Comments to: www-xml-xinclude-comments@w3.org (comment archives are available).

Abstract:

This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML information sets into a single composite infoset. Specification of the XML documents (infosets) to be merged and control over the merging process is expressed in XML-friendly syntax (elements, attributes, URI references).

The XML promise of dynamic documents, composed from data stores, other documents, etc., does get realized, but not nearly as often as it should.

Looking for XML Inclusions to be another step away from documents as static containers.

June 6, 2015

MorganaXProc

Filed under: XML,XProc — Patrick Durusau @ 7:35 pm

MorganaXProc

From the webpage:

MorganaXProc is an implementation of W3C’s XProc: An XML Pipeline Language written in Java™.

I first saw this in a tweet by Norm Walsh (think XML Calabash, also an implementation of XProc). We could use more people like Norm.

June 1, 2015

Identifiers vs. Identifications?

Filed under: Duke,Topic Maps,XML — Patrick Durusau @ 3:50 pm

One problem with topic map rhetoric has been its focus on identifiers (the flat ones):

[image: a flat identifier]

rather than saying topics manage subject identifications, that is, making explicit what is represented by an expectant identifier:

[image: an expectant identifier, carrying its identifying properties]

For processing purposes it is handy to map between identifiers, to query by identifiers, and to access data by identifiers, to mention only a few tasks, all of them machine facing.

However efficient it may be to use flat identifiers (even by humans), having access to the bundle of properties thought to identify a subject is useful as well.

Topic maps already capture identifiers but their syntaxes need to be extended to support the capturing of subject identifications along with identifiers.

Years of reading have gone into this realization about identifiers and their relationship to identifications, but I would be remiss if I didn’t call out the work of Lars Marius Garshol on Duke.

From the GitHub page:

Duke is a fast and flexible deduplication (or entity resolution, or record linkage) engine written in Java on top of Lucene. The latest version is 1.2 (see ReleaseNotes).

Duke can find duplicate customer records, or other kinds of records in your database. Or you can use it to connect records in one data set with other records representing the same thing in another data set. Duke has sophisticated comparators that can handle spelling differences, numbers, geopositions, and more. Using a probabilistic model Duke can handle noisy data with good accuracy.

In an early post on Duke Lars observes:


The basic idea is almost ridiculously simple: you pick a set of properties which provide clues for establishing identity. To compare two records you compare the properties pairwise and for each you conclude from the evidence in that property alone the probability that the two records represent the same real-world thing. Bayesian inference is then used to turn the set of probabilities from all the properties into a single probability for that pair of records. If the result is above a threshold you define, then you consider them duplicates.

Bayesian identity resolution
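
To make the combination step concrete, here is a minimal XQuery 3.1 sketch of it. The per-property probabilities and the threshold are hypothetical inputs; Duke itself is a Java library and computes those probabilities with its own comparators, so this is only the arithmetic of the final step.

declare function local:combine($probs as xs:double*) as xs:double {
  (: naive Bayesian combination: product of the evidence for a match,
     normalized against the product of the evidence against it :)
  let $for     := fold-left($probs, 1.0e0, function($acc, $p) { $acc * $p })
  let $against := fold-left($probs, 1.0e0, function($acc, $p) { $acc * (1 - $p) })
  return $for div ($for + $against)
};

(: hypothetical per-property probabilities for one pair of records:
   name, address, phone :)
let $probs := (0.88e0, 0.75e0, 0.51e0)
let $threshold := 0.95e0
return
  if (local:combine($probs) ge $threshold)
  then "treat as the same subject"
  else "keep the records separate"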

Only two quibbles with Lars on that passage:

I would delete “same real-world thing” and substitute, “any subject you want to talk about.”

I would point out that Bayesian inference is only one means of determining if two or more sets of properties represent the same subject. Defining sets of matching properties comes to mind. Inferencing based on relationships (associations). “Ask Steve,” is another.

But, I have never heard a truer statement from Lars than:

The basic idea is almost ridiculously simple: you pick a set of properties which provide clues for establishing identity.

Many questions remain: how to provide for collections of sets “of properties which provide clues for establishing identity,” how to make those collections extensible, how to provide for constraints on such sets, where to record “matching” (read “merging”) rules, and what other advantages can be offered.

In answering those questions, I think we need to keep in mind that identifiers and identifications lie along a continuum that runs from where we “know” what is meant by an identifier to where we ourselves need a full identification to know what is being discussed. A useful answer won’t be one or the other, but a pairing that suits a particular circumstance and use case.
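
Purely as an illustration of such a pairing (this is not XTM or any other existing topic map syntax, just a sketch of the idea), a topic could carry both a flat identifier and the identification properties that stand behind it:

<topic>
  <subjectIdentifier>http://example.org/id/charles-dickens</subjectIdentifier>
  <identification>
    <property name="name">Charles Dickens</property>
    <property name="born">1812-02-07</property>
    <property name="occupation">novelist</property>
  </identification>
</topic>

A machine can keep using the flat identifier; a human (or a merging rule) can fall back on the identification when the identifier alone is not enough.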

The Silence of Attributes

Filed under: Topic Maps,XML — Patrick Durusau @ 3:22 pm

I was reading Relax NG by Eric van der Vlist, when I ran into this wonderful explanation of the silence of attributes:

Attributes are generally difficult to extend. When choosing from among elements and attributes, people often base their choice on the relative ease of processing, styling, or transforming. Instead, you should probably focus on their extensibility.

Independent of any XML schema language, when you have an attribute in an instance document, you are pretty much stuck with it. Unless you replace it with an element, there is no way to extend it. You can’t add any child elements or attributes to it because it is designed to be a leaf node and to remain a leaf node. Furthermore, you can’t extend the parent element to include a second instance of an attribute with the same name. (Attributes with duplicate names are forbidden by XML 1.0.) You are thus making an impact not only on the extensibility of the attribute but also on the extensibility of the parent element.

Because attributes can’t be annotated with new attributes and because they can’t be duplicated, they can’t be localized like elements through duplication with different values of xml:lang attributes. Because attributes are more difficult to localize, you should avoid storing any text targeted at human consumers within attributes. You never know whether your application will become international. These attributes would make it more difficult to localize. (At page 200)

Let’s think of “localization” as “use a local identifier” and re-read that last paragraph again (with apologies to Eric):

Because attributes can’t be annotated with new attributes and because they can’t be duplicated, they can’t use local identifiers like elements through duplication with different values of xml:lang attributes. Because attributes are more difficult to localize, you should avoid storing any identifiers targeted at human consumers within attributes. You never know whether your application will become international. These attributes would make it more difficult to use local identifiers.

As a design principle, the use of attributes prevents us from “localizing” to an identifier that a user might recognize.
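
A tiny XQuery sketch of the contrast Eric is pointing at: the element can be repeated and distinguished by xml:lang (or, in our re-reading, by a local identifier), while an attribute on the parent element cannot be.

(: elements can be duplicated and told apart by xml:lang;
   a single label attribute on <product> could not carry both values :)
let $product :=
  <product sku="42" label="Colour swatch">
    <name xml:lang="en">Colour swatch</name>
    <name xml:lang="fr">Nuancier</name>
  </product>
return $product/name[@xml:lang = "fr"]/string()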

What is more, identifiers stand in the place of, or evoke, the properties that we would list as “how” we identified a subject, even though we happily use an identifier as a shorthand for that set of properties.

While we should be able to use identifiers for subjects, we should also be able to provide the properties we see those identifiers as representing.

May 28, 2015

XML Calabash 1.1.4

Filed under: XML,XProc — Patrick Durusau @ 8:17 pm

XML Calabash 1.1.4 by Norm Walsh.

XML Calabash implements XProc: An XML Pipeline Language.

Time to update again!

Writing this reminds me I owe Norm responses on comments. 😉 Coming!

May 27, 2015

Balisage 2015 Program Is Out!

Filed under: Conferences,Topic Maps,XML — Patrick Durusau @ 4:24 pm

Balisage 2015 Program

Tommie Usdin posted this message announcing the Balisage 2015 program:

I think this is an especially strong Balisage program with a good mix of theoretical and practical. The 2015 program includes case studies from journal publishing, regulatory compliance systems, and large-scale document systems; formatting XML for print and browser-based print formatting; visualizing XML structures and documents. Technical papers cover such topics as: MathML; XSLT; use of XML in government and the humanities; XQuery; design of authoring systems; uses of markup that vary from poetry to spreadsheets to cyber justice; and hyperdocument link management.

Good as far as it goes but a synopsis (omitting blurbs and debauchery events) of the program works better for me:

  • The art of the elevator pitch (B. Tommie Usdin, Mulberry Technologies)
  • Markup as index interface: Thinking like a search engine (Mary Holstege, MarkLogic)
  • Markup and meter: Using XML tools to teach a computer to think about versification (David J. Birnbaum and Elise Thorsen, University of Pittsburgh)
  • XML (almost) all the way: Experiences with a small-scale journal publishing system (Peter Flynn, University College Cork)
  • The state of MathML in K-12 educational publishing (Autumn Cuellar, Design Science; Jean Kaplansky, Safari Books Online)
  • Diagramming XML: Exploring concepts, constraints and affordances (Liam R. E. Quin, W3C)
  • Spreadsheets – 90+ million end user programmers with no comment tracking or version control (Patrick Durusau; Sam Hunting)
  • State chart XML as a modeling technique in web engineering (Anne Brüggemann-Klein, Marouane Sayih, and Zlatina Cheva, Technische Universität München)
  • Implementing a system at US Patent and Trademark Office to fully automate the conversion of filing documents to XML (Terrel Morris, US Patent and Trademark Office; Mark Gross, Data Conversion Laboratory; Amit Khare, CGI Federal)
  • XML solutions for Swedish farmers: A case study (Ari Nordström, Creative Words)
  • XSDGuide — Automated generation of web interfaces from XML schemas: A case study for suspicious activity reporting (Fabrizio Gotti, Université de Montréal; Kevin Heffner, Pegasus Research & Technologies; Guy Lapalme, Université de Montréal)
  • Tricolor automata (C. M. Sperberg-McQueen, Black Mesa Technologies; Technische Universität Darmstadt)
  • Two from three (in XSLT) (John Lumley, jωL Research / Saxonica)
  • XQuery as a data integration language (Hans-Jürgen Rennau, Traveltainment; Christian Grün, BaseX)
  • Smart content for high-value communications (David White, Quark Software)
  • Vivliostyle: An open-source, web-browser based, CSS typesetting engine (Shinyu Murakami and Johannes Wilm, Vivliostyle)
  • Panel discussion: Quality assurance in XML transformation
  • Comparing and diffing XML schemas (Priscilla Walmsley, Datypic)
  • Applying intertextual semantics to cyberjustice: Many reality checks for the price of one (Yves Marcoux, Université de Montréal)
  • UnderDok: XML structured attributes, change tracking, and the metaphysics of documents (Claus Huitfeldt, University of Bergen, Norway)
  • Hyperdocument authoring link management using Git and XQuery in service of an abstract hyperdocument management model applied to DITA hyperdocuments (Eliot Kimber, Contrext)
  • Extending the cybersecurity digital thread with XForms (Joshua Lubell, National Institute of Standards and Technology)
  • Calling things by their true names: Descriptive markup and the search for a perfect language (C. M. Sperberg-McQueen, Black Mesa Technologies; Technische Universität Darmstadt)

Now are you ready to register and make your travel arrangements?

Disclaimer: I have no idea why the presentation: Spreadsheets – 90+ million end user programmers with no comment tracking or version control is highlighted in your browser. Have you checked your router for injection attacks by the NSA? 😉

PS: If you are doing a one-day registration, the Spreadsheets presentation is Wednesday, August 12, 2015, 9:00 AM. Just saying.

April 29, 2015

[U.S.] House Member Data in XML

Filed under: Government,XML — Patrick Durusau @ 6:31 pm

User Guide and Data Dictionary. (In PDF)

From the Introduction:

The Office of the Clerk makes available membership lists for the U.S. House of Representatives. These lists are available in PDF format, and starting with the 114th Congress, the data is available in XML format. The document and data are available at http://clerk.house.gov.

For unknown reasons, the link does not appear as a hyperlink in the guide. http://clerk.house.gov.

Just as well because the link to the XML isn’t on that page anyway. Try: http://clerk.house.gov/xml/lists/MemberData.xml instead.
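
If you want to poke at the structure before (or instead of) reading the data dictionary, a quick exploration query in XQuery does the job; it assumes only that your processor will dereference the HTTP URI (otherwise download the file first):

(: a first look at the structure: list the distinct element names
   that actually occur in the member data :)
let $doc := doc("http://clerk.house.gov/xml/lists/MemberData.xml")
return distinct-values($doc//*/local-name())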

Looking forward to the day when all information generated by Congress is available in daily XML dumps.

March 21, 2015

Balisage submissions are due on April 17th

Filed under: Conferences,XML — Patrick Durusau @ 7:42 pm

Balisage submissions are due on April 17th!

Yeah, that’s what I thought when I saw the email from Tommie Usdin earlier this week!

Tommie writes:

Just a friendly reminder: Balisage submissions are due on April 17th! That’s just under a month.

Do you want to speak at Balisage? Participate in the pre-conference symposium on Cultural Heritage Markup? Then it is time to put some work in on your paper!

See the Call for Participations at:

http://www.balisage.net/Call4Participation.html

http://www.balisage.net/CulturalHeritage/index.html

Instructions for authors: http://www.balisage.net/authorinstructions.html

Do you need help with the mechanics of your Balisage submission? If we can help please send email to info@balisage.net

It can’t be the case that the deep learning, GPU-toting AI folks have had all the fun this past year. After all, without data they would not have anything to be sexy about. Or is that with? Never really sure with those folks.

What I am sure about is that the markup folks at Balisage are poised to save Big Data from becoming Big Dark Data without any semantics.

But they can’t do it without your help! Will you stand by and let darkness cover all of Big Data or will you fight to preserve markup and the semantics it carries?

Sharpen your markup! Back to back, our transparency against the legions of darkness.

Well, it may not get that radical because Tommie is such a nice person but she has to sleep sometime. 😉 After she’s asleep, then we rumble.

Be there!

Turning the MS Battleship

Filed under: Interoperability,Microsoft,WWW,XML,XPath — Patrick Durusau @ 8:46 am

Improving interoperability with DOM L3 XPath by Thomas Moore.

From the post:

As part of our ongoing focus on interoperability with the modern Web, we’ve been working on addressing an interoperability gap by writing an implementation of DOM L3 XPath in the Windows 10 Web platform. Today we’d like to share how we are closing this gap in Project Spartan’s new rendering engine with data from the modern Web.

Some History

Prior to IE’s support for DOM L3 Core and native XML documents in IE9, MSXML provided any XML handling and functionality to the Web as an ActiveX object. In addition to XMLHttpRequest, MSXML supported the XPath language through its own APIs, selectSingleNode and selectNodes. For applications based on and XML documents originating from MSXML, this works just fine. However, this doesn’t follow the W3C standards for interacting with XML documents or exposing XPath.

To accommodate a diversity of browsers, sites and libraries wrap XPath calls to switch to the right implementation. If you search for XPath examples or tutorials, you’ll immediately find results that check for IE-specific code to use MSXML for evaluating the query in a non-interoperable way:

It seems like a long time ago that a relatively senior Microsoft staffer told me that turning a battleship like MS takes time. No change, however important, is going to happen quickly. Just the way things are in a large organization.

The important thing to remember is that once change starts, that too takes on a certain momentum and so is more likely to continue, even though it was hard to get started.

Yes, I am sure the present steps towards greater interoperability could have gone further, in another direction, etc. but they didn’t. Rather than complain about the present change for the better, why not use that as a wedge to push for greater support for more recent XML standards?

For my part, I guess I need to get a copy of Windows 10 on a VM so I can volunteer as a beta tester for full XPath (XQuery?/XSLT?) support in a future web browser. MS as a full XML competitor and possible source of open source software would generate some excitement in the XML community!

February 21, 2015

Introducing MicroXML

Filed under: XML — Patrick Durusau @ 10:11 am

Introducing MicroXML by Uche Ogbuji.

Uche took until now to post his slides from XML Prague 2013 so I’m excused for not posting about them sooner! 😉

Some resources to help get you started:

Introducing MicroXML (the movie, starring Uche Ogbuji)

Introducing MicroXML, Part 1: Explore the basic principles of MicroXML

Introducing MicroXML, Part 2: Process MicroXML with microxml-js

MicroXML Community Group (W3C)

MicroXML (2012 spec)

Abstract:

MicroXML is a subset of XML intended for use in contexts where full XML is, or is perceived to be, too large and complex. It has been designed to complement rather than replace XML, JSON and HTML. Like XML, it is a general format for making use of markup vocabularies rather than a specific markup vocabulary like HTML. This document provides a complete description of MicroXML.

If you have seen any of the recent XML work you will be glad someone is working on MicroXML.

Enjoy!

February 14, 2015

XML Calabash version 1.0.25

Filed under: XLink,XML,XProc — Patrick Durusau @ 7:57 pm

XML Calabash version 1.0.25 by Norm Walsh.

New release of Calabash as of 10 February 2015.

Updated to support XML Inclusions (XInclude) Version 1.1, which was a last call working draft on 16 December 2014.

Time to update your XML toolkit again!

February 13, 2015

XPath/XQuery/FO/XDM 3.1 Comments Filed!

Filed under: XML,XPath,XQuery — Patrick Durusau @ 8:00 pm

I did manage to file seventeen (17) comments today on the XPath/XQuery/FO/XDM 3.1 drafts!

I haven’t mastered bugzilla well enough to create an HTML list of them to paste in here but no doubt will do so over the weekend.

Remember these are NOT “bugs” until they are accepted by the working group as “bugs.” Think of them as being suggestions on my part where the drafts were unclear or could be made clearer in my view.

Did you remember to post comments?

I will try to get a couple of short things posted tonight but getting the comments in was my priority today.

January 21, 2015

Balisage: The Markup Conference 2015

Filed under: Conferences,XML,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 8:48 pm

Balisage: The Markup Conference 2015 – There is Nothing As Practical As A Good Theory

Key dates:
– 27 March 2015 — Peer review applications due
– 17 April 2015 — Paper submissions due
– 17 April 2015 — Applications for student support awards due
– 22 May 2015 — Speakers notified
– 17 July 2015 — Final papers due
– 10 August 2015 — Symposium on Cultural Heritage Markup
– 11–14 August 2015 — Balisage: The Markup Conference

Bethesda North Marriott Hotel & Conference Center, just outside Washington, DC (I know, no pool with giant head, etc. Do you think if we ask nicely they would put one in? And change the theme of the decorations about every 30 feet in the lobby?)

Balisage is the premier conference on the theory, practice, design, development, and application of markup. We solicit papers on any aspect of markup and its uses; topics include but are not limited to:

  • Cutting-edge applications of XML and related technologies
  • Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
  • Web application development with XML
  • Performance issues in parsing, XML database retrieval, or XSLT processing
  • Development of angle-bracket-free user interfaces for non-technical users
  • Deployment of XML systems for enterprise data
  • Design and implementation of XML vocabularies
  • Case studies of the use of XML for publishing, interchange, or archiving
  • Alternatives to XML
  • Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages
Detailed Call for Participation: http://balisage.net/Call4Participation.html
About Balisage: http://balisage.net/
Instructions for authors: http://balisage.net/authorinstructions.html

For more information: info@balisage.net or +1 301 315 9631

I wonder if the local authorities realize the danger in putting that many skilled markup people so close to the source of so much content (Washington)? With attendees sparking off against each other, who knows, we could see an accountable and auditable legislative and rule-making document flow arise. There may not be enough members of Congress in town to smother it.

    The revolution may not be televised but it will be powered by markup and its advocates. Come join the crowd with the tools to make open data transparent.

    January 19, 2015

    XPath/XQuery/FO/XDM 3.1 Definitions – Deduped/Sorted/Some Comments! Version 0.1

    Filed under: XML,XPath,XQuery — Patrick Durusau @ 10:11 am

    My first set of the XPath/XQuery/FO/XDM 3.1 Definitions, deduped, sorted, along with some comments is now online!

    XPath, XQuery, XQuery and XPath Functions and Operators, XDM – 3.1 – Sorted Definitions Draft

    Let me emphasize this draft is incomplete and more comments are needed on the varying definitions.

    I have included all definitions, including those that are unique or uniform. This should help with your review of those definitions as well.

    I am continuing to work on this and other work products to assist in your review of these drafts.

    Reminder: Tentative deadline for comments at the W3C is 13 February 2015.

    January 16, 2015

    58 XML Specs Led the Big Parade!

    Filed under: Standards,XML — Patrick Durusau @ 5:01 pm

    Earlier this week I ferreted out most of the current XML specifications from the W3C site. I say “most” because I didn’t take the time to run down XML “related” standards such as SVG, etc. At some point I will spend the time to track down all the drafts, prior versions, and related materials.

    But, for today, I have packaged up the fifty-eight (58) current XML standards in 58XMLRecs.tar.gz.

BTW, do realize that Extensible Stylesheet Language (XSL) Version 1.0 and XHTML™ Modularization 1.1 – Second Edition have table-of-contents-only versions. I included the full HTML file versions in the package.

    You can use grep or other search utilities to search prior XML work for definitions, productions, etc.
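
If grep feels too coarse, the same hunt can be run with an XQuery processor over the unpacked archive. A sketch is below; the ?select filter on the collection URI is a Saxon convention (other processors have their own way of selecting files), and it assumes the specification files parse as XHTML:

(: pull every paragraph containing a formal definition out of the
   unpacked specifications; collection URI syntax is processor specific :)
for $doc in collection("58XMLRecs?select=*.html")
for $p in $doc//*:p[contains(., "[Definition")]
return <hit spec="{ tokenize(base-uri($doc), '/')[last()] }">{ normalize-space($p) }</hit>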

    Do you remember the compilation of XML standards that used the old MS Help application? The file format was a variation on RTF. Ring any bells? Anything like that available now?

    January 15, 2015

    Bob DuCharme’s Treasure Trove

    Filed under: XML — Patrick Durusau @ 7:16 pm

    Bob DuCharme’s Treasure Trove

    OK, Bob’s name for it is:

    My involvement with RDF and XML technology:

    So I took the liberty of spicing it up a bit! 😉

    Seriously, I was trying to recall some half-remembered doggerel about xml:sort when I stumbled (literally) over this treasure trove of Bob’s writing on markup technologies.

    I have been a fan of Bob’s writing since his SGML CD book. You probably want something more recent for XML, etc., but it was a great book.

    Whether you have a specific issue or just want to browse some literate writing on markup languages, this collection is a good place to start.

    Enjoy!

    Corrected Definitions Lists for XPath/XQuery/etc.

    Filed under: Standards,XML,XPath,XQuery — Patrick Durusau @ 3:01 pm

    In my extraction of the definitions yesterday I produced files that had HTML <p> elements embedded in other HTML <p> elements.

    The corrected files are as follows:

These lists are unsorted, and paragraphs with multiple definitions are repeated for each definition. That helps me spot where multiple definitions may be followed by non-normative prose applicable to one or more of them.

    The XSLT code I used yesterday was incorrect:

<xsl:for-each select="//p/a[contains(@name, 'dt')]">
<p>
<xsl:copy-of select="ancestor::p"/>
</p>
</xsl:for-each>

    And results in:

<p>
<p>[<a name="dt-expression-context" id="dt-expression-context" title="expression context" shape="rect">Definition</a>: The <b>expression
context</b> for a given expression consists of all the information
that can affect the result of the expression.]
</p>
</p>

    Which is both ugly and incorrect.

    When using xsl:copy-of for a p element, the surrounding p elements were unnecessary.

    Thus (correctly):

<xsl:for-each select="//p/a[contains(@name, 'dt')]">
<xsl:copy-of select="ancestor::p"/>
</xsl:for-each>

    I reproduced the corrected definition files above. Apologies for any inconvenience.

    Work continues on the sorting and deduping.

    January 9, 2015

    Structural Issues in XPath/XQuery/XPath-XQuery F&O Drafts

    Filed under: Standards,W3C,XML,XPath,XQuery — Patrick Durusau @ 1:02 pm

Apologies as I thought I was going to be further along in demonstrating some proofing techniques for XPath 3.1, XQuery 3.1, and XPath and XQuery Functions and Operators 3.1 by today.

    Instead, I encountered structural issues that are common to all three drafts that I didn’t anticipate but that need to be noted before going further with proofing. I will be using sample material to illustrate the problems and will not always have a sample from all three drafts or even note every occurrence of the issues. They are too numerous for that treatment and it would be repetition for repetition’s sake.

    First, consider these passages from XPath 3.1, 1 Introduction:

    [Definition: XPath 3.1 operates on the abstract, logical structure of an XML document, rather than its surface syntax. This logical structure, known as the data model, is defined in [XQuery and XPath Data Model (XDM) 3.1].]

    [Definition: An XPath 3.0 Processor processes a query according to the XPath 3.0 specification.] [Definition: An XPath 2.0 Processor processes a query according to the XPath 2.0 specification.] [Definition: An XPath 1.0 Processor processes a query according to the XPath 1.0 specification.]

    1. Unnumbered Definitions – Unidentified Cross-References

    The first structural issue that you will note with the “[Definition…” material is that all such definitions are unnumbered and appear throughout all three texts. The lack of numbering means that it is difficult to refer with any precision to a particular definition. How would I draw your attention to the third definition of the second grouping? Searching for XPath 1.0 turns up 79 occurrences in XPath 3.1 so that doesn’t sound satisfactory. (FYI, “Definition” turns up 193 instances.)

    While the “Definitions” have anchors that allow them to be addressed by cross-references, you should note that the cross-references are text hyperlinks that have no identifier by which a reader can find the definition without using the hyperlink. That is to say when I see:

    A lexical QName with a prefix can be converted into an expanded QName by resolving its namespace prefix to a namespace URI, using the statically known namespaces. [These are fake links to draw your attention to the text in question.]

    The hyperlinks in the original will take me to various parts of the document where these definitions occur, but if I have printed the document, I have no clue where to look for these definitions.

    The better practice is to number all the definitions and since they are all self-contained, to put them in a single location. Additionally, all interlinear references to those definitions (or other internal cross-references) should have a visible reference that enables a reader to find the definition or cross-reference, without use of an internal hyperlink.

    Example:

    A lexical QName Def-21 with a prefix can be converted into an expanded QName Def-19 by resolving its namespace prefix to a namespace URI, using the statically known namespaces. Def-99 [These are fake links to draw your attention to the text in question. The Def numbers are fictitious in this example. Actual references would have the visible definition numbers assigned to the appropriate definition.]

    2. Vague references – $N versus 5000 x $N

    Another problem I encountered was what I call “vague references,” or less generously, $N versus 5,000 x $N.

    For example:

    [Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].] [Definition: A node is an instance of one of the node kinds defined in [XQuery and XPath Data Model (XDM) 3.1].

    Contrary to popular opinion, standards don’t write themselves and every jot and tittle was placed in a draft at the expense of someone’s time and resources. Let’s call that $N.

In the example, you and I both know that somewhere in XML Schema 1.0 and XML Schema 1.1 the “value space of an atomic type” is defined. The same is true for nodes and XQuery and XPath Data Model (XDM) 3.1. But where? The authors of these specifications could insert that information at a cost of $N.

What is the cost of not inserting that information in the current drafts? I estimate the number of people interested in reading these drafts to be 5,000. So each of those persons will have to find the same information omitted from these specifications, which is a cost of 5,000 x $N. In terms of convenience to readers and reducing their costs of reading these specifications, references to exact locations in other materials are a necessity.

    In full disclosure, I have no more or less reason to think 5,000 people are interested in these drafts than the United States has for positing the existence of approximately 5,000 terrorists in the world. I suspect the number of people interested in XML is actually higher but the number works to make the point. Editors can either convenience themselves or their readers.

    Vague references are also problematic in terms of users finding the correct reference. The citation above, [XML Schema 1.0] for “value space of an atomic type,” refers to all three parts of XML Schema 1.0.

    Part 1, at 3.14.1 (non-normative) The Simple Type Definition Schema Component, has the only reference to “atomic type.”

Part 2 actually has zero hits for “atomic type.” True enough, “2.5.1.1 Atomic datatypes” is likely the intended reference, but that isn’t what the specification says to look for.

    Bottom line is that any external reference needs to include in the inline citation the precise internal reference in the work being cited. If you want to inconvenience readers by pointing to internal bibliographies rather than online HTML documents, where available, that’s an editorial choice. But in any event, for every external reference, give the internal reference in the work being cited.

    Your readers will appreciate it and it could make your work more accurate as well.

    3. Normative vs. Non-Normative Text

    Another structural issue which is important for proofing is the distinction between normative and non-normative text.

    In XPath 3.1, still in the Introduction we read:

    This document normatively defines the static and dynamic semantics of XPath 3.1. In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative.

    OK, and under 2.2.3.1 Static Analysis Phase (XPath 3.1), we find:

    Examples of inferred static types might be:

    Which is followed by a list so at least we know where the examples end.

    However, there are numerous cases of:

    For example, with the expression substring($a, $b, $c), $a must be of type xs:string (or something that can be converted to xs:string by the function calling rules), while $b and $c must be of type xs:double. [also in 2.2.3.1 Static Analysis Phase (XPath 3.1)]

    So, is that a non-normative example? If so, what is the nature of the “must” that occurs in it? Is that normative?

Moreover, the examples (XPath 3.1 has 283 occurrences of that term, XQuery has 455 occurrences, and XPath and XQuery Functions and Operators has 537 occurrences) are unnumbered, which makes referencing the examples from other materials very imprecise and wordy. For the use of authors creating secondary literature on these materials, to promote adoption, etc., numbering of all examples should be the default case.

    Oh, before anyone protests that XPath and XQuery Functions and Operators has separated its examples into lists, that is true but only partially. There remain 199 occurrences of “for example” which do not occur in lists. Where lists are used, converting to numbered examples should be trivial. The elimination of “for example” material may be more difficult. Hard to say without a good sampling of the cases.

    Conclusion:

    As I said at the outset, apologies for not reaching more substantive proofing techniques but structural issues are important for the readability and usability of specifications for readers. Being correct and unreadable isn’t a useful goal.

It may seem like some of the changes I suggest are a big “ask” this late in the processing of these specifications. If this were a hand edited document, I would quickly agree with you. But it’s not. Or at least it shouldn’t be. I don’t know where the source is held but the HTML you read is a generated artifact.

    Gathering and numbering the definitions and inserting those numbers into the internal cross-references are a matter of applying a different style sheet to the source. Fixing the vague references and unnumbered example texts would take more editorial work but readers would greatly benefit from precise references and a clear separation of normative from non-normative text.
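
Working only from the published XHTML (the actual source format is not available to us), a first cut at the gather-and-number pass could be as small as the following XQuery sketch. The filename is illustrative; the “dt” anchor naming convention is the one visible in the drafts themselves.

(: gather every definition paragraph, give it a visible number, and emit
   an index that visible cross-references could point at :)
for $def at $n in doc("xpath-31.xhtml")//*:p[*:a[starts-with(@name, "dt")]]
return
  <definition number="Def-{ $n }" anchor="{ $def/*:a[starts-with(@name, 'dt')][1]/@name }">{
    normalize-space($def)
  }</definition>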

I will try again over the weekend to reach aids for substantive proofing on these drafts. With luck, I will return to these drafts on Monday of next week (12 January 2015).

    January 5, 2015

    Redefining RFC 2119? Danger! Danger! Will Robinson!

    Filed under: Standards,W3C,XML,XPath,XQuery — Patrick Durusau @ 3:43 pm

    I’m lagging behind in reading XQuery 3.1: An XML Query Language, XML Path Language (XPath) 3.1, and, XPath and XQuery Functions and Operators 3.1 in order to comment by 13 February 2015.

In order to catch up this past weekend I started trying to tease these candidate recommendations apart to make them easier to proof. One of the things I always do is check for key word conformance language, and that means, outside of ISO, RFC 2119.

    I was reading XPath and XQuery Functions and Operators 3.1 (herein Functions and Operators) when I saw:

    1.1 Conformance

    The Functions and Operators specification is intended primarily as a component that can be used by other specifications. Therefore, Functions and Operators relies on specifications that use it (such as [XML Path Language (XPath) 3.1], [XQuery 3.1: An XML Query Language], and potentially future versions of XSLT) to specify conformance criteria for their respective environments.

That works. You have a normative document of definitions, etc., and some other standard cites those definitions and supplies the must, should, and may key words according to RFC 2119. Not common, but it works.

    But then I started running scripts for usage of key words and I found in Functions and Operators:

    1.6.3 Conformance terminology

    [Definition] may

    Conforming documents and processors are permitted to, but need not, behave as described.

    [Definition] must

    Conforming documents and processors are required to behave as described; otherwise, they are either non-conformant or else in error.

    Thus the title: Redefining RFC 2119? Danger! Danger! Will Robinson!

    RFC 2119 reads in part:

    1. MUST This word, or the terms “REQUIRED” or “SHALL”, mean that the definition is an absolute requirement of the specification.

    5. MAY This word, or the adjective “OPTIONAL”, mean that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)

    6. Guidance in the use of these Imperatives

    Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmisssions) For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability.

First, the referencing of RFC 2119 is standard practice at the W3C, at least with regard to XML specifications. I wanted to have more than personal experience to cite so I collected the fifty-eight current XML specifications and summarized them in the list at the end of this post.

Of the fifty-nine (59) current XML specifications (there may be others; the W3C has abandoned simply listing its work without extraneous groupings), fifty-two of the standards cite and follow RFC 2119. Three of the remaining seven (7) fail to cite RFC 2119 due to errors in editing.

The final four (4), as it were, that don’t cite RFC 2119 are a good illustration of how errors get perpetuated from one standard to another.

    The first W3C XML specification to not cite RFC 2119 was: Extensible Markup Language (XML) 1.0 (Second Edition) where it reads in part:

    1.2 Terminology

    may

    [Definition: Conforming documents and XML processors are permitted to but need not behave as described.]

    must

    [Definition: Conforming documents and XML processors are required to behave as described; otherwise they are in error. ]

    The definitions of must and may were ABANDONED in Extensible Markup Language (XML) 1.0 (Third Edition), which simply dropped those definitions and instead reads in part:

    1.2 Terminology

    The terminology used to describe XML documents is defined in the body of this specification. The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional, when emphasized, are to be interpreted as described in [IETF RFC 2119].

    The exclusive use of RFC 2119 continues through Extensible Markup Language (XML) 1.0 (Fourth Edition) to the current Extensible Markup Language (XML) 1.0 (Fifth Edition)

    However, as is often said, whatever good editing we do is interred with us and any errors we make live on.

Before the attempts to define may and must were abandoned in XML 3rd edition, XML Schema Part 1: Structures Second Edition and XML Schema Part 2: Datatypes Second Edition cited XML 2nd edition as their rationale for defining may and must. That error has never been corrected.

    Which brings us to W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes which is the last W3C XML specification to not cite RFC 2119.

    XSD 1.1 Part 2 reads in part, under Appendix I Changes since version 1.0, I.4 Other Changes:

    The definitions of must, must not, and ·error· have been changed to specify that processors must detect and report errors in schemas and schema documents (although the quality and level of detail in the error report is not constrained).

The problem being that W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes relies upon XML Schema Part 2: Datatypes Second Edition, which cites Extensible Markup Language (XML) 1.0 (Second Edition) as the reason for redefining the terms may and must.

The redefining of may and must relies upon language in a superseded version of the XML standard. Language that was deleted ten (10) years ago from the XML standard.

    If you have read this far, you have a pretty good guess that I am going to suggest that XPath and XQuery Functions and Operators 3.1 drop the attempt to redefine terms that appear in RFC 2119.

First, redefining widely used terms for conformance is clearly a bad idea. Do you mean an RFC 2119 must or do you mean an F&O must? Clearly different. If a requirement has an RFC 2119 must, my application either conforms or fails. If a requirement has an F&O must, my application may simply be in error. All the time. Is that useful?

Second, by redefining must, we lose the interoperability aspects as defined by RFC 2119 for all uses of must. Surely interoperability is a goal of Functions and Operators. Yes?

    Third, the history of redefining may and must at the W3C shows (to me) the perpetuation of an error long beyond its correction date. It’s time to put an end to redefining may and must.

PS: Before you decide you “know” the difference in upper and lower case key words from RFC 2119, take a look at: RFC Editorial Guidelines and Procedures, Normative References to RFC 2119. Summary: UPPER CASE is normative; lower case is “a necessary logical relationship.”
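
For anyone who wants to run the same sort of key word census, a small XQuery sketch follows. It works on the rendered text of a specification (the filename here is illustrative), so it will also count key words that happen to appear inside quoted examples:

(: count upper-case (normative per RFC 2119) versus lower-case uses
   of each key word in a specification :)
let $words := tokenize(string(doc("xpath-functions-31.xhtml")), "[^A-Za-z]+")
for $kw in ("must", "shall", "should", "may", "required", "optional")
return concat($kw, ": ", count($words[. = upper-case($kw)]), " upper-case, ",
              count($words[. = $kw]), " lower-case")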

    PPS: Tracking this error down took longer than expected so it will be later this week before I have anything that may help with proofing the specifications.


    XML Standards Consulted in preparation of this post. Y = Cites RFC 2119, N = Does not cite RFC 2119.

    January 3, 2015

    XML Prague Sponsoring for Students

    Filed under: Conferences,XML — Patrick Durusau @ 1:53 pm

le-tex XML Tech posted a tweet today saying:

    We are sponsoring the full #xmlprague pass and accommodation for up to 5 students. xmlprague.cz Please apply to letexml@le-tex.de

    XML Prague 2015 is February 13-15 2015 so there isn’t a lot of time to spare!

    The schedule reads like a Who’s Who in XML, including Michael Kay speaking on parallel processing in XSLT. That alone would be worth the trip to Prague!

If you are a student, apply for sponsorship. If you’re not a student, online registration closes February 9, 24:00 CET. Plus you need to get plane reservations, hotel, etc. Don’t delay!

    January 2, 2015

    Early English Books Online – Good News and Bad News

    Early English Books Online

    The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

    From the webpage:

    The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

    The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at http://www.ota.ox.ac.uk/tcp/, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

    This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

    [Technical note]
    Those interested in working on the TEI P5 XML versions of the texts can check them out of Github, via https://github.com/textcreationpartnership/, where each of the texts is in its own repository (eg https://github.com/textcreationpartnership/A00021). There is a CSV file listing all the texts at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/cloneall.sh

    Now for the BAD NEWS:

    An additional 45,000 books:

    Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

    Can you guess why the public is barred from what are obviously public domain texts?

    Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when was the last time your country’s military was “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a worldwide scale.

    It is hard to imagine what sort of life-time access for everyone on Earth could be secured for less than $1 trillion. No more special pricing and contracts if you are in countries A to Zed. Eliminate all that paperwork for publishers and to access all you need is a connection to the Internet. The publishers would have a guaranteed income stream, less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc.

    My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?

    December 21, 2014

    New Open XML PowerTool Cmdlet simplifies retrieval of document metrics

    Filed under: Microsoft,XML — Patrick Durusau @ 8:43 pm

    New Open XML PowerTool Cmdlet simplifies retrieval of document metrics by Doug Mahugh.

    From the post:

    It’s been a good year for Open XML developers. The release of the Open XML SDK as an open source project back in June was well-received by the community, and enabled contributions such as the makefile to automate use of the SDK on Mono and a Visual Studio project for the SDK. Project leader Eric White has worked to refine and improve the testing process, and here at MS Open Tech we’ve been working with our China team to get the word out, starting with mirroring the repo to GitCafe for ease of access in China.

    Today there’s another piece of good news for Open XML developers: Eric White has added a new Get-DocxMetrics Cmdlet to the Open XML PowerTools, the latest step in a developer-focused reengineering of the PowerTools to make them even more flexible and useful to Open XML developers. As Eric explains in his blog post on the Open XML Developer site:

    My latest foray is a new Cmdlet, Get-DocxMetrics, which returns a lot of useful information about a WordprocessingML document. A summary of the information it returns for a document:

    • The style hierarchy – styles can inherit from other styles, and it is helpful to know what styles are defined in a document.
    • The content control hierarchy. We can examine the hierarchy, and design an XSD schema to validate them.
    • The list of languages used in a document, such as en-US, fr-FR, and so on.
    • Whether a document contains tracked revisions, text boxes, complex fields, simple fields, altChunk content, tables, hyperlinks, legacy frames, ActiveX controls, sub documents, references to null images, embedded spreadsheets, document protection, multi-font runs, the list of numbering formats used, and more.
    • Metrics on how large the document is, including element counts, average paragraph lengths, run count, zero length text elements, ASCII character counts, complex script character counts, East Asia character counts, and the count of runs of each of the variety of characters.

    Get-DocxMetrics sounds like a viable way to generate statistics on a collection of OpenXML files to determine what features of OpenXML are actually in use by an enterprise or government. That would make creation of specialized tools for such entities a far more certain proposition.

Output from such an analysis would be a nice input into a topic map for purposes of mapping usage to other formats. What maps, what misses, etc.?
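
If you only want a rough version of a few of those numbers and have no PowerShell handy, a minimal XQuery sketch over the word/document.xml part of a .docx gets you started (unzip the package first). The element names are from the standard WordprocessingML main namespace; this is not how the PowerTools Cmdlet works internally, just the same idea.

declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: rough counts comparable to a few of the Get-DocxMetrics numbers :)
let $doc := doc("document.xml")
return
  <metrics>
    <paragraphs>{ count($doc//w:p) }</paragraphs>
    <runs>{ count($doc//w:r) }</runs>
    <characters>{ string-length(string-join($doc//w:t/string(), "")) }</characters>
    <languages>{ string-join(distinct-values($doc//w:lang/(@w:val, @w:eastAsia, @w:bidi)), ", ") }</languages>
  </metrics>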

    Looking forward to hearing more about this tool in the new year!

    December 19, 2014

    XProc 2.0: An XML Pipeline Language

    Filed under: XML,XProc — Patrick Durusau @ 12:07 pm

    XProc 2.0: An XML Pipeline Language W3C First Public Working Draft 18 December 2014

    Abstract:

    This specification describes the syntax and semantics of XProc 2.0: An XML Pipeline Language, a language for describing operations to be performed on documents.

    An XML Pipeline specifies a sequence of operations to be performed on documents. Pipelines generally accept documents as input and produce documents as output. Pipelines are made up of simple steps which perform atomic operations on documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed.

    For your proofing responses:

Please report errors in this document by raising issues on the specification repository. Alternatively, you may report errors in this document to the public mailing list public-xml-processing-model-comments@w3.org (public archives are available).

    First drafts always need a close reading for omissions and errors. However, after looking at the editors of XProc 2.0, you aren’t likely to find any “cheap” errors. Makes proofing all the more fun.

    Enjoy!

    XQuery, XPath, XQuery/XPath Functions and Operators 3.1

    Filed under: XML,XPath,XQuery — Patrick Durusau @ 11:56 am

    XQuery, XPath, XQuery/XPath Functions and Operators 3.1 were published on 18 December 2014 as a call for implementation of these specifications.

    The changes most often noted were the addition of capabilities for maps and arrays. “Support for JSON” means sections 17.4 and 17.5 of XPath and XQuery Functions and Operators 3.1.

    XQuery 3.1 and XPath 3.1 depend on XPath and XQuery Functions and Operators 3.1 for JSON support. (Is there no acronym for XPath and XQuery Functions and Operators? Suggest XF&O.)

    For your reading pleasure:

    XQuery 3.1: An XML Query Language

      3.10.1 Maps.

      3.10.2 Arrays.

    XML Path Language (XPath) 3.1

    1. 3.11.1 Maps
    2. 3.11.2 Arrays

    XPath and XQuery Functions and Operators 3.1

    1. 17.1 Functions that Operate on Maps
    2. 17.3 Functions that Operate on Arrays
    3. 17.4 Conversion to and from JSON
    4. 17.5 Functions on JSON Data
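
For a quick taste of what the new constructs look like, here is a short XQuery 3.1 example; the map and array function prefixes are predeclared in 3.1, and parse-json comes from the JSON support in sections 17.4 and 17.5:

let $config := map {
  "format"  : "xml",
  "authors" : [ "Robie", "Walsh" ]
}
return (
  $config?format,                       (: "xml" :)
  array:size($config?authors),          (: 2 :)
  parse-json('{"answer": 42}')?answer   (: 42 :)
)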

    Hoping that your holiday gifts include a large box of highlighters and/or a box of red pencils!

    Oh, these specifications will “…remain as Candidate Recommendation(s) until at least 13 February 2015. (emphasis added)”

    Less than two months so read quickly and carefully.

    Enjoy!

    I first saw this in a tweet by Jonathan Robie.

    December 7, 2014

    Overlap and the Tree of Life

    Filed under: Bioinformatics,Biology,XML — Patrick Durusau @ 9:43 am

    I encountered a wonderful example of “overlap” in the markup sense today while reading about resolving conflicts in constructing a comprehensive tree of life.

    [Figure: overlap and the tree of life]

    The authors use a graph database, which allows them to study various hypotheses about how to resolve those conflicts.

    Their graph database, opentree-treemachine, is available on GitHub, https://github.com/OpenTreeOfLife/treemachine, as is the source to all the project’s software, https://github.com/OpenTreeOfLife.

    There’s a thought for Balisage 2015. Is the processing of overlapping markup a question of storing documents with overlapping markup in graph databases and then streaming the non-overlapping results of a query to an XML processor?

    Overlapping results, or alternative resolutions of overlap, could be visualized from the same graph database.

    The question of which overlap syntax to use then becomes a matter of convenience and of how much information is captured, rather than an exercise in fashioning syntax that cheats XML processors or in devising new means of processing XML.

    Perhaps graph databases can make overlapping markup the default case for encoded documents, just as overlap is the default case in the documents themselves (single-tree documents being rare outliers).

    Remind me to send a note to Michael Sperberg-McQueen and friends about this idea.

    BTW, the details of the article that led me down this path:

    Synthesis of phylogeny and taxonomy into a comprehensive tree of life by Steven A. Smith, et al.

    Abstract:

    Reconstructing the phylogenetic relationships that unite all biological lineages (the tree of life) is a grand challenge of biology. However, the paucity of readily available homologous character data across disparately related lineages renders direct phylogenetic inference currently untenable. Our best recourse towards realizing the tree of life is therefore the synthesis of existing collective phylogenetic knowledge available from the wealth of published primary phylogenetic hypotheses, together with taxonomic hierarchy information for unsampled taxa. We combined phylogenetic and taxonomic data to produce a draft tree of life—the Open Tree of Life—containing 2.3 million tips. Realization of this draft tree required the assembly of two resources that should prove valuable to the community: 1) a novel comprehensive global reference taxonomy, and 2) a database of published phylogenetic trees mapped to this common taxonomy. Our open source framework facilitates community comment and contribution, enabling a continuously updatable tree when new phylogenetic and taxonomic data become digitally available. While data coverage and phylogenetic conflict across the Open Tree of Life illuminates significant gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point from which we can continue to improve through community contributions. Having a comprehensive tree of life will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change studies, agriculture, and genomics.

    A project with a great deal of significance beyond my interest in overlap in markup documents. Highly recommended reading. The resolution of conflicts among trees here involves an evaluation of the underlying data, much as you would evaluate data for merging in a topic map.

    Unlike the authors, I see no difficulty in making super trees rich enough in underlying data to permit their direct use for conflict resolution. But you would have to design the trees from the start with those capabilities, or have topic-map-like merging capabilities, so that you are not limited by early and necessarily preliminary data design decisions.

    Enjoy!

    I first saw this in a tweet by Ross Mounce.

    November 18, 2014

    MarkLogic® 8…

    Filed under: MarkLogic,RDF,XML — Patrick Durusau @ 4:06 pm

    MarkLogic® 8 Evolves Database Technology to Solve Heterogeneous Data Integration Problems with the Power of Search, Semantics and Bitemporal Features All in One System

    From the post:

    MarkLogic Corporation, the leading Enterprise NoSQL database platform provider, today announced the availability of MarkLogic® Version 8 Early Access Edition. MarkLogic 8 brings together advanced search, semantics, bitemporal and native JavaScript support into one powerful, agile and trusted database platform. Companies can now:

    • Get better answers faster through integrated search and query of all of their data, metadata, and relationships, regardless of the data type or source;
    • Lower costs and increase agility by easily integrating heterogeneous data, including relational, unstructured, and richly structured data, across silos and at massive scale;
    • Rapidly build production-ready applications in weeks versus months or years to address the needs of the business or organization.

    For enterprise customers who value agility but can’t compromise on resiliency, MarkLogic software is the only database platform that integrates Google-like search with rich query and semantics into an intelligent and extensible data layer that works equally well in a data center or in the cloud. Unlike other NoSQL solutions, MarkLogic provides ACID transactions, HA, DR, and other hardened features that enterprises require, along with the scalability and agility they need to accelerate their business.

    “As more complex data, much of it semi-structured, becomes increasingly important to businesses’ daily operations, enterprises are realizing that they must look beyond relational databases to help them understand, integrate, and manage all of their data, deriving maximum value in a simple, yet sophisticated manner,” said Carl Olofson, research vice president at IDC. “MarkLogic has a history of bringing advanced data management technology to market and many of their customers and partners are accustomed to managing complex data in an agile manner. As a result, they have a more mature and creative view of how to manage and use data than do mainstream database users. MarkLogic 8 offers some very advanced tools and capabilities, which could expand the market’s definition of enterprise database technology.”

    I’m not in the early access program, but if you are, heads up!

    By “semantics,” MarkLogic means RDF triples and the ability to query those triples with text, values, etc.
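
    As a rough sketch of what that looks like in practice, the following XQuery assumes MarkLogic’s semantics library and its sem:sparql() function; treat the module path and the call below as assumptions to check against the MarkLogic 8 documentation rather than as a definitive example.

        xquery version "1.0-ml";
        (: Assumed MarkLogic semantics API: run a SPARQL query over stored triples. :)
        import module namespace sem = "http://marklogic.com/semantics"
          at "/MarkLogic/semantics.xqy";

        sem:sparql('
          SELECT ?subject ?predicate
          WHERE { ?subject ?predicate <http://example.org/resource/42> }
          LIMIT 10
        ')

    The IRI in the WHERE clause is a placeholder; the point is simply that triples loaded into MarkLogic can be queried with SPARQL alongside the usual search and value queries.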

    Since we can all read different semantics into the same triples, text, and values, your semantic mileage with MarkLogic may vary greatly.

    June 2, 2014

    HTML5 vs. XML: War?

    Filed under: HTML5,XML — Patrick Durusau @ 3:11 pm

    I stole part of the title from a tweet by Deborah A. Lapeyre that reads:

    HTML5 and XML: War? Snub fest? Harmony? How should they interact? pre-Balisage 1-day Symposium. Come be heard! https://www.balisage.net/HTML5-XML/index.html

    As you will gather from the tweet, Balisage is having a one day pre-conference meeting on HTML5 and XML. From the Symposium page:

    Despite a decade of efforts dedicated to making XML the markup language of the Web, today it is HTML5 that has taken on that role. While HTML5 can in part be made to work with an XML syntax, reliance on that feature is rare compared to use of HTML5’s own syntax.

    Over the years, the competition between these two approaches has led to animosity and frustration. But both XML and HTML5 are now clearly here to stay, and with the upcoming standardisation of HTML5 in 2014 it is now time to take stock and see how both technologies — and both communities — can coöperate constructively.

    There are many environments in which these two markup languages are brought to interact. Additionally, there is much that they can learn from one another. We are looking forward to sharing experiences and ideas that bring the two together.

    Does HTML5 have the role of markup language of the Web?

    As far as layout engines are concerned, you would have to say “partial support” for HTML5 at best.

    And the number I was hearing last year was 10% of the Web using HTML5. Have you heard differently?

    I’m sure the W3C is absolutely certain that HTML5 is the very thing for the Web, but remember that it wasn’t all that long ago that they abandoned their little red RDF wagon to its own fate. With enough money you can promote anything for more than a decade. Adoption, well, that’s something else entirely.

    For me the obvious tip-off about HTML5 came from its description in the Wikipedia HTML5 article:

    It includes detailed processing models to encourage more interoperable implementations;

    Anyone who needs “detailed processing models” for “interoperability” doesn’t understand the nature of markup languages.

    Markup languages capture the structure of documents so that documents can be interchanged between applications. So long as an application can parse a document into its internal model and deliver the expected result to its user, the document is “interoperable” between the applications.

    What the W3C is attempting to hide behind its processing models is an effort to force users to view materials as defined by others. That is, they want to take away your right to view and/or process a document as you choose, for example by skipping advertising or by reformatting a document once the advertising has been removed.

    Do you remember the Rocky Horror Picture Show? Janet’s comment about Rocky was: “Well, I don’t like men with too many muscles.”

    And Dr. Frank N. Furter’s response?

    I didn’t make him for you!

    Same can be said for HTML5. They didn’t make it for you.

    If you think differently, bring your false gods to HTML5 and XML: Mending Fences, the Balisage pre-conference symposium. Stay for the conference. It will give you time to find a new set of false gods.
