Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 19, 2017

An Honest Soul At The W3C? EME/DRM Secret Ballot

Filed under: Cybersecurity,DRM,Electronic Frontier Foundation,Leaks,Security,W3C — Patrick Durusau @ 9:49 am

Billions of current and future web users have been assaulted and robbed in what Jeff Jaffe (W3C CEO) calls a “respectful debate” (Reflections on the EME Debate).

Odd sense of “respectful debate.”

A robber demands all of your money and clothes, promises to rent you clothes to get home, but won’t tell you how to make your own clothes. You are now and forever a captive of the robber. (That’s a lay person’s summary, but an accurate account, of what the EME crowd wanted and got.)

Representatives for potential victims, the EFF and others, pointed out the problems with EME at length, over years of debate. The response of the robbers: “We want what we want.”

Consistently, for years, the simple-minded response of EME advocates continued to be: “We want what we want.”

If you think I’m being unkind to the EME advocates, consider the language of the Disposition of Comments for Encrypted Media Extensions and the Director’s decision itself:


Given that there was strong support to initially charter this work (without any mention of a covenant) and continued support to successfully provide a specification that meets the technical requirements that were presented, the Director did not feel it appropriate that the request for a covenant from a minority of Members should block the work the Working Group did to develop the specification that they were chartered to develop. Accordingly the Director overruled these objections.

EME lacks a covenant that would protect researchers and others from anti-circumvention laws and enable continued research on security and other aspects of EME implementations.

That covenant was not in the original charter, hence the Director’s “(without any mention of a covenant),” aka “We want what we want.”

There wasn’t ever any “respectful debate,” but rather EME supporters repeating over and over again, “We want what we want.”

A position which prevailed, which brings me to the subject of this post. A vote, a secret vote, was conducted by the W3C seeking support for the Director’s cowardly and self-interested support for EME, the result of which has been reported as:


Though some have disagreed with W3C’s decision to take EME to recommendation, the W3C determined that the hundreds of millions of users who want to watch videos on the Web, some of which have copyright protection requirements from their creators, should be able to do so safely and in a Web-friendly way. In a vote by Members of the W3C ending mid September, 108 supported the Director’s decision to advance EME to W3C Recommendation that was appealed mid-July through the appeal process, while 57 opposed it and 20 abstained. Read about reflections on the EME debate, in a Blog post by W3C CEO Jeff Jaffe.

(W3C Publishes Encrypted Media Extensions (EME) as a W3C Recommendation)

One hundred and eight members took up the cry of “We want what we want” to rob billions of current and future web users. The only open question being: who?

To answer that question, the identity of these robbers, I posted this note to Jeff Jaffe:

Jeff,

I read:

***

In a vote by Members of the W3C ending mid September, 108 supported the Director’s decision to advance EME to W3C Recommendation that was appealed mid-July through the appeal process, while 57 opposed it and 20 abstained.

***

at: https://www.w3.org/2017/09/pressrelease-eme-recommendation.html.en

But I can’t seem to find a link to the vote details, that is a list of members and their vote/abstention.

Can you point me to that link?

Thanks!

Hope you are having a great week!

Patrick

It didn’t take long for Jeff to respond:

On 9/19/2017 9:38 AM, Patrick Durusau wrote:
> Jeff,
>
> I read:
>
> ***
>
> In a vote by Members of the W3C ending mid September, 108 supported the
> Director’s decision to advance EME to W3C Recommendation that was
> appealed mid-July through the appeal process, while 57 opposed it and 20
> abstained.
>
> ***
>
> at: https://www.w3.org/2017/09/pressrelease-eme-recommendation.html.en
>
> But I can’t seem to find a link to the vote details, that is a list of
> members and their vote/abstention.
>
> Can you point me to that link?

It is long-standing process not to release individual vote details publicly.

I wonder about a “long-standing process” for the only vote on an appeal in W3C history, but there you have it: the list of robbers isn’t public. No need to search the W3C website for it.

If there is an honest person at the W3C, a person who stands with the billions of victims of this blatant robbery, then we will see a leak of the EME vote.

If there is no leak of the EME vote, that, in itself, is a comment on the staff of the W3C.

Yes?

PS: Kudos to the EFF and others for delaying EME this long, but the outcome was never seriously in question. Especially in organizations where continued membership and funding are more important than the rights of individuals.

EME can only be defeated by action in the trenches as it were, depriving its advocates of any perceived benefit and imposing ever higher costs upon them.

You do have your marker pens and sticky tape ready. Yes?

April 18, 2017

XSL Transformations (XSLT) Version 3.0 (Proposed Recommendation 18 April 2017)

Filed under: W3C,XML,XSLT — Patrick Durusau @ 6:53 pm

XSL Transformations (XSLT) Version 3.0 (Proposed Recommendation 18 April 2017)

Michael Kay tweeted today:

XSLT 3.0 is a Proposed Recommendation: https://www.w3.org/TR/xslt-30/ It’s taken ten years but we’re nearly there!

Congratulations to Michael and the entire team!

What’s new?

A major focus for enhancements in XSLT 3.0 is the requirement to enable streaming of source documents. This is needed when source documents become too large to hold in main memory, and also for applications where it is important to start delivering results before the entire source document is available.

While implementations of XSLT that use streaming have always been theoretically possible, the nature of the language has made it very difficult to achieve this in practice. The approach adopted in this specification is twofold: it identifies a set of restrictions which, if followed by stylesheet authors, will enable implementations to adopt a streaming mode of operation without placing excessive demands on the optimization capabilities of the processor; and it provides new constructs to indicate that streaming is required, or to express transformations in a way that makes it easier for the processor to adopt a streaming execution plan.

Capabilities provided in this category include:

  • A new xsl:source-document instruction, which reads and processes a source document, optionally in streaming mode;
  • The ability to declare that a mode is a streaming mode, in which case all the template rules using that mode must be streamable;
  • A new xsl:iterate instruction, which iterates over the items in a sequence, allowing parameters for the processing of one item to be set during the processing of the previous item;
  • A new xsl:merge instruction, allowing multiple input streams to be merged into a single output stream;
  • A new xsl:fork instruction, allowing multiple computations to be performed in parallel during a single pass through an input document.
  • Accumulators, which allow a value to be computed progressively during streamed processing of a document, and accessed as a function of a node in the document, without compromise to the functional nature of the XSLT language.
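To make the streaming features more concrete, here is a minimal sketch of a streamed transformation. It is illustrative only and untested; the file name and element names (big-log.xml, log, entry, @bytes) are invented for the example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Declare a streaming mode: every template rule in it must be streamable. -->
  <xsl:mode name="stream" streamable="yes"/>

  <!-- xsl:initial-template is the default entry point in XSLT 3.0. -->
  <xsl:template name="xsl:initial-template">
    <!-- xsl:source-document reads the input document in streaming mode. -->
    <xsl:source-document href="big-log.xml" streamable="yes">
      <summary>
        <xsl:apply-templates select="log/entry" mode="stream"/>
      </summary>
    </xsl:source-document>
  </xsl:template>

  <!-- Attributes are available when the start tag is read, so this rule is streamable. -->
  <xsl:template match="entry" mode="stream">
    <line bytes="{@bytes}"/>
  </xsl:template>

</xsl:stylesheet>

The point of the restrictions mentioned above is that a processor can run a stylesheet like this one in a single forward pass, never holding more than a small window of big-log.xml in memory.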

A second focus for enhancements in XSLT 3.0 is the introduction of a new mechanism for stylesheet modularity, called the package. Unlike the stylesheet modules of XSLT 1.0 and 2.0 (which remain available), a package defines an interface that regulates which functions, variables, templates and other components are visible outside the package, and which can be overridden. There are two main goals for this facility: it is designed to deliver software engineering benefits by improving the reusability and maintainability of code, and it is intended to streamline stylesheet deployment by allowing packages to be compiled independently of each other, and compiled instances of packages to be shared between multiple applications.

Other significant features in XSLT 3.0 include:

  • An xsl:evaluate instruction allowing evaluation of XPath expressions that are dynamically constructed as strings, or that are read from a source document;
  • Enhancements to the syntax of patterns, in particular enabling the matching of atomic values as well as nodes;
  • An xsl:try instruction to allow recovery from dynamic errors;
  • The element xsl:global-context-item, used to declare the stylesheet’s expectations of the global context item (notably, its type).
  • A new instruction xsl:assert to assist developers in producing correct and robust code.
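As a rough illustration of how xsl:evaluate and xsl:try work together, consider the untested fragment below. The element names (query, query-error) are invented, and the err prefix is assumed to be bound to the standard error namespace http://www.w3.org/2005/xqt-errors:

<!-- Evaluate an XPath expression supplied as text in the source document,
     recovering gracefully if it raises a dynamic error. -->
<xsl:template match="query">
  <xsl:try>
    <!-- The xpath attribute is itself an XPath expression; here it yields
         the string content of the query element, compiled at run time. -->
    <xsl:evaluate xpath="string(.)" context-item="/"/>
    <xsl:catch>
      <!-- Inside xsl:catch the error code and description are available
           as variables in the xqt-errors namespace. -->
      <query-error code="{$err:code}">
        <xsl:value-of select="$err:description"/>
      </query-error>
    </xsl:catch>
  </xsl:try>
</xsl:template>

Without xsl:try, a bad expression in the source document would terminate the whole transformation; with it, the error is reported in the output and processing continues.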

XSLT 3.0 also delivers enhancements made to the XPath language and to the standard function library, including the following:

  • Variables can now be bound in XPath using the let expression.
  • Functions are now first class values, and can be passed as arguments to other (higher-order) functions, making XSLT a fully-fledged functional programming language.
  • A number of new functions are available, for example trigonometric functions, and the functions parse-xml and serialize to convert between lexical and tree representations of XML.

XSLT 3.0 also includes support for maps (a data structure consisting of key/value pairs, sometimes referred to in other programming languages as dictionaries, hashes, or associative arrays). This feature extends the data model, provides new syntax in XPath, and adds a number of new functions and operators. Initially developed as XSLT-specific extensions, maps have now been integrated into XPath 3.1 (see [XPath 3.1]). XSLT 3.0 does not require implementations to support XPath 3.1 in its entirety, but it does require support for these specific features.
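A few one-liners, again untested and purely illustrative, give the flavor of the XPath 3.1 material as seen from a stylesheet:

<!-- A map literal: key/value pairs, looked up by calling the map as a function. -->
<xsl:variable name="prices" select="map { 'apple': 3, 'pear': 5 }"/>

<!-- 'let' binds a variable inside an XPath expression; this outputs 10. -->
<xsl:value-of select="let $p := $prices('pear') return $p * 2"/>

<!-- Functions are first-class values: pass an inline function to fold-left; this also outputs 10. -->
<xsl:value-of select="fold-left(1 to 4, 0, function($a, $n) { $a + $n })"/>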

This will remain a proposed recommendation until 1 June 2017.

How close can you read? 😉

Enjoy!

February 13, 2017

Do You Feel Chilled? W3C and DRM

Filed under: DRM,Intellectual Property (IP),W3C — Patrick Durusau @ 8:56 pm

Indefensible: the W3C says companies should get to decide when and how security researchers reveal defects in browsers by Cory Doctorow.

From the post:

The World Wide Web Consortium has just signaled its intention to deliberately create legal jeopardy for security researchers who reveal defects in its members’ products, unless the security researchers get the approval of its members prior to revealing the embarrassing mistakes those members have made in creating their products. It’s a move that will put literally billions of people at risk as researchers are chilled from investigating and publishing on browsers that follow W3C standards.

It is indefensible.

I enjoy Cory’s postings and fiction, but I had to read this one more than once to capture the nature of Cory’s complaint.

As I understand it, the argument runs something like this:

1. The W3C is creating a “…standardized DRM system for video on the World Wide Web….”

2. Participants in the W3C process must “…surrender the right to invoke their patents in lawsuits as a condition of participating in the W3C process….” (The keyword here is participants. No non-participant waives their patent rights as a result of W3C policy.)

3. The W3C isn’t requiring waiver of DMCA 1201 rights as a condition for participating in the video DRM work.

All true, but I don’t see how Cory gets to the conclusion:

…deliberately create legal jeopardy for security researchers who reveal defects in its members’ products, unless the security researchers get the approval of its members prior to revealing the embarrassing mistakes those members have made in creating their products.

Whether the W3C requires participants in the DRM system for video to waive DMCA 1201 rights or not, the W3C process has no impact on non-participants in that process.

Secondly, security researchers are in jeopardy if and only if they incriminate themselves when publishing defects in DRM products. As security researchers, they are capable of anonymously publishing any security defects they find.

Third, legal liability flows from statutory law and not the presence or absence of consensual agreement among a group of vendors. Private agreements can only protect you from those agreeing.

I don’t support DRM and never have. Personally I think it is a scam and a tax on content creators. It’s unfortunate that the fear that someone, somewhere, might not be paying full rate is enough for content creators to tax themselves with DRM schemes and software, none of which are free.

Rather than arguing about W3C policy, why not point to the years of wasted effort and expense by content creators on DRM? With no measurable return. That’s a plain ROI question.

DRM software vendors know the pot of gold content creators are chasing is at the end of an ever receding rainbow. In fact, they’re counting on it.

December 17, 2015

XQuery, XPath, XSLT and XQuery Serialization 3.1, New Candidate Recommendations!

Filed under: W3C,XPath,XQuery,XSLT — Patrick Durusau @ 11:10 am

As I forecast 😉 earlier this week, new Candidate Recommendations for:

XQuery 3.1: An XML Query Language

XML Path Language (XPath) 3.1

XSLT and XQuery Serialization 3.1

have hit the streets for your review and comments!

Comments due by 2016-01-31.

That’s forty-five days, minus the ones spent with drugs/sex/rock-n-roll over the holidays and recovering from same.

Say something shy of forty-four actual working days (my endurance isn’t what it once was) for the review process.

What tools, techniques are you going to use to review this latest set of candidates?

BTW, some people review software and check only fixes; for standards, I start at the beginning, go to the end, then stop. (Or the reverse for backward proofing.)

My estimates on days spent with drugs/sex/rock-n-roll are approximate only and your experience may vary.

December 14, 2015

XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts (soon!)

Filed under: W3C,XPath,XQuery,XSLT — Patrick Durusau @ 4:04 pm

XQuery, XPath, XSLT and XQuery Serialization 3.1 (Back-to-Front) Drafts will be published quite soon so I wanted to give you a heads up on your holiday reading schedule.

This is deep enough in the review cycle that a back-to-front reading is probably your best approach.

You have read the drafts and corrections often enough by this point that you read the first few words of a paragraph and you “know” what it says so you move on. (At the very least I can report that happens to me.)

By back-to-front reading I mean to start at the end of each draft and read the last sentence and then the next to last sentence and so on.

The back-to-front process does two things:

  1. You are forced to read each sentence on its own.
  2. It prevents skimming and filling in errors with silent corrections (unknown to your conscious mind).

The back-to-front method is quite time-consuming, so it’s fortunate these drafts are due to appear just before a series of holidays in a large number of places.

I hesitate to mention it but there is another way to proof these drafts.

If you have XML-experienced visitors, you could take turns reading the drafts to each other. It was a technique used by copyists many years ago, where one person read and two others took down the text. The two versions were then compared to each other and the original.

Even with a great reading voice, I’m not certain many people would be up to that sort of exercise.

PS: I will post on the new drafts as soon as they are published.

June 30, 2015

Dataset Usage Vocabulary

Filed under: W3C — Patrick Durusau @ 3:16 pm

Dataset Usage Vocabulary W3C First Public Working Draft 25 June 2015.

Abstract:

Datasets published on the Web are accessed and experienced by consumers in a variety of ways, but little information about these experiences is typically conveyed. Dataset publishers many times lack feedback from consumers about how datasets are used. Consumers lack an effective way to discuss experiences with fellow collaborators and explore referencing material citing the dataset. Datasets as defined by DCAT are a collection of data, published or curated by a single agent, and available for access or download in one or more formats. The Dataset Usage Vocabulary (DUV) is used to describe consumer experiences, citations, and feedback about the dataset from the human perspective.

By specifying a number of foundational concepts used to collect dataset consumer feedback, experiences, and cite references associated with a dataset, APIs can be written to support collaboration across the Web by structurally publishing consumer opinions and experiences, and provide a means for data consumers and producers advertise and search for published open dataset usage.

From Status of This Document:

This is a draft document which may be merged with the Data Quality Vocabulary or remain as a standalone document. Feedback is sought on the overall direction being taken as much as the specific details of the proposed vocabulary.

Comments to: public-dwbp-comments@w3.org (to subscribe; comment archives).

This could be useful. Especially if specialized vocabularies are developed from experience in particular data domains.

June 3, 2015

W3C Validation Tools – New Location

Filed under: CSS3,HTML,HTML5,W3C,WWW — Patrick Durusau @ 10:52 am

W3C Validation Tools

The W3C graciously hosts the following free validation tools:

CSS Validator – Checks your Cascading Style Sheets (CSS)

Internationalization Checker – Checks level of internationalization-friendliness.

Link Checker – Checks your web pages for broken links.

Markup Validator – Checks the markup of your Web documents (HTML or XHTML).

RSS Feed Validator – Validator for syndicated feeds. (RSS and Atom feeds)

RDF Validator – Checks and visualizes RDF documents.

Unicorn – Unified validator. HTML, CSS, Links & Mobile.

Validator.nu – Checks HTML5.

I mention that these tools are free to emphasize there is no barrier to their use.

Just as you wouldn’t submit a research paper with pizza grease stains on it, use these tools to proof draft standards before you submit them for review.

April 16, 2015

New CSV on the Web Drafts (CSV From RDF?)

Filed under: CSV,W3C — Patrick Durusau @ 3:49 pm

Four Drafts Published by the CSV on the Web Working Group

From the post:

The CSV on the Web Working Group has published a new set of Working Drafts, which the group considers feature complete and implementable.

The group is keen to receive comments on these specifications, either as issues on the Group’s GitHub repository or by posting to public-csv-wg-comments@w3.org.

The CSV on the Web Working Group would also like to invite people to start implementing these specifications and to donate their test cases into the group’s test suite. Building this test suite, as well as responding to comments, will be the group’s focus over the next couple of months.

Learn more about the CSV on the Web Working Group.

If nothing else, Model for Tabular Data and Metadata on the Web represents a start on documenting a largely undocumented data format. Perhaps the most common undocumented data format of all. I say that, although there may be a dissertation or even a published book that has collected all the CSV variants at a point in time. Sing out if you know of such a volume.

I will be interested to see if the group issues a work product entitled Generating CSV from RDF on the Web. Then our return to data opacity will be complete.

Not that I have any objections to CSV: compact, easy to parse (well, perhaps not correctly, but I didn’t say that), widespread. But it isn’t the best format for blind interchange, which by definition includes legacy data. Odd that in a time of practically limitless storage and orders-of-magnitude faster processing, data appears to be seeking less documented semantics.

Or is it that the users and producers of data prefer data with less documented semantics? I knew an office manager once upon a time who produced reports from unshared cheatsheets using R Writer, and who was loath to share field information with anyone. Seemed like a very sad way to remain employed.

Documenting data semantics isn’t going to obviate the need for the staffs who previously concealed data semantics. Just as data is dynamic, so are its semantics, which will require the same staffs to continually update and document the changing semantics of data. Same staffs for the most part, just more creative duties.

March 19, 2015

UI Events (Formerly DOM Level 3 Events) Draft Published

Filed under: Interface Research/Design,UX,W3C,Web Applications,Web Browser,WWW — Patrick Durusau @ 6:37 pm

UI Events (Formerly DOM Level 3 Events) Draft Published

From the post:

The Web Applications Working Group has published a Working Draft of UI Events (formerly DOM Level 3 Events). This specification defines UI Events which extend the DOM Event objects defined in DOM4. UI Events are those typically implemented by visual user agents for handling user interaction such as mouse and keyboard input. Learn more about the Rich Web Client Activity.

If you are planning on building rich web clients, now would be the time to start monitoring W3C drafts in this area, to make sure your use cases are met.

People have different expectations with regard to features and standards quality. Make sure your expectations are heard.

March 13, 2015

W3C Invites Implementations of XQuery and XPath Full Text 3.0;…

Filed under: W3C,XPath,XQuery — Patrick Durusau @ 4:26 pm

W3C Invites Implementations of XQuery and XPath Full Text 3.0; Supporting Requirements and Use Cases Draft Updated

From the post:

The XML Query Working Group and the XSLT Working Group invite implementation of the Candidate Recommendation of XQuery and XPath Full Text 3.0. The Full Text specification extends the XPath and XQuery languages to support fast and efficient full text searches over arbitrarily large collections of documents. This release brings the Full Text specification up to date with XQuery 3.0 and XPath 3.0; the language itself is unchanged.

Both groups also published an updated Working Draft of XQuery and XPath Full Text 3.0 Requirements and Use Cases. This document specifies requirements and use cases for Full-Text Search for use in XQuery 3.0 and XPath 3.0. The goal of XQuery and XPath Full Text 3.0 is to extend XQuery and XPath Full Text 1.0 with additional functionality in response to requests from users and implementors.

If you have comments that arise out of implementation experience, be advised that XQuery and XPath Full Text 3.0 will be a Candidate Recommendation until at least 26 March 2015.

Enjoy!

January 9, 2015

Structural Issues in XPath/XQuery/XPath-XQuery F&O Drafts

Filed under: Standards,W3C,XML,XPath,XQuery — Patrick Durusau @ 1:02 pm

Apologies as I thought I was going to be further along in demonstrating some proofing techniques for XPath 3.1, XQuery 3.1, and XPath and XQuery Functions and Operators 3.1 by today.

Instead, I encountered structural issues that are common to all three drafts that I didn’t anticipate but that need to be noted before going further with proofing. I will be using sample material to illustrate the problems and will not always have a sample from all three drafts or even note every occurrence of the issues. They are too numerous for that treatment and it would be repetition for repetition’s sake.

First, consider these passages from XPath 3.1, 1 Introduction:

[Definition: XPath 3.1 operates on the abstract, logical structure of an XML document, rather than its surface syntax. This logical structure, known as the data model, is defined in [XQuery and XPath Data Model (XDM) 3.1].]

[Definition: An XPath 3.0 Processor processes a query according to the XPath 3.0 specification.] [Definition: An XPath 2.0 Processor processes a query according to the XPath 2.0 specification.] [Definition: An XPath 1.0 Processor processes a query according to the XPath 1.0 specification.]

1. Unnumbered Definitions – Unidentified Cross-References

The first structural issue that you will note with the “[Definition…” material is that all such definitions are unnumbered and appear throughout all three texts. The lack of numbering means that it is difficult to refer with any precision to a particular definition. How would I draw your attention to the third definition of the second grouping? Searching for “XPath 1.0” turns up 79 occurrences in XPath 3.1, so that doesn’t sound satisfactory. (FYI, “Definition” turns up 193 instances.)

While the “Definitions” have anchors that allow them to be addressed by cross-references, you should note that the cross-references are text hyperlinks that have no identifier by which a reader can find the definition without using the hyperlink. That is to say when I see:

A lexical QName with a prefix can be converted into an expanded QName by resolving its namespace prefix to a namespace URI, using the statically known namespaces. [These are fake links to draw your attention to the text in question.]

The hyperlinks in the original will take me to various parts of the document where these definitions occur, but if I have printed the document, I have no clue where to look for these definitions.

The better practice is to number all the definitions and, since they are all self-contained, to put them in a single location. Additionally, all interlinear references to those definitions (or other internal cross-references) should have a visible reference that enables a reader to find the definition or cross-reference without use of an internal hyperlink.

Example:

A lexical QName Def-21 with a prefix can be converted into an expanded QName Def-19 by resolving its namespace prefix to a namespace URI, using the statically known namespaces. Def-99 [These are fake links to draw your attention to the text in question. The Def numbers are fictitious in this example. Actual references would have the visible definition numbers assigned to the appropriate definition.]

2. Vague references – $N versus 5000 x $N

Another problem I encountered was what I call “vague references,” or less generously, $N versus 5,000 x $N.

For example:

[Definition: An atomic value is a value in the value space of an atomic type, as defined in [XML Schema 1.0] or [XML Schema 1.1].] [Definition: A node is an instance of one of the node kinds defined in [XQuery and XPath Data Model (XDM) 3.1].

Contrary to popular opinion, standards don’t write themselves and every jot and tittle was placed in a draft at the expense of someone’s time and resources. Let’s call that $N.

In the example, you and I both know that somewhere in XML Schema 1.0 and XML Schema 1.1 the “value space of an atomic type” is defined. The same is true for nodes and XQuery and XPath Data Model (XDM) 3.1. But where? The authors of these specifications could insert that information at a cost of $N.

What is the cost of not inserting that information in the current drafts? I estimate the number of people interested in reading these drafts to be 5,000. So each of those people will have to find the same information omitted from these specifications, which is a cost of 5,000 x $N. In terms of convenience to readers and reducing their costs of reading these specifications, references to exact locations in other materials are a necessity.

In full disclosure, I have no more or less reason to think 5,000 people are interested in these drafts than the United States has for positing the existence of approximately 5,000 terrorists in the world. I suspect the number of people interested in XML is actually higher but the number works to make the point. Editors can either convenience themselves or their readers.

Vague references are also problematic in terms of users finding the correct reference. The citation above, [XML Schema 1.0] for “value space of an atomic type,” refers to all three parts of XML Schema 1.0.

Part 1, at 3.14.1 (non-normative) The Simple Type Definition Schema Component, has the only reference to “atomic type.”

Part 2 actually has “0” hits for “atomic type.” True enough, “2.5.1.1 Atomic datatypes” is likely the intended reference, but that isn’t what the specification says to look for.

Bottom line is that any external reference needs to include in the inline citation the precise internal reference in the work being cited. If you want to inconvenience readers by pointing to internal bibliographies rather than online HTML documents, where available, that’s an editorial choice. But in any event, for every external reference, give the internal reference in the work being cited.

Your readers will appreciate it and it could make your work more accurate as well.

3. Normative vs. Non-Normative Text

Another structural issue which is important for proofing is the distinction between normative and non-normative text.

In XPath 3.1, still in the Introduction we read:

This document normatively defines the static and dynamic semantics of XPath 3.1. In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative.

OK, and under 2.2.3.1 Static Analysis Phase (XPath 3.1), we find:

Examples of inferred static types might be:

Which is followed by a list so at least we know where the examples end.

However, there are numerous cases of:

For example, with the expression substring($a, $b, $c), $a must be of type xs:string (or something that can be converted to xs:string by the function calling rules), while $b and $c must be of type xs:double. [also in 2.2.3.1 Static Analysis Phase (XPath 3.1)]

So, is that a non-normative example? If so, what is the nature of the “must” that occurs in it? Is that normative?

Moreover, the examples (XPath 3.1 has 283 occurrences of that term, XQuery 455, and XPath and XQuery Functions and Operators 537) are unnumbered, which makes referencing the examples by other materials very imprecise and wordy. For the use of authors creating secondary literature on these materials, to promote adoption, etc., numbering all examples should be the default case.

Oh, before anyone protests that XPath and XQuery Functions and Operators has separated its examples into lists, that is true but only partially. There remain 199 occurrences of “for example” which do not occur in lists. Where lists are used, converting to numbered examples should be trivial. The elimination of “for example” material may be more difficult. Hard to say without a good sampling of the cases.

Conclusion:

As I said at the outset, apologies for not reaching more substantive proofing techniques but structural issues are important for the readability and usability of specifications for readers. Being correct and unreadable isn’t a useful goal.

It may seem like some of the changes I suggest are a big “ask” this late in the processing of these specifications. If this were a hand-edited document, I would quickly agree with you. But it’s not. Or at least it shouldn’t be. I don’t know where the source is held, but the HTML you read is a generated artifact.

Gathering and numbering the definitions and inserting those numbers into the internal cross-references are a matter of applying a different style sheet to the source. Fixing the vague references and unnumbered example texts would take more editorial work but readers would greatly benefit from precise references and a clear separation of normative from non-normative text.

I will try again over the weekend to reach aids for substantive proofing on these drafts. With luck, I will return to these drafts on Monday of next week (12 January 2015).

January 5, 2015

Redefining RFC 2119? Danger! Danger! Will Robinson!

Filed under: Standards,W3C,XML,XPath,XQuery — Patrick Durusau @ 3:43 pm

I’m lagging behind in reading XQuery 3.1: An XML Query Language, XML Path Language (XPath) 3.1, and XPath and XQuery Functions and Operators 3.1 in order to comment by 13 February 2015.

In order to catch up this past weekend, I started trying to tease these candidate recommendations apart to make them easier to proof. One of the things I always do is check for key word conformance language, and that means, outside of ISO, RFC 2119.
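My keyword checks are nothing fancy. A rough equivalent in XSLT 3.0, offered only as an illustration (the file name stands in for a local copy of whichever draft you are checking, and counting multi-word phrases like “must not” would take a little more work), looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="xsl:initial-template">
    <!-- Flatten the draft to a lower-case word list. -->
    <xsl:variable name="words"
        select="tokenize(lower-case(string(doc('xpath-functions-31.xhtml'))), '\W+')"/>
    <!-- Tally the single-word RFC 2119 key words. -->
    <xsl:for-each select="'must', 'shall', 'should', 'may', 'required', 'recommended', 'optional'">
      <xsl:message>
        <xsl:value-of select=". || ': ' || count($words[. = current()])"/>
      </xsl:message>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>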

I was reading XPath and XQuery Functions and Operators 3.1 (herein Functions and Operators) when I saw:

1.1 Conformance

The Functions and Operators specification is intended primarily as a component that can be used by other specifications. Therefore, Functions and Operators relies on specifications that use it (such as [XML Path Language (XPath) 3.1], [XQuery 3.1: An XML Query Language], and potentially future versions of XSLT) to specify conformance criteria for their respective environments.

That works. You have a normative document of definitions, etc., and some other standard cites those definitions and supplies the must, should, may according to RFC 2119. Not common, but that works.

But then I started running scripts for usage of key words and I found in Functions and Operators:

1.6.3 Conformance terminology

[Definition] may

Conforming documents and processors are permitted to, but need not, behave as described.

[Definition] must

Conforming documents and processors are required to behave as described; otherwise, they are either non-conformant or else in error.

Thus the title: Redefining RFC 2119? Danger! Danger! Will Robinson!

RFC 2119 reads in part:

1. MUST This word, or the terms “REQUIRED” or “SHALL”, mean that the definition is an absolute requirement of the specification.

5. MAY This word, or the adjective “OPTIONAL”, mean that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)

6. Guidance in the use of these Imperatives

Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmisssions) For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability.

First, the referencing of RFC 2119 is standard practice at the W3C, at least with regard to XML specifications. I wanted to have more than personal experience to cite, so I collected the fifty-nine current XML specifications and summarized them in the list at the end of this post.

Of the fifty-nine (59) current XML specifications (there may be others; the W3C has abandoned simply listing its work without extraneous groupings), fifty-two of the standards cite and follow RFC 2119. Three of the remaining seven (7) fail to cite RFC 2119 due to errors in editing.

The final four (4), as it were, that don’t cite RFC 2119 are a good illustration of how errors get perpetuated from one standard to another.

The first W3C XML specification to not cite RFC 2119 was: Extensible Markup Language (XML) 1.0 (Second Edition) where it reads in part:

1.2 Terminology

may

[Definition: Conforming documents and XML processors are permitted to but need not behave as described.]

must

[Definition: Conforming documents and XML processors are required to behave as described; otherwise they are in error. ]

The definitions of must and may were ABANDONED in Extensible Markup Language (XML) 1.0 (Third Edition), which simply dropped those definitions and instead reads in part:

1.2 Terminology

The terminology used to describe XML documents is defined in the body of this specification. The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional, when emphasized, are to be interpreted as described in [IETF RFC 2119].

The exclusive use of RFC 2119 continues through Extensible Markup Language (XML) 1.0 (Fourth Edition) to the current Extensible Markup Language (XML) 1.0 (Fifth Edition).

However, as is often said, whatever good editing we do is interred with us and any errors we make live on.

Before XML 1.0 (Third Edition) abandoned the attempt to define may and must, XML Schema Part 1: Structures Second Edition and XML Schema Part 2: Datatypes Second Edition had cited XML 1.0 (Second Edition) as their rationale for defining may and must. That error has never been corrected.

Which brings us to W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes, which is the last W3C XML specification to not cite RFC 2119.

XSD 1.1 Part 2 reads in part, under Appendix I Changes since version 1.0, I.4 Other Changes:

The definitions of must, must not, and ·error· have been changed to specify that processors must detect and report errors in schemas and schema documents (although the quality and level of detail in the error report is not constrained).

The problem being that W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes relies upon XML Schema Part 2: Datatypes Second Edition, which cites Extensible Markup Language (XML) 1.0 (Second Edition) as the reason for redefining the terms may and must.

The redefining of may and must relies upon language in a superseded version of the XML standard. Language that was deleted ten (10) years ago from the XML standard.

If you have read this far, you have a pretty good guess that I am going to suggest that XPath and XQuery Functions and Operators 3.1 drop the attempt to redefine terms that appear in RFC 2119.

First, redefining widely used terms for conformance is clearly a bad idea. Do you mean an RFC 2119 must or do you mean an F&O must? Clearly different. If a requirement has an RFC 2119 must, my application either conforms or fails. If a requirement has an F&O must, my application may simply be in error. All the time. Is that useful?

Second, by redefining must, we lose the interoperability aspects as defined by RFC 2119 for all uses of must. Surely interoperability is a goal of Functions and Operators. Yes?

Third, the history of redefining may and must at the W3C shows (to me) the perpetuation of an error long beyond its correction date. It’s time to put an end to redefining may and must.

PS: Before you decide you “know” the difference in upper and lower case key words from RFC 2119, take a look at: RFC Editorial Guidelines and Procedures, Normative References to RFC 2119. Summary, UPPER CASE is normative, lower case is “a necessary logical relationship.”

PPS: Tracking this error down took longer than expected so it will be later this week before I have anything that may help with proofing the specifications.


XML Standards Consulted in preparation of this post. Y = Cites RFC 2119, N = Does not cite RFC 2119.

December 19, 2014

Terms Defined in the W3C HTML5 Recommendation

Filed under: HTML5,W3C — Patrick Durusau @ 6:27 pm

Terms Defined in the W3C HTML5 Recommendation.

I won’t say this document has much of a plot or that it is an easy read. 😉

If you are using HTML5, however, this should either be a bookmark or open in your browser.

Enjoy!

I first saw this in a tweet by AdobeWebCC.

November 20, 2014

Indexed Database API Proposed Recommendation Published (Review Opportunity)

Filed under: Database,W3C — Patrick Durusau @ 2:02 pm

Indexed Database API Proposed Recommendation Published

From the post:

The Web Applications Working Group has published a Proposed Recommendation of Indexed Database API. This document defines APIs for a database of records holding simple values and hierarchical objects. Each record consists of a key and some value. Moreover, the database maintains indexes over records it stores. An application developer directly uses an API to locate records either by their key or by using an index. A query language can be layered on this API. An indexed database can be implemented using a persistent B-tree data structure. Comments are welcome through 18 December. Learn more about the Rich Web Client Activity.

If you have the time between now and 18 December, this is a great opportunity to “get your feet wet” reviewing W3C recommendations.

The Indexed Database API document isn’t long (43 pages approximately) and you are no doubt already familiar with databases in general.

An example to get you started:

3.2 APIs

The API methods return without blocking the calling thread. All asynchronous operations immediately return an IDBRequest instance. This object does not initially contain any information about the result of the operation. Once information becomes available, an event is fired on the request and the information becomes available through the properties of the IDBRequest instance.

  1. When you read:

    The API methods return without blocking the calling thread.

    How do you answer the question: What is being returned by an API method?

  2. What is your answer after reading the next sentence?

    All asynchronous operations immediately return an IDBRequest instance.

  3. Does this work?

    API methods return an IDBRequest instance without blocking the calling thread.

    (Also correcting the unnecessary definite article “The.”)

  4. One more issue with the first sentence is:

    …without blocking the calling thread.

    If you search the document, there is no other mention of calling threads.

    I suspect this is unnecessary and would ask for its removal.

    So, revised the first sentence would read:

    API methods return an IDBRequest instance.

  5. Maybe, except that the second sentence says “All asynchronous operations….”

    When you see a statement of “All …. operations…,” you should look for a definition of those operations.

    I have looked and while “asynchronous” is used thirty-four (34) times, “asynchronous operations” is used only once.

    (more comments on “asynchronous” below)

  6. I am guessing you caught the “immediately” problem on your own. Undefined and what other response would there be?

    If we don’t need “asynchronous” and the first sentence is incomplete, is this a good suggestion for 3.2 APIs?

    (Proposed Edit)

    3.2 APIs

    API methods return an IDBRequest instance. This object does not initially contain any information about the result of the operation. Once information becomes available, an event is fired on the request and the information becomes available through the properties of the IDBRequest instance.

    There, your first edit to a W3C Recommendation removed twelve (12) words and made the text clearer.

    Plus there are the other thirty-three (33) instances of “asynchronous” to investigate.

    Not bad for your first effort!


    After looking around the rest of the proposed recommendation, I suspect that “asynchronous” is used to mean that results and requests can arrive and be processed in any order (except for some cases of overlapping scope of operations). It’s simpler just to say that once and not go about talking about “asynchronous requests,” “opening databases asynchronously,” etc.

November 11, 2014

More Public Input @ W3C

Filed under: Standards,W3C — Patrick Durusau @ 10:15 am

In an effort to get more public input on W3C drafts, a new mailing list has been created:

public-review-announce@w3.org list.

One outcome of this list could be little or no increase in public input on W3C drafts. In which case the forces that favor a closed club at the W3C will be saying “I told you so,” privately of course.

Another outcome of this list could be an increase in public input on W3C drafts, from a broader range of stakeholders than has been the case in the past. In which case the W3C drafts will benefit from the greater input and the case can be made for a greater public voice at the W3C.

But the fate of a greater public voice at the W3C rests with you and others like you. If you don’t speak up when you have the opportunity, people will assume you don’t want to speak at all. Perhaps wrongly, but that is the way it works.

My recommendation is that you subscribe to this new list and as appropriate, spread the news of W3C drafts of interest to stakeholders in your community. More than that, you should actively encourage people to review and submit comments. And review and submit comments yourself.

The voice at risk is yours.

Your call.

subscribe to public-review-announce

October 2, 2014

Consolidated results of the public Webizen survey

Filed under: W3C — Patrick Durusau @ 7:12 pm

Consolidated results of the public Webizen survey by Coralie Mercier.

You may recall my post: A Greater Voice for Individuals in W3C – Tell Us What You Would Value [Deadline: 30 Sept 2014]. The W3C had a survey on what would attract individuals to the W3C.

Out of 7,264,774,500 (approximately as of 19:53 EST) potential voters, Coralie reports that 205 answers were received.

Even allowing for those too young, too infirm, in prison, stranded in reality shows, that is a very poor showing.

Some of the answers, at least to me, appear to be spot on. But given the response rate, it will be hard to reach even irrational conclusions about individuals at the W3C. A Ouija board would have required less technology and been about as accurate.

If you don’t want a voice at the W3C for individuals, not filing a survey response was a step in that direction. Congratulations.

PS: I don’t mean to imply that the W3C would listen to N number of survey responses saying X. Maybe yes and maybe no. But non-participation saves them from having to make that choice.

September 17, 2014

Call for Review: HTML5 Proposed Recommendation Published

Filed under: HTML5,W3C — Patrick Durusau @ 10:13 am

Call for Review: HTML5 Proposed Recommendation Published

From the post:

The HTML Working Group has published a Proposed Recommendation of HTML5. This specification defines the 5th major revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability. Comments are welcome through 14 October. Learn more about the HTML Activity.

Now would be the time to submit comments, corrections, etc.

Deadline: 14 October 2014.

August 20, 2014

Web Annotation Working Group (Preventing Semantic Rot)

Filed under: Annotation,W3C — Patrick Durusau @ 4:12 pm

Web Annotation Working Group

From the post:

The W3C Web Annotation Working Group is chartered to develop a set of specifications for an interoperable, sharable, distributed Web annotation architecture. The chartered specs consist of:

  1. Abstract Annotation Data Model
  2. Data Model Vocabulary
  3. Data Model Serializations
  4. HTTP API
  5. Client-side API

The working group intends to use the Open Annotation Data Model and Open Annotation Extension specifications, from the W3C Open Annotation Community Group, as a starting point for development of the data model specification.

The Robust Link Anchoring specification will be jointly developed with the WebApps WG, where many client-side experts and browser implementers participate.

Some good news for the middle of a week!

Shortcomings to watch for:

Can annotations be annotated?

Can non-Web addressing schemes be used by annotators?

Can the structure of files (visible or not) in addition to content be annotated?

If we don’t have all three of those capabilities, then the semantics of annotations will rot, just as semantics of earlier times have rotted away. The main distinction is that most of our ancestors didn’t choose to allow the rot to happen.

I first saw this in a tweet by Rob Sanderson.

August 17, 2014

Value-Loss Conduits?

Filed under: W3C,Web Browser — Patrick Durusau @ 3:52 pm

Do you remove links from materials that you quote?

I ask because of the following example:

The research, led by Alexei Efros, associate professor of electrical engineering and computer sciences, will be presented today (Thursday, Aug. 14) at the International Conference and Exhibition on Computer Graphics and Interactive Techniques, or SIGGRAPH, in Vancouver, Canada.

“Visual data is among the biggest of Big Data,” said Efros, who is also a member of the UC Berkeley Visual Computing Lab. “We have this enormous collection of images on the Web, but much of it remains unseen by humans because it is so vast. People have called it the dark matter of the Internet. We wanted to figure out a way to quickly visualize this data by systematically ‘averaging’ the images.”

Which is a quote from: New tool makes a single picture worth a thousand – and more – images by Sarah Yang.

Those passages were reprinted by Science Daily reading:

The research, led by Alexei Efros, associate professor of electrical engineering and computer sciences, was presented Aug. 14 at the International Conference and Exhibition on Computer Graphics and Interactive Techniques, or SIGGRAPH, in Vancouver, Canada.

“Visual data is among the biggest of Big Data,” said Efros, who is also a member of the UC Berkeley Visual Computing Lab. “We have this enormous collection of images on the Web, but much of it remains unseen by humans because it is so vast. People have called it the dark matter of the Internet. We wanted to figure out a way to quickly visualize this data by systematically ‘averaging’ the images.”

Why leave out the hyperlinks for SIGGRAPH and the Visual Computing Laboratory?

Or for that matter, the link to the original paper: AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections (ACM Transactions on Graphics, SIGGRAPH paper, August 2014) which appeared in the news release.

All three hyperlinks enhance your ability to navigate to more information. Isn’t navigation to more information a prime function of the WWW?

If so, we need to clue in ScienceDaily and other content repackagers to include the hyperlinks passed on to them, at least.

If you can’t be a value-add, at least don’t be a value-loss conduit.

July 23, 2014

Web Annotation Working Group Charter

Filed under: Annotation,OWL,RDF,W3C — Patrick Durusau @ 3:32 pm

Web Annotation Working Group Charter

From the webpage:

Annotating, which is the act of creating associations between distinct pieces of information, is a widespread activity online in many guises but currently lacks a structured approach. Web citizens make comments about online resources using either tools built into the hosting web site, external web services, or the functionality of an annotation client. Readers of ebooks make use of the tools provided by reading systems to add and share their thoughts or highlight portions of texts. Comments about photos on Flickr, videos on YouTube, audio tracks on SoundCloud, people’s posts on Facebook, or mentions of resources on Twitter could all be considered to be annotations associated with the resource being discussed.

The possibility of annotation is essential for many application areas. For example, it is standard practice for students to mark up their printed textbooks when familiarizing themselves with new materials; the ability to do the same with electronic materials (e.g., books, journal articles, or infographics) is crucial for the advancement of e-learning. Submissions of manuscripts for publication by trade publishers or scientific journals undergo review cycles involving authors and editors or peer reviewers; although the end result of this publishing process usually involves Web formats (HTML, XML, etc.), the lack of proper annotation facilities for the Web platform makes this process unnecessarily complex and time consuming. Communities developing specifications jointly, and published, eventually, on the Web, need to annotate the documents they produce to improve the efficiency of their communication.

There is a large number of closed and proprietary web-based “sticky note” and annotation systems offering annotation facilities on the Web or as part of ebook reading systems. A common complaint about these is that the user-created annotations cannot be shared, reused in another environment, archived, and so on, due to a proprietary nature of the environments where they were created. Security and privacy are also issues where annotation systems should meet user expectations.

Additionally, there are the related topics of comments and footnotes, which do not yet have standardized solutions, and which might benefit from some of the groundwork on annotations.

The goal of this Working Group is to provide an open approach for annotation, making it possible for browsers, reading systems, JavaScript libraries, and other tools, to develop an annotation ecosystem where users have access to their annotations from various environments, can share those annotations, can archive them, and use them how they wish.

Depending on how fine-grained you want your semantics, annotation is one way to convey them to others.

Unfortunately, looking at the starting point for this working group, “open” means RDF, OWL and other non-commercially adopted technologies from the W3C.

Defining the ability to point, using XQuery perhaps, and reserving to users the ability to create standards for annotation payloads would be a much more “open” approach. That is an approach you are unlikely to see from the W3C.

I would be more than happy to be proven wrong on that point.


May 25, 2014

Emotion Markup Language 1.0 (No Repeat of RDF Mistake)

Filed under: EmotionML,Subject Identity,Topic Maps,W3C — Patrick Durusau @ 3:19 pm

Emotion Markup Language (EmotionML) 1.0

Abstract:

As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a “plug-in” language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

I started reading EmotionML with the expectation that the W3C had repeated its “one way and one way only” identification mistake from RDF.

Much to my pleasant surprise I found:

1.2 The challenge of defining a generally usable Emotion Markup Language

Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. Even more basically, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused. Basically, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.

Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is the definition of possible structural elements and their valid child elements and attributes, but to allow users to “plug in” vocabularies that they consider appropriate for their work. A separate W3C Working Draft complements this specification to provide a central repository of [Vocabularies for EmotionML] which can serve as a starting point; where the vocabularies listed there seem inappropriate, users can create their custom vocabularies.

An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.

For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice.
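To see what “plug in” means in practice, here is a rough sketch. The namespace and element names follow my reading of the EmotionML 1.0 specification; the category-set URI is a made-up custom vocabulary, which is exactly the point:

```python
import xml.etree.ElementTree as ET

# A sketch of an EmotionML fragment that points at a *custom* category
# vocabulary rather than one fixed by the specification.
fragment = """\
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://example.org/my-call-center-vocab">
  <emotion>
    <category name="caller-frustration" value="0.8"/>
  </emotion>
</emotionml>"""

ns = {"em": "http://www.w3.org/2009/10/emotionml"}
root = ET.fromstring(fragment)
for cat in root.findall(".//em:category", ns):
    print(cat.get("name"), cat.get("value"))
```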

Everything said here about emotions is equally true for identification, emotions being just one of the infinite set of subjects that you might want to identify.

Had the W3C avoided the single identifier scheme of RDF (and its reliance on one subset of reasoning, logic), RDF could have had plug-in “identifier” modules, enabling the use of all extant and future identifiers, not to mention “reasoning” according to the designs of users.

It is good to see the W3C learning from its earlier mistakes and enabling users to express their world views, as opposed to a world view as prescribed by the W3C.

When users declare their emotion vocabularies, those vocabularies are themselves subjects that merit further identification, to avoid the problem of us not meaning the same thing by “owl:sameAs” as someone else means by “owl:sameAs.” (See When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web by Harry Halpin, Ivan Herman, and Patrick J. Hayes.)

Topic maps are a good solution for documenting subject identity and deciding when two or more identifications of subjects are the same subject.

I first saw this in a tweet by Inge Henriksen.

May 8, 2014

Piercing the Document Barrier

Filed under: DOM4,Navigation,W3C — Patrick Durusau @ 4:13 pm

If you aspire to return more detailed search results to your users than “see this bundle of documents,” you are talking about piercing the document barrier.

It’s a benefit to be able to search thousands of articles and get the top ten (10) or twenty (20) for some search, but assuming an average of twelve (12) pages per article, I’m still left with between one hundred and twenty (120) and two hundred and forty (240) pages of material to read. That beats the hell out of the original thousands or hundreds of thousands of pages, but it may not be enough.

What if I could search for the latest graph research and the search results opted out of the traditional re-explanation of graphs that wastes space at the start of nearly every graph article? After all, anyone intentionally seeking out a published graph article probably has a lock on that detail. And if they don’t, the paragraphs wasted on explanation aren’t going to save them.

I mention that because the W3C‘s HTML Working Group has invited implementation of W3C DOM4.

From W3C Invites Implementations of W3C DOM4:

DOM defines a platform-neutral model for events and node trees.

I expect to see graph-based implementations out in force, given the recent “discovery” by some people that graphs are “universal.” Having a single node is enough to have a graph under most definitions.
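If “node trees as graphs” sounds abstract, here is a minimal sketch: walk a parsed document and emit its parent-child edges, which is all most graph definitions require.

```python
import xml.dom.minidom

# Treat a DOM node tree as a graph: every node is a vertex, every
# parent-child relationship is an edge.
doc = xml.dom.minidom.parseString(
    "<graph><node id='a'/><node id='b'/></graph>")

def edges(node):
    for child in node.childNodes:
        yield (node.nodeName, child.nodeName)
        yield from edges(child)

print(list(edges(doc)))
# [('#document', 'graph'), ('graph', 'node'), ('graph', 'node')]
```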

For your amusement from Glossary of graph theory

An edgeless graph or empty graph or null graph is a graph with zero or more vertices, but no edges. The empty graph or null graph may also be the graph with no vertices and no edges. If it is a graph with no edges and any number n of vertices, it may be called the null graph on n vertices. (There is no consistency at all in the literature.)

I don’t find the lack of “consistency” in the literature surprising.

You?

April 10, 2014

The X’s Are In Town

Filed under: HyTime,W3C,XML,XPath,XQuery — Patrick Durusau @ 6:53 pm

XQuery 3.0, XPath 3.0, XQueryX 3.0, XDM 3.0, Serialization 3.0, Functions and Operators 3.0 are now W3C Recommendations

From the post:

The XML Query Working Group published XQuery 3.0: An XML Query Language, along with XQueryX, an XML representation for XQuery, both as W3C Recommendations, as well as the XQuery 3.0 Use Cases and Requirements as final Working Group Notes. XQuery extends the XPath language to provide efficient search and manipulation of information represented as trees from a variety of sources.

The XML Query Working Group and XSLT Working Group also jointly published W3C Recommendations of XML Path Language (XPath) 3.0, a widely-used language for searching and pointing into tree-based structures, together with XQuery and XPath Data Model 3.0 which defines those structures, XPath and XQuery Functions and Operators 3.0 which provides facilities for use in XPath, XQuery, XSLT and a number of other languages, and finally the XSLT and XQuery Serialization 3.0 specification giving a way to turn values and XDM instances into text, HTML or XML.

Read about the XML Activity.

I was wondering what I was going to have to read this coming weekend. 😉

It may just be me, but “…provide efficient search and manipulation of information represented as trees from a variety of sources…” sounds a lot like groves to me.

You?
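For anyone whose XPath is rusty, the “search and manipulation of information represented as trees” part looks like this in practice. A sketch using lxml, which implements XPath 1.0; 3.0 adds features such as functions and maps on top of the same tree model:

```python
from lxml import etree

# Searching a tree: find every chapter title containing "graph",
# case-insensitively (translate() is the XPath 1.0 idiom for this).
doc = etree.XML(b"""
<book>
  <chapter><title>Property Graphs</title></chapter>
  <chapter><title>Groves Revisited</title></chapter>
</book>""")

hits = doc.xpath(
    "//chapter/title[contains(translate(., 'GRAPH', 'graph'), 'graph')]")
print([t.text for t in hits])   # ['Property Graphs']
```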

March 27, 2014

CSV on the Web

Filed under: CSV,Data,W3C — Patrick Durusau @ 10:58 am

CSV on the Web Use Cases and Requirements, and Model for Tabular Data and Metadata Published

I swear, that really is the title.

Two recent drafts of interest:

The CSV on the Web: Use Cases and Requirements collects use cases that are at the basis of the work of the Working Group. A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files. The Working Group aim to specify technologies that provide greater interoperability for data dependent applications on the Web when working with tabular datasets comprising single or multiple files using CSV, or similar, format. This document lists a first set of use cases compiled by the Working Group that are considered representative of how tabular data is commonly used within data dependent applications. The use cases observe existing common practice undertaken when working with tabular data, often illustrating shortcomings or limitations of existing formats or technologies. This document also provides a first set of requirements derived from these use cases that have been used to guide the specification design.

The Model for Tabular Data and Metadata on the Web outlines a basic data model, or infoset, for tabular data and metadata about that tabular data. The document contains first drafts for various methods of locating metadata: one of the output the Working Group is chartered for is to produce a metadata vocabulary and standard method(s) to find such metadata. It also contains some non-normative information about a best practice syntax for tabular data, for mapping into that data model, to contribute to the standardisation of CSV syntax by IETF (as a possible update of RFC4180).
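A rough sketch of the idea, assuming a hypothetical observations.csv and a hand-rolled metadata description alongside it, since the drafts pair CSV syntax with a separate metadata/infoset layer (the dictionary below is illustrative, not the draft’s actual vocabulary):

```python
import csv

# Hypothetical sidecar metadata for observations.csv: column names,
# datatypes, and a human-readable title.
metadata = {
    "title": "Daily observations",
    "columns": [
        {"name": "date",  "datatype": "date"},
        {"name": "site",  "datatype": "string"},
        {"name": "value", "datatype": "number"},
    ],
}

with open("observations.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Apply the declared datatype to the raw CSV string.
        row["value"] = float(row["value"])
        print(row)
```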

I guess they mean to use CSV as it exists? What a radical concept. 😉

What next?

Could use an updated specification for the COBOL data format in which many government data sets are published (even now).

That last statement isn’t entirely in jest. There are a lot of COBOL-formatted files on government websites in particular.

January 23, 2014

Provenance Reconstruction Challenge 2014

Filed under: Provenance,Semantic Web,W3C — Patrick Durusau @ 12:06 pm

Provenance Reconstruction Challenge 2014

Schedule

  • February 17, 2014 Test Data released
  • May 18, 2014 Last day to register for participation
  • May 19, 2014 Challenge Data released
  • June 13, 2014 Provenance Reconstruction Challenge Event at Provenance Week – Cologne Germany

From the post:

While the use of version control systems, workflow engines, provenance aware filesystems and databases, is growing there is still a plethora of data that lacks associated data provenance. To help solve this problem, a number of research groups have been looking at reconstructing the provenance of data using the computational environment in which it resides. This research however is still very new in the community. Thus, the aim of the Provenance Reconstruction Challenge is to help spur research into the reconstruction of provenance by providing a common task and datasets for experimentation.

The Challenge

Challenge participants will receive an open data set and corresponding provenance graphs (in W3C PROV format). They will then have several months to work with the data trying to reconstruct the provenance graphs from the open data set. 3 weeks before the challenge face-2-face event the participants will receive a new data set and a gold standard provenance graph. Participants are asked to register before the challenge dataset is released and to prepare a short description of their system to be placed online after the event.

The Event

At the event, we will have presentations of the results and the systems as well as a group conversation around the techniques used. The event will result in a joint report about techniques for reproducing provenance and paths forward.
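For a rough sense of what participants are asked to reconstruct: a provenance graph in the PROV spirit is entities, activities and agents joined by typed edges. A toy version in plain Python, borrowing PROV relation names but not its actual serialization:

```python
# Toy provenance graph: (subject, relation, object) triples in the
# spirit of PROV's entity/activity/agent model.
prov = [
    ("report.pdf", "wasGeneratedBy",    "run-42"),
    ("run-42",     "used",              "raw-data.csv"),
    ("run-42",     "wasAssociatedWith", "analyst:patrick"),
]

def derived_from(entity, graph):
    """Follow generation and usage edges back to source entities."""
    for s, rel, o in graph:
        if s == entity and rel == "wasGeneratedBy":
            for s2, rel2, o2 in graph:
                if s2 == o and rel2 == "used":
                    yield o2

print(list(derived_from("report.pdf", prov)))   # ['raw-data.csv']
```

Reconstruction, as the challenge frames it, means recovering edges like these from the data alone, without the benefit of a provenance-aware system having recorded them.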

For further information on the W3C PROV format:

Provenance Working Group

PROV at Semantic Web Wiki.

PROV Implementation Report (60 implementations as of 30 April 2013)

I first saw this in a tweet by Paul Groth.

December 17, 2013

RDF Nostalgia for the Holidays

Filed under: RDF,W3C — Patrick Durusau @ 4:07 pm

Three RDF First Public Working Drafts Published

From the announcement:

  • RDF 1.1 Primer, which explains how to use this language for representing information about resources in the World Wide Web.
  • RDF 1.1: On Semantics of RDF Datasets, which presents some issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF Working Group, and specify several semantics in terms of model theory, each corresponding to a certain design choice for RDF datasets.
  • What’s New in RDF 1.1

The drafts, which may someday become W3C Notes, could be helpful.

There isn’t as much RDF loose in the world as COBOL, but what RDF does exist will keep the need for RDF instructional materials alive.
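If you do need to brush up, a minimal RDF 1.1 graph and its Turtle serialization, sketched with rdflib (assuming rdflib is installed; the example resource URI is made up):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF

# A one-triple RDF graph about a resource, serialized as Turtle.
g = Graph()
ex = Namespace("http://example.org/")
g.add((ex["patrick"], FOAF.name, Literal("Patrick Durusau")))
print(g.serialize(format="turtle"))
```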

December 13, 2013

XSL TRANSFORMATIONS (XSLT) VERSION 3.0 [Last Call]

Filed under: W3C,XSLT — Patrick Durusau @ 8:42 pm

XSL TRANSFORMATIONS (XSLT) VERSION 3.0

From the post:

The XSLT Working Group has published today a Last Call Working Draft of XSL Transformations (XSLT) Version 3.0. This specification defines the syntax and semantics of XSLT 3.0, a language for transforming XML documents into other XML documents. A transformation in the XSLT language is expressed in the form of a stylesheet, whose syntax is well-formed XML. Comments are welcome by 10 February 2014. Learn more about the Extensible Markup Language (XML) Activity.

One of the successful activities at the W3C.

A fundamental part of your XML toolkit.
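For readers who have never written one, a minimal stylesheet and a run of it, sketched with lxml, which implements XSLT 1.0; 3.0 builds on the same stylesheet model:

```python
from lxml import etree

# A minimal stylesheet: pull every <title> out of a document.
stylesheet = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <titles>
      <xsl:for-each select="//title">
        <t><xsl:value-of select="."/></t>
      </xsl:for-each>
    </titles>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)
doc = etree.XML(b"<shelf><book><title>Groves</title></book></shelf>")
print(transform(doc))   # prints the result tree with the extracted titles
```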

November 13, 2013

New Data Standard May Save Billions [Danger! Danger! Will Robinson]

Filed under: Standards,W3C — Patrick Durusau @ 7:56 pm

New Data Standard May Save Billions by Isaac Lopez.

When I read:

The international World Wide Web Consortium (W3C) is finalizing a new data standard that could lead to $3 billion of savings each year for the global web industry. The new standard, called the Customer Experience Digital Data Acquisition standard, aims to simplify and standardize data for such endeavors as marketing, analytics, and personalization across websites worldwide.

“At the moment, every technology ingests and outputs information about website visitors in a dizzying array of different formats,” said contributing company Qubit in a statement. “Every time a site owner wants to deploy a new customer experience technology such as web analytics, remarketing or web apps, overstretched development teams have to build a bespoke set of data interfaces to make it work, meaning site owners can’t focus on what’s important.”

The new standard aims to remove complexity by unifying the language that is used by marketing, analytics, and other such tools that are being used as part of the emerging big data landscape. According to the initial figures from customer experience management platform company (and advocate of the standard), Qubit, the savings from the increased efficiency could reach the equivalent of 0.1% of the global internet economy.

Of those benefitting the most from the standard, the United States comes in a clear winner, with savings that reach into the billions, with average savings per business in the tens of thousands of dollars.

I thought all my news feeds from, on and about the W3C had failed. I couldn’t recall any W3C standard work that resembled what was being described.

I did find it hosted at the W3C: Customer Experience Digital Data Community Group, where you will read:

The Customer Experience Digital Data Community Group will work on reviewing and upgrading the W3C Member Submission in Customer Experience Digital Data, starting with the Customer Experience Digital Data Acquisition submission linked here (http://www.w3.org/Submission/2012/04/). The group will also focus on developing connectivity between the specification and the Data Privacy efforts in the industry, including the W3C Tracking Protection workgroup. The goal is to upgrade the Member Submission specification via this Community Group and issue a Community Group Final Specification.

Where you will also read:

Note: Community Groups are proposed and run by the community. Although W3C hosts these conversations, the groups do not necessarily represent the views of the W3C Membership or staff. (emphasis added)

So, The international World Wide Web Consortium (W3C) is [NOT] finalizing a new data standard….

The W3C should not be attributed work it has not undertaken or approved.

November 8, 2013

Property Graphs Model and API Community Group

Filed under: Graphs,W3C — Patrick Durusau @ 5:00 pm

Property Graphs Model and API Community Group

From the webpage:

This group will explore the Property Graph data model and API and decide whether this area is ripe for standardization. Property Graphs are used to analyze social networks and in other Big Data applications using NoSQL databases.

The group may want to investigate several extensions to the data model. For example, should nodes be typed; what datatypes are allowed for property values; can properties have multiple values and should we add collection types such as sets and maps to the data model? At the same time, we need to bear in mind that there are several Property Graph vendors and implementations and we may not want to deviate significantly from current practice.

Existing Property Graph APIs are either navigational e.g. Tinkerpop or declarative e.g. Neo4j. For a W3C standard we may want to design a more HTTP and REST-oriented interface in the style of OData Protocol and OData URL Conventions. In this style, you construct URLs for collections of nodes and edges. For example, a GET on http://server/nodes would return the collection of nodes on the server. A GET on http://server/nodes/in(type = 'knows') would return the collection of incoming arcs with type = 'knows' and a GET on http://server/nodes/out(type = 'created') would return the collection of outgoing arcs with type = 'created'. Once a collection of nodes or arcs is selected with the URI, query operators can be used to add functions to select properties to be returned. Functions can also be used to return aggregate properties such as count and average.

The group will deliver a recommendation to the W3C regarding whether and how the Property Graph work should be taken forward towards standardization.

Potentially an interesting blend of property graphs and OData.
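As a sketch of the data-model questions the group raises (typed nodes? multi-valued properties?), here is a bare-bones property graph plus the kind of in()/out() selection the OData-style API describes, in plain Python rather than any vendor API:

```python
# Bare-bones property graph: nodes and edges both carry properties,
# and edges carry a type, mirroring the questions in the charter text.
nodes = {
    1: {"labels": ["Person"],  "name": "Ada"},
    2: {"labels": ["Person"],  "name": "Grace"},
    3: {"labels": ["Project"], "name": "graph-api"},
}
edges = [
    {"src": 1, "dst": 2, "type": "knows", "since": 2012},
    {"src": 1, "dst": 3, "type": "created"},
]

def incoming(node_id, edge_type):
    """Rough analogue of GET .../nodes/in(type = '...')."""
    return [e for e in edges if e["dst"] == node_id and e["type"] == edge_type]

def outgoing(node_id, edge_type):
    """Rough analogue of GET .../nodes/out(type = '...')."""
    return [e for e in edges if e["src"] == node_id and e["type"] == edge_type]

print(incoming(2, "knows"))
print(outgoing(1, "created"))
```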

No W3C membership is required for this community group, so if you are interested, join the community!
