Four Drafts Published by the CSV on the Web Working Group
From the post:
The CSV on the Web Working Group has published a new set of Working Drafts, which the group considers feature complete and implementable.
- Model for Tabular Data and Metadata on the Web: an abstract model for tabular data, and how to locate metadata that enables users to better understand what the data holds; this specification also contains non-normative guidance on how to parse CSV files
- Metadata Vocabulary for Tabular Data: a JSON-based format for expressing metadata about tabular data to inform validation, conversion, display and data entry for tabular data
- Generating JSON from Tabular Data on the Web: how to convert tabular data into JSON
- Generating RDF from Tabular Data on the Web: how to convert tabular data into RDF
The group is keen to receive comments on these specifications, either as issues on the Group’s GitHub repository or by posting to public-csv-wg-comment target=”_blank”s@w3.org.
The CSV on the Web Working Group would also like to invite people to start implementing these specifications and to donate their test cases into the group’s test suite. Building this test suite, as well as responding to comments, will be the group’s focus over the next couple of months.
Learn more about the CSV on the Web Working Group.
If nothing else, Model for Tabular Data and Metadata on the Web, represents a start on documenting a largely undocumented data format. Perhaps the most common undocumented data format of all. I say that, there may be a dissertation or even a published book that has collected all the CSV variants at a point in time. Sing out if you know of such a volume.
Will be interested to see if the group issues a work product entitled: Generating CSV from RDF on the Web. Our return to data opacity will be complete.
Not that I have any objections to CSV, compact, easy to parse (well, perhaps not correctly but I didn’t say that), widespread, but it isn’t the best format for blind interchange, which by definition includes legacy data. Odd that in a time of practically limitless storage and orders of magnitude faster processing, that data appears to be seeking less documented semantics.
Or is that the users and producers of data prefer data with less documented semantics? I knew an office manager once upon a time who produced reports from unshared cheatsheets using R Writer. Was loathe to share field information with anyone. Seemed like a very sad way to remain employed.
Documenting data semantics isn’t going to obviate the need for the staffs who previously concealed data semantics. Just as data is dynamic so are its semantics which will require the same staffs to continually update and document the changing semantics of data. Same staffs for the most part, just more creative duties.