Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 11, 2015

In Praise of CSV

Filed under: CSV,Dark Data — Patrick Durusau @ 3:23 pm

In Praise of CSV by Waldo Jaquith

From the post:

Comma Separated Values is the file format that open data advocates love to hate. Compared to JSON, CSV is clumsy; compared to XML, CSV is simplistic. Its reputation is as a tired, limited, it’s-better-than-nothing format. Not only is that reputation undeserved, but CSV should often be your first choice when publishing data.

It’s true: CSV is tired and limited, though superior to not having data, but there’s another side to those coins. One man’s tired is another man’s established. One man’s limited is another man’s focused. And “better than nothing” is, in fact, better than nothing, which is frequently the alternative to producing CSV.

A bit further on:


The lack of typing makes schemas generally impractical, and as a result validation of field contents is also generally impractical.

There is ongoing work to improve that situation at the CSV on the Web Working Group (W3C). As of today, see: Metadata Vocabulary for Tabular Data, W3C Editor’s Draft 11 March 2015.
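To make the typing problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample data, the column names, and the hand-rolled SCHEMA dictionary, which only gestures at the idea of a sidecar description and is not the W3C metadata vocabulary.

```python
import csv
import io

# Hypothetical sample data; the column names are invented for illustration.
SAMPLE = "burn_id,impulse\n1,4.45\n2,not-a-number\n"

# Hypothetical sidecar schema: column name -> converter function.
SCHEMA = {"burn_id": int, "impulse": float}


def read_rows(text):
    """Parse CSV text; every value comes back as a plain string."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows, schema):
    """Return (row_number, column, value) for each value that fails conversion."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for column, convert in schema.items():
            value = row.get(column)
            try:
                convert(value)
            except (TypeError, ValueError):
                errors.append((number, column, value))
    return errors


rows = read_rows(SAMPLE)
print(type(rows[0]["impulse"]))  # a str: the CSV itself carries no types
print(validate(rows, SCHEMA))
```

The file parses without complaint either way; the bad value only surfaces because type information was supplied from outside the CSV.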

The W3C work is definitely a step in the right direction, but even if you “know” a field heading or its data type, do you really “know” the semantics of that field? Assume you have a floating point number: is that “pound-seconds” or “newton-seconds”? Mars orbiters really need to know.
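A rough Python sketch of the units problem follows. The column name, the COLUMN_UNITS mapping, and the unit labels are all hypothetical; the only fact relied on is that one pound-force is 4.4482216152605 newtons. It shows one possible way to refuse a number whose unit was never declared, not a recommended metadata format.

```python
# Conversion factors to newton-seconds (1 lbf = 4.4482216152605 N).
TO_NEWTON_SECONDS = {"N*s": 1.0, "lbf*s": 4.4482216152605}

# Hypothetical metadata kept alongside the CSV: column name -> unit label.
COLUMN_UNITS = {"impulse": "lbf*s"}


def to_newton_seconds(column, raw, units=COLUMN_UNITS):
    """Convert a raw CSV string to newton-seconds, or fail if the unit is unknown."""
    unit = units.get(column)
    if unit is None:
        raise ValueError(f"no unit declared for column {column!r}")
    if unit not in TO_NEWTON_SECONDS:
        raise ValueError(f"unsupported unit {unit!r} for column {column!r}")
    return float(raw) * TO_NEWTON_SECONDS[unit]


print(to_newton_seconds("impulse", "10"))
```

Without the mapping, the string "10" is just a number, and nothing in the file says which of the two readings is correct.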

Perhaps CSV files are nearly the darkest dark data with a structure. Even with field names and data types, the semantics of any field and its relationships to other fields remain a mystery.

Within a week, month or year, someone may still remember the field semantics, but what of ten (10) years or even one hundred (100) years from now?

3 Comments

  1. Patrick,

    You may want to go through the last few days of posts and check that your links are going where you intend; I’m clicking on a lot of links in your posts via my RSS reader and getting directed to your blog instead of the post you’re discussing.

    Comment by marijane — March 11, 2015 @ 5:11 pm

  2. Thanks for the alert!

    I upgraded (read changed) how I was entering data.

    Forgot to upgrade/change the user as well. 😉 I think I have caught most of them for the last several days. I need to do a full audit on the system sometime soon.

    Comment by Patrick Durusau — March 11, 2015 @ 8:13 pm

  3. Here is an interesting chart of the trends of JSON vs CSV

    For those that really do prefer CSV over JSON, here is a good JSON to CSV Converter

    Comment by yuland — March 14, 2015 @ 4:53 pm
