Linked CSV by Jeni Tennison.
Abstract:
Many open data sets are essentially tables, or sets of tables, which follow the same regular structure. This document describes a set of conventions for CSV files that enable them to be linked together and to be interpreted as RDF.
An encouraging observation in the draft:
Linked CSV is built around the concept of using URIs to name things. Every record, column, and even slices of data, in a linked CSV file is addressable using URI Fragment Identifiers for the text/csv Media Type. For example, if the linked CSV file is accessed at http://example.org/countries, the first record in the CSV file above, which happens to be the first data line within the linked CSV file (which describes Andorra), is addressable with the URI:
http://example.org/countries#row:0
However, this addressing merely identifies the records within the linked CSV file, not the entities that the record describes. This distinction is important for two reasons:
- a single entity may be described by multiple records within the linked CSV file
- addressing entities and records separately enables us to make statements about the source of the information within a particular record
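To make the record addressing concrete, here is a minimal Python sketch that builds a record URI from a file location and a zero-based data-line index. It follows the `#row:N` form exactly as it appears in the excerpt; the function name `record_uri` is my own invention and not part of the draft.

```python
def record_uri(csv_url: str, data_line_index: int) -> str:
    """Return the URI of a record (not of the entity it describes).

    Assumption: the fragment syntax is the `#row:N` form quoted in the
    excerpt, with N counting data lines from zero.
    """
    if data_line_index < 0:
        raise ValueError("data_line_index must be non-negative")
    return f"{csv_url}#row:{data_line_index}"


# The first data line of the file in the excerpt (the Andorra record).
print(record_uri("http://example.org/countries", 0))
```

The result names the line in the file only. Naming the country that line describes needs a separate identifier.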
By default, each data line describes an entity, each entity is described by a single data line, and there is no way to address the entities. However, adding a $id column enables entities to be given identifiers. These identifiers are always URIs, and they are interpreted relative to the location of the linked CSV file. The $id column may be positioned anywhere but by convention it should be the first column (unless there is a # column, in which case it should be the second). For example:
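As a rough illustration of how record URIs and entity URIs differ, the Python sketch below reads a small CSV with a `$id` column and resolves each identifier against the file's location. The sample rows, the `id/AD` style of identifier, and the function name `record_and_entity_uris` are all invented for this sketch and are not taken from the draft. It also ignores the `#` column and the rest of the Linked CSV conventions.

```python
import csv
import io
from urllib.parse import urljoin

# Location of the file, as used in the excerpt.
CSV_URL = "http://example.org/countries"

# Hypothetical sample data: one relative and one absolute $id value.
SAMPLE = """$id,country,name
id/AD,AD,Andorra
http://example.com/id/AF,AF,Afghanistan
"""


def record_and_entity_uris(csv_text, csv_url):
    """Pair each record URI with the URI of the entity it describes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for index, row in enumerate(reader):
        # The record is addressed by its position in the file.
        record = f"{csv_url}#row:{index}"
        # The entity is addressed by $id, resolved relative to the file.
        raw_id = row.get("$id")
        entity = urljoin(csv_url, raw_id) if raw_id else None
        pairs.append((record, entity))
    return pairs


for record, entity in record_and_entity_uris(SAMPLE, CSV_URL):
    print(record, "describes", entity)
```

Two rows that share a `$id` value would get different record URIs but the same entity URI, which is the first of the two reasons the draft gives for keeping them apart.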
Hopefully Jeni is setting a trend in Linked Data circles of distinguishing locations from entities.
I first saw this in Christophe Lalanne’s A bag of tweets / April 2013.