Archive for the ‘RDF Data Cube Vocabulary’ Category

LSD Dimensions

Monday, October 20th, 2014

LSD Dimensions

From the about page: http://lsd-dimensions.org/dimensions

LSD Dimensions is an observatory of the current usage of dimensions and codes in Linked Statistical Data (LSD).

LSD Dimensions is an aggregator of all qb:DimensionProperty resources (and their associated triples), as defined in the RDF Data Cube vocabulary (W3C recommendation for publishing statistical data on the Web), that can be currently found in the Linked Data Cloud (read: the SPARQL endpoints in Datahub.io). Its purpose is to improve the reusability of statistical dimensions, codes and concept schemes in the Web of Data, providing an interface for users (future work: also for programs) to search for resources commonly used to describe open statistical datasets.

Usage

The main view shows the count of queried SPARQL endpoints and the number of retrieved dimensions, together with a table that displays these dimensions.

  • Sorting. Dimensions can be sorted by their dimension URI, label and number of references (i.e. number of times a dimension is used in the endpoints) by clicking on the column headers.
  • Pagination. The number of rows per page can be customized and browsed by clicking at the bottom selectors.
  • Search. String-based search can be performed by writing the search query in the top search field.

Any of these dimensions can be further explored by clicking at the eye icon on the left. The dimension detail view shows

  • Endpoints. The endpoints that make use of that dimension.
  • Codes. Popular codes that are defined (future work: also assigned) as valid values for that dimension.

Motivation

RDF Data Cube (QB) has boosted the publication of Linked Statistical Data (LSD) as Linked Open Data (LOD) by providing a means “to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts”. QB defines cubes as sets of observations affected by dimensions, measures and attributes. For example, the observation “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years” has three dimensions (time period, with value 2004-2006; region, with value Newport; and sex, with value male), a measure (population life expectancy) and two attributes (the units of measure, years; and the metadata status, measured, to make explicit that the observation was measured instead of, for instance, estimated or interpolated). In some cases, it is useful to also define codes, a closed set of values taken by a dimension (e.g. sensible codes for the dimension sex could be male and female).

There is a vast diversity of domains to publish LSD about, and quite some dimensions and codes can be very heterogeneous, domain specific and hardly comparable. To this end, QB allows users to mint their own URIs to create arbitrary dimensions and associated codes. Conversely, some other dimensions and codes are quite common in statistics, and could be easily reused. However, publishers of LSD have no means to monitor the dimensions and codes currently used in other datasets published in QB as LOD, and consequently they cannot (a) link to them; nor (b) reuse them.

This is the motivation behind LSD Dimensions: it monitors the usage of existing dimensions and codes in LSD. It allows users to browse, search and gain insight into these dimensions and codes. We depict the diversity of statistical variables in LOD, improving their reusability.

(Emphasis added.)

The highlighted text:

There is a vast diversity of domains to publish LSD about, and quite some dimensions and codes can be very heterogeneous, domain specific and hardly comparable.

is the key, isn’t it? If you can’t rely on data titles, users must examine the data and determine which sets can or should be compared.

The question then is how do you capture the information such users developed in making those decisions and pass it on to following users? Or do you just allow following users to make their own way afresh?

If you document the additional information for each data set, by using a topic map, each use of this resource becomes richer for the following users. Richer or stays the same. Your call.
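For readers new to QB, the life-expectancy observation quoted above can be sketched as plain data. This is only an illustration, not the RDF serialization: the `Observation` class, its field names and the `shared_dimensions` helper are hypothetical, and the second observation is invented (with a placeholder value) purely to show how a comparison based on names alone can miss an obvious match.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """A QB-style observation as plain data (hypothetical structure)."""
    dimensions: dict[str, str]    # what the value is about
    measures: dict[str, float]    # the observed value(s)
    attributes: dict[str, str]    # how to interpret the value


# The example from the LSD Dimensions about page.
life_expectancy = Observation(
    dimensions={"time period": "2004-2006", "region": "Newport", "sex": "male"},
    measures={"population life expectancy": 76.7},
    attributes={"unit of measure": "years", "metadata status": "measured"},
)

# Invented for illustration: another publisher naming the same ideas differently.
# The measure value is a placeholder, not real data.
other = Observation(
    dimensions={"period": "2004-2006", "area": "Newport", "gender": "male"},
    measures={"life expectancy": 0.0},
    attributes={"unit": "years"},
)


def shared_dimensions(a: Observation, b: Observation) -> set[str]:
    """Return the dimension names that both observations use verbatim."""
    return set(a.dimensions) & set(b.dimensions)


# Name matching finds nothing in common, although a person would pair them up.
print(shared_dimensions(life_expectancy, other))
```

The empty result is the point: the knowledge that "region" and "area" line up lives in someone's head unless it is recorded somewhere following users can find it.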

I first saw this in a tweet by Bob DuCharme, who remarked that this organization has a great title!

If you have made it this far, you realize that with all the data set, RDF and statistical language this isn’t the post you were looking for. 😉

PS: Yes Bob, it is a great title!

RDF Data Cube Vocabulary [Last Call ends 08 April 2013]

Tuesday, March 12th, 2013

RDF Data Cube Vocabulary

Abstract:

There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts. The Data Cube vocabulary provides a means to do this using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations. The Data Cube vocabulary is a core foundation which supports extension vocabularies to enable publication of other aspects of statistical data flows or other multi-dimensional data sets.

If you have comments, now would be a good time to finish them up for submission.

I first saw this in a tweet by Sandro Hawke.

New draft standard XKOS developed at Dagstuhl workshop

Saturday, November 17th, 2012

New draft standard XKOS developed at Dagstuhl workshop

From the post:

The AIMS team as part of its work in promoting good practices in information management participated in the development of the new draft standard XKOS at the Dagstuhl workshop “Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web” in Wadern, Germany, October 15-19, 2012. XKOS is an extension to the popular Simple Knowledge Organization System (SKOS), a W3C Recommendation, to meet the needs of classification schemes.

Improving visibility & discoverability of statistical data

XKOS is designed to facilitate the interoperability of micro and macro data both within and without the statistics domain and to be complementary to existing standards such as SDMX, DDI and RDF Data Cube. This proposed extension to SKOS may well become the basis for improving the visibility and discoverability of statistical data on the semantic web as well as a mechanism to maintain and disseminate classification schemes according to a standard, cross-domain, machine-readable format.

Acronym Safety Zone:

SDMX – Statistical Data and Metadata eXchange

DDI – Data Documentation Initiative

RDF Data Cube

Apologies, but I was unable to find a draft of XKOS for a link. Do be aware that XKOS is also the acronym for the Korean stock exchange. 😉

Statistical Data and Metadata eXchange (SDMX)

Friday, April 20th, 2012

Statistical Data and Metadata eXchange (SDMX)

SDMX is the core information model that informs the RDF Data Cube Vocabulary.

It isn’t clear from the working draft of 05 April 2011 which version of the SDMX materials informs the RDF Data Cube Vocabulary work.

You may also be interested in SDMX pages on domains where statistical work is ongoing, implementations and tools.

On SDMX in general:

SDMX 2.1 Technical Specification

Section 1 – Framework. Introduces the documents and the content of the revised Version 2.1

Section 2 – Information Model. UML model and functional description, definition of classes, associations and attributes

Section 3A – SDMX-ML. Specifies and documents the XML formats for describing structure, data, reference metadata, and interfaces to the registry

Section 3B – SDMX-ML. XML schemas, samples, WADL and WSDL (update: 12 May 2011)

Section 4 – SDMX-EDI. Specifies and documents the UN/EDIFACT format for describing structure and data.

Section 5 – Registry Specification – Logical Interfaces. Provides the specification for the logical registry interfaces, including subscription/notification, registration of data and metadata, submission of structural metadata, and querying

Section 6 – Technical Notes. Provides some technical information which may be useful for the implementation (this was called “Implementor’s Guide” in the 2.0 release)

Section 7 – Web Services Guidelines. Provides guidelines for using SDMX standards to promote interoperability among SDMX web services

ZIP file of all the documents: SDMX 2.1 ALL SECTIONS

And SDMX concepts:

The SDMX Content-Oriented Guidelines recommend practices for creating interoperable data and metadata sets using the SDMX technical standards. They are intended to be applicable to all statistical subject-matter domains. The Guidelines focus on harmonising specific concepts and terminology that are common to a large number of statistical domains. Such harmonisation is useful for achieving an even more efficient exchange of comparable data and metadata, and builds on the experience gained in implementations to date.

Content-Oriented Guidelines

The Guidelines are supplemented by five annexes:

Annex 1 – Cross-Domain Concepts
Annex 2 – Cross-Domain Code Lists
Annex 3 – Statistical Subject-Matter Domains
Annex 4 – Metadata Common Vocabulary
Annex 5 – SDMX-ML for Content-Oriented Guidelines (zip file)

Additional information is provided in the following files:

  1. Mapping of SDMX Cross-Domain Concepts to metadata frameworks at international organisations (IMF-Data Quality Assessment Framework, Eurostat-SDMX Metadata Structure and OECD-Metastore)
  2. Use of Cross-Domain Concepts in Data and Metadata Structure Definitions
  3. A disposition log of comments and suggestions directly received by the SDMX Secretariat.

The RDF Data Cube Vocabulary

Monday, April 16th, 2012

The RDF Data Cube Vocabulary

A new draft from the W3C, adapting existing data cube vocabularies into an RDF representation.

The proposal re-uses several other vocabularies that I will be covering separately.

There are several open issues so read carefully.


What do you make of: The RDF Data Cube Vocabulary? I haven’t run diffs on it yet.