For Attribution… [If One Identifier/URL isn’t enough]

Tuesday, November 27th, 2012

For Attribution — Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop by Paul F. Uhlir.

From the preface:

The growth of electronic publishing of literature has created new challenges, such as the need for mechanisms for citing online references in ways that can assure discoverability and retrieval for many years into the future. The growth in online datasets presents related, yet more complex challenges. It depends upon the ability to reliably identify, locate, access, interpret and verify the version, integrity, and provenance of digital datasets.

Data citation standards and good practices can form the basis for increased incentives, recognition, and rewards for scientific data activities that in many cases are currently lacking in many fields of research. The rapidly-expanding universe of online digital data holds the promise of allowing peer-examination and review of conclusions or analysis based on experimental or observational data, the integration of data into new forms of scholarly publishing, and the ability for subsequent users to make new and unforeseen uses and analyses of the same data – either in isolation, or in combination with other datasets.

The problem of citing online data is complicated by the lack of established practices for referring to portions or subsets of data. As funding sources for scientific research have begun to require data management plans as part of their selection and approval processes, it is important that the necessary standards, incentives, and conventions to support data citation, preservation, and accessibility be put into place.

Of particular interest are the four questions that shaped this workshop:

1. What is the status of data attribution and citation practices in the natural and social (economic and political) sciences in United States and internationally?

2. Why is the attribution and citation of scientific data important and for what types of data? Is there substantial variation among disciplines?

3. What are the major scientific, technical, institutional, economic, legal, and socio-cultural issues that need to be considered in developing and implementing scientific data citation standards and practices? Which ones are universal for all types of research and which ones are field or context specific?

4. What are some of the options for the successful development and implementation of scientific data citation practices and standards, both across the natural and social sciences and in major contexts of research?

The workshop did not presume a solution (is that a URL in your pocket?) but explored the complex nature of attribution and citation.

Michael Sperberg-McQueen remarks:

Longevity: Finally, there is the question of longevity. It is well known that the half-life of citations is much higher in humanities than in the natural sciences. We have been cultivating a culture of citation of referencing for about 2,000 years in the West since the Alexandrian era. Our current citation practice may be 400 years old. The http scheme, by comparison, is about 19 years old. It is a long reach to assume, as some do, that http URLs are an adequate mechanism for all citations of digital (and non-digital!) objects. It is not unreasonable for scholars to be skeptical of the use of URLs to cite data of any long-term significance, even if they are interested in citing the data resources they use. [pp. 63-64]

What I find most attractive about topic maps is that you can have:

  • A single URL as a citation/identifier.
  • Multiple URLs as citations/identifiers (for the same data resource).
  • Multiple URLs and/or other forms of citations/identifiers as they develop(ed) over time for the same data resource.
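
As a rough illustration of the three cases above, here is a minimal Python sketch of one resource carrying several identifiers, with records merged whenever they share one. It is only loosely modeled on how topic maps merge topics that share a subject identifier, not an implementation of any topic map standard. The names Topic and TopicIndex, and every URL and DOI shown, are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """One data resource, known by any number of citations/identifiers."""
    name: str
    identifiers: set = field(default_factory=set)


class TopicIndex:
    """Maps every known identifier to the single topic it identifies."""

    def __init__(self) -> None:
        self._by_identifier = {}

    def add(self, name: str, *identifiers: str) -> Topic:
        merged = Topic(name, set(identifiers))
        for identifier in identifiers:
            existing = self._by_identifier.get(identifier)
            if existing is not None:
                # A shared identifier means the same resource: absorb the
                # older identifiers and keep the name already on record.
                merged.identifiers |= existing.identifiers
                merged.name = existing.name
        # Point every identifier, old and new, at the merged topic.
        for identifier in merged.identifiers:
            self._by_identifier[identifier] = merged
        return merged

    def lookup(self, identifier: str) -> Optional[Topic]:
        return self._by_identifier.get(identifier)


# Hypothetical identifiers: an early URL, a later DOI, and a still later URL.
index = TopicIndex()
index.add("Example dataset", "http://example.org/data/v1", "doi:10.9999/example")
index.add("Example dataset (archived)", "doi:10.9999/example",
          "http://example.org/archive/data")

old = index.lookup("http://example.org/data/v1")
new = index.lookup("http://example.org/archive/data")
same_resource = old is new
```

In this sketch the oldest URL and the newest URL resolve to the same topic because both records share the DOI, so a citation written years ago still reaches the resource after its address has changed.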

Why the concept of multiple citations/identifiers for a single resource (quite common in biblical studies) is so difficult, I cannot explain.