Where Does the Data Go?

A brief editorial on "The Availability of Research Data Declines Rapidly with Article Age" by Timothy H. Vines et al. reads in part:

A group of researchers in Canada examined 516 articles published between 1991 and 2011, and “found that availability of the data was strongly affected by article age.” For instance, the team reports that the odds of finding a working email address associated with a paper decreased by 7 percent each year and that the odds of an extant dataset decreased by 17 percent each year since publication. Some data was technically available, the researchers note, but stored on floppy disk or on zip drives that many researchers no longer have the hardware to access.
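To get a feel for what those yearly percentages mean over the study's twenty-year span, here is a minimal sketch of the compounding. It assumes the decline is a constant multiplicative factor on the odds each year (my reading of the quoted figures, not a reproduction of the paper's model), and the function name odds_multiplier is purely illustrative.

```python
def odds_multiplier(annual_decline: float, years: int) -> float:
    """Factor by which the odds shrink after `years` of a constant annual decline.

    Assumption: the odds are multiplied by (1 - annual_decline) every year.
    """
    return (1.0 - annual_decline) ** years


# Annual declines quoted in the editorial: 7% (working email), 17% (extant dataset).
for label, decline in (("working email", 0.07), ("extant dataset", 0.17)):
    for years in (1, 5, 10, 20):
        factor = odds_multiplier(decline, years)
        print(f"{label:>14}: after {years:2d} years the odds are {factor:.3f} times the starting odds")
```

Under that assumption, twenty years of a 17 percent annual decline leaves the odds at roughly 0.024 times their starting value. Note that these are odds, not probabilities, so the factor does not translate directly into a percentage of datasets still available.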

One of the highlights of the article (which appears in Current Biology) reads:

Broken e-mails and obsolete storage devices were the main obstacles to data sharing

Curious, because I would have ventured that semantic drift over twenty years would have been a major factor as well.

Then I read the paper and discovered:

To avoid potential confounding effects of data type and different research community practices, we focused on recovering data from articles containing morphological data from plants or animals that made use of a discriminant function analysis (DFA). [Under Results, the online edition has no page numbers]

The authors appear to have dodged the semantic bullet through their selection of data and by not reporting difficulties, if any, in using the data (19.5%) that the original authors shared.

Preservation of data is a major concern for researchers, but I would urge that the semantics of data be preserved as well.

Imagine that feeling when you “ls -l” a directory and recognize only some of the file names writ large. Writ very large.

