Tools for Reproducible Research by Karl Broman.
From the post:
A minimal standard for data analysis and other scientific computations is that they be reproducible: that the code and data are assembled in a way so that another group can re-create all of the results (e.g., the figures in a paper). The importance of such reproducibility is now widely recognized, but it is still not so widely practiced as it should be, in large part because many computational scientists (and particularly statisticians) have not fully adopted the required tools for reproducible research.
In this course, we will discuss general principles for reproducible research but will focus primarily on the use of relevant tools (particularly make, git, and knitr), with the goal that the students leave the course ready and willing to ensure that all aspects of their computational research (software, data analyses, papers, presentations, posters) are reproducible.
As you already know, there is a great deal of interest in making scientific experiments reproducible in fact as well as in theory.
At the same time, there has been increasing interest in reproducible data analysis as it concerns the results of reproducible experiments.
One logically follows from the other.
Of course, where the analysis combines data from different sources, reproducible data analysis would simply follow, cookie-cutter fashion, the combining of data as reported in the original experiment.
But what if a user wants to replicate the combining (mapping) of that data with other data? From different sources? The original steps could be followed by rote, but others would not know the underlying basis for the choices made in the mapping.
Experimenters take great care to identify the substances used in an experiment. When data is combined from different sources, why not do the same for the data?
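As a minimal sketch of what that might look like (the file names, columns, and mappings below are hypothetical), the basis for each mapping choice can be recorded alongside the merge itself, so anyone re-running the analysis sees not just the steps but the reasons:

```python
# A minimal sketch of documenting mapping choices when combining data
# from two sources. File names, columns, and mappings are hypothetical.
import pandas as pd

# Source A reports countries by two-letter code; source B uses full names.
a = pd.read_csv("source_a.csv")   # columns: country_code, incidence
b = pd.read_csv("source_b.csv")   # columns: country_name, population

# Basis for the mapping: codes matched to the English short names used by
# source B. "UK" is mapped to "United Kingdom" even though the ISO code is
# "GB", because source A uses the older form -- a judgment call, recorded here.
code_to_name = {
    "US": "United States",
    "UK": "United Kingdom",
    "DE": "Germany",
}

a["country_name"] = a["country_code"].map(code_to_name)
merged = a.merge(b, on="country_name", how="inner")

# Rows dropped by the inner join are reported rather than silently discarded,
# so a later reader can see the consequences of the mapping choices.
dropped = a[~a["country_name"].isin(merged["country_name"])]
print("Unmapped or unmatched rows:", len(dropped))
```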
I first saw this in a tweet by Yihui Xie.