Implementing Reproducible Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng.
From the webpage:
In many of today’s research fields, including biomedicine, computational tools are increasingly being used so that the results can be reproduced. Researchers are now encouraged to incorporate software, data, and code in their academic papers so that others can replicate their research results. Edited by three pioneers in this emerging area, this book is the first one to explore this groundbreaking topic. It presents various computational tools useful in research, including cloud computing, data repositories, virtual machines, R’s Sweave function, XML-based programming, and more. It also discusses legal issues and case studies.
There is growing concern over the ability of scientists to reproduce the published results of other scientists. The Economist rang one of many alarm bells when it published "Trouble at the lab" [Data Skepticism].
From the introduction to Reproducible Research:
Literate statistical programming is a concept introduced by Rossini that builds on the idea of literate programming as described by Donald Knuth. With literate statistical programming, one combines the description of a statistical analysis and the code for doing the statistical analysis into a single document. Subsequently, one can take the combined document and produce either a human-readable document (e.g. PDF) or a machine-readable code file. An early implementation of this concept was the Sweave system of Leisch, which uses R as its programming language and LaTeX as its documentation language. Yihui Xie describes his knitr package, which builds substantially on Sweave and incorporates many new ideas developed since the initial development of Sweave. Along these lines, Tanu Malik and colleagues describe the Science Object Linking and Embedding framework for creating interactive publications that allow authors to embed various aspects of computational research in a document, creating a complete research compendium.
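To make the idea concrete: in Knuth's terminology, the combined document is "woven" into prose plus results, or "tangled" into bare code. Below is a minimal sketch of the tangle step for an Sweave-style document, where code chunks open with a `<<name>>=` line and a line beginning with `@` returns to documentation. This is an illustration of the concept only, not the actual Sweave or knitr implementation (both of which do far more).

```python
import re

def tangle(document: str) -> str:
    """Extract only the code chunks from an Sweave-style literate document.

    Chunks open with a "<<name>>=" header line and close with a line
    starting with "@" (which resumes the documentation text).
    """
    chunks, in_chunk = [], False
    for line in document.splitlines():
        if re.match(r"<<.*>>=\s*$", line):       # chunk header, e.g. <<summary>>=
            in_chunk = True
        elif in_chunk and line.startswith("@"):  # "@" ends the code chunk
            in_chunk = False
        elif in_chunk:
            chunks.append(line)
    return "\n".join(chunks)

# A tiny literate document: LaTeX prose wrapped around one R chunk.
doc = """\\section{Analysis}
We summarise the data:
<<summary>>=
x <- rnorm(100)
summary(x)
@
The weave step would render this prose plus the code's output."""

print(tangle(doc))  # prints only the two R lines
```

The weave step is the mirror image: keep the prose, run the chunks, and splice the results into the rendered document, so the analysis and its description can never drift apart.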
Of course, we all cringe when we read that a drug company could reproduce only 1/4 of 67 “seminal” studies.
What has me curious is why we don’t have the same reaction when enterprise IT systems require episodic remapping, forcing the mappers to relearn what was known at the time of the last remapping. We all know that enterprise (and other) IT systems change and evolve, but practically speaking, no effort is made to capture the knowledge that would reduce the time and cost of every future remapping.
We can see the expense and danger of science not being reproducible, yet when our own enterprise data mappings are not reproducible, we shrug: that’s just the way things are.
Take inspiration from the movement towards reproducible science and work towards reproducible semantic mappings.
I first saw this in a tweet by Victoria Stodden.