How effective is your research programming workflow? by Philip Guo.
From the post:
For my Ph.D. dissertation, I investigated research programming, a common type of programming activity where people write computer programs to obtain insights from data. Millions of professionals in fields ranging from science, engineering, business, finance, public policy, and journalism, as well as numerous students and computer hobbyists, all perform research programming on a daily basis.
Inspired by The Joel Test for rating software engineering teams, here is my informal “Philip test” to determine whether your research programming workflow is effective:
- Do you have reliable ways of taking, organizing, and reflecting on notes as you’re working?
- Do you have reliable to-do lists for your projects?
- Do you write scripts to automate repetitive tasks?
- Are your scripts, data sets, and notes backed up on another computer?
- Can you quickly identify errors and inconsistencies in your raw data sets?
- Can you write scripts to acquire and merge together data from different sources and in different formats?
- Do you use version control for your scripts?
- If you show analysis results to a colleague and they offer a suggestion for improvement, can you adjust your script, re-run it, and produce updated results within an hour?
- Do you use assert statements and test cases to sanity check the outputs of your analyses?
- Can you re-generate any intermediate data set from the original raw data by running a series of scripts?
- Can you re-generate all of the figures and tables in your research paper by running a single command?
- If you got hit by a bus, can one of your lab-mates resume your research where you left off with less than a week of delay?
…
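To make two items from the list above more concrete (spotting errors in raw data, and using assert statements to sanity check analysis output), here is a minimal Python sketch. It is only an illustration. The data set, its column names, and the rules it checks (unique ids, years between 2000 and 2014, non-negative values) are all hypothetical and do not come from Philip's post.

```python
import csv
import io

# Hypothetical raw data set; in practice this would be read from a file.
RAW_CSV = """id,year,value
1,2012,3.5
2,2013,4.1
3,2013,-1.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts with typed fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(r["id"]), "year": int(r["year"]), "value": float(r["value"])}
        for r in reader
    ]

def find_problems(rows, min_year=2000, max_year=2014):
    """List every suspicious row instead of stopping at the first one.

    The rules here (unique ids, a year range, non-negative values) are
    hypothetical examples of domain checks.
    """
    problems = []
    seen_ids = set()
    for r in rows:
        if r["id"] in seen_ids:
            problems.append(f"duplicate id {r['id']}")
        seen_ids.add(r["id"])
        if not (min_year <= r["year"] <= max_year):
            problems.append(f"id {r['id']}: year {r['year']} out of range")
        if r["value"] < 0:
            problems.append(f"id {r['id']}: negative value {r['value']}")
    return problems

def summarize(rows):
    """The 'analysis' step: mean value per year."""
    totals = {}
    for r in rows:
        s, n = totals.get(r["year"], (0.0, 0))
        totals[r["year"]] = (s + r["value"], n + 1)
    return {year: s / n for year, (s, n) in totals.items()}

rows = load_rows(RAW_CSV)
print(find_problems(rows))  # ['id 3: negative value -1.0']
clean = [r for r in rows if r["value"] >= 0]
result = summarize(clean)

# Sanity checks on the output: one entry per year in the clean data,
# and no negative means.
assert set(result) == {r["year"] for r in clean}
assert all(v >= 0 for v in result.values())
print(result)  # {2012: 3.5, 2013: 4.1}
```

Collecting problems into a list, rather than asserting on the raw data directly, lets you see every inconsistency at once. The asserts are then reserved for conditions on the analysis output that should never fail.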
Philip suggests a starting point in his post.
His post alone is pure gold, I would say.
Came to this by following a tweet by Neil Saunders that pointed to "How effective is my research programming workflow? The Philip Test – Part 1", and from there I found the link to Philip's post.
This sounds a lot like the recent controversy over whether research published in scientific journals can be replicated. Can someone else replicate your results?
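One practical way to make your own results replicable follows the checklist items about re-generating intermediate data sets and figures from the raw data. Keep every derived file as the output of a script, and expose one command that rebuilds all of them. The Python sketch below assumes a hypothetical two-step pipeline. The file names, the cleaning rule, and the function names are illustrative and are not taken from either post.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical pipeline: each step reads the previous step's output and
# writes a new file. Nothing downstream is edited by hand, so every
# intermediate file can be rebuilt from the raw data.

def step_clean(raw_path: Path, out_path: Path) -> None:
    """Drop malformed lines (hypothetical rule: a value must parse as a number)."""
    values = []
    for line in raw_path.read_text().splitlines():
        try:
            values.append(float(line))
        except ValueError:
            continue  # skip lines that are not numbers
    out_path.write_text(json.dumps(values))

def step_summarize(clean_path: Path, out_path: Path) -> None:
    """Compute the summary numbers a table in the paper might report."""
    values = json.loads(clean_path.read_text())
    summary = {"n": len(values), "mean": sum(values) / len(values)}
    out_path.write_text(json.dumps(summary))

def run_all(workdir: Path) -> dict:
    """The single command: rebuild every intermediate file from raw.txt."""
    raw = workdir / "raw.txt"
    clean = workdir / "clean.json"
    summary = workdir / "summary.json"
    step_clean(raw, clean)
    step_summarize(clean, summary)
    return json.loads(summary.read_text())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        (work / "raw.txt").write_text("1.0\n2.0\nn/a\n3.0\n")
        print(run_all(work))  # {'n': 3, 'mean': 2.0}
```

With this structure, a colleague (or a reader trying to replicate the work) needs only the raw data and one command. A build tool such as `make` could play the same role as `run_all`.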