From the post:
There are really only two ways to preserve your statistical analyses: you either save the variables that you create, or you save the code that you used to create them. In general the latter is much preferred, because at some point you’ll realise that your model was wrong, or your dataset has changed, and you need to re-run your analysis. If you only stored your variables, you are now stuck rewriting your code to create new versions, which is really not fun. On the other hand, if you saved your code, all you have to do is tweak it and run it.
Occasionally, though, just keeping the code and rerunning an analysis isn’t practical. The most obvious case is when the analysis takes a long time: if your model takes more than ten minutes to run, it can be really useful to save its variables as well as the source code.
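A minimal sketch of saving a fitted model alongside the code, using base R’s saveRDS/readRDS. Here mtcars stands in for your real (slow-to-fit) data, and the file path is arbitrary:

```r
# Fit once (pretend this takes ten minutes), then persist the result.
fit <- lm(mpg ~ wt + hp, data = mtcars)
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)

# Later, even in a fresh session, reload instead of refitting:
fit2 <- readRDS(path)
coef(fit2)
```

readRDS restores the object exactly as saved, so the reloaded model can be inspected or used for prediction without re-running the fit.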
The problem with saving variables is that when you come back and load them six months later, it isn’t always obvious what they are or where they came from. With code, we solve this by using comments to jog our memory, so it would be nice to have an equivalent for variables. In fact, in R, such a facility exists with the – you guessed it – comment function.
library(lattice)
comment(barley) <- "Immer's barley data, 1934. The data from the Morris site may have the wrong years."
comment(barley)

The comment function simply stores the string as an attribute of the variable, with some special rules on printing. Other common attributes that you may be familiar with are names for vectors and lists, and dim and dimnames for matrices.
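To make the attribute relationship concrete, here is a small self-contained sketch (the vector and its comment text are invented for illustration):

```r
x <- c(1, 2, 3)
comment(x) <- "Monthly totals; source: intake.csv (hypothetical)"

# The comment is stored as an ordinary attribute named "comment" ...
identical(attr(x, "comment"), comment(x))  # TRUE

# ... but, unlike most attributes, it is not shown when x is printed:
print(x)
```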
It’s used here to store information about variables, but there’s no apparent barrier to storing information about other parts of a program.
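For instance, comment() attaches to functions as readily as to data, since R functions can carry attributes too. A sketch, with an invented function and comment text:

```r
# Record an assumption directly on the function that depends on it.
fit_model <- function(df) lm(mpg ~ wt, data = df)
comment(fit_model) <- "Assumes df has columns mpg and wt; see analysis.R"

comment(fit_model)
```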
With a little structure, this could become the "just enough" semantic data to make re-use and interchange possible.
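One sketch of what that structure might look like: comment() only accepts character vectors, but a named character vector keeps the metadata machine-readable. The field names here are an invented convention, not any standard:

```r
d <- data.frame(x = 1:3)

# A named character vector: still a valid comment, but addressable by field.
comment(d) <- c(
  source  = "intake.csv",
  script  = "analysis/clean.R",
  created = "2014-06-01"
)

# Individual fields can then be looked up by name:
comment(d)["script"]
```

Anything richer than character data (lists, dates, version objects) would need an ordinary attribute set via attr() instead, at the cost of losing comment()’s special printing behaviour.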