Linux Distribution for Recomputable Experiments

I had a great time yesterday at the Workshop for Research Software Engineers put on by the Software Sustainability Institute and hosted at the Oxford e-Research Centre. While there, I had an interesting conversation with Ian Gent, author of “The Recomputation Manifesto”. You should head over to the site and read some of Ian’s articles, particularly this one, as he’s put quite a lot of thought into the topic.

The gist of the idea is as follows (apologies to Ian for any misunderstandings; you should really go to his site after this one): 1) computational experiments are only valuable if they can be verified and validated; 2) in theory, it should be fairly easy to make computational science experiments, particularly small-scale computer science experiments, perfectly repeatable for all time; 3) in practice this is rarely, if ever, done, and reproduction and verification are really hard; 4) the best, or possibly only, way to accomplish this goal is to make sure that the entire environment is reproducible by packaging it in a virtual machine.

I have a few thoughts of my own on how we can better accomplish these goals after the break. Implementing these ideas should be fairly simple: it basically amounts to extending or developing a few system utilities and systematically archiving distributions and updates on a service like figshare. I believe the benefits for the reproducibility of computational experiments would be enormous.

Repetition of Research (Unintended)

Most of us are familiar with the classical narrative of ongoing scientific progress: that each new discovery builds upon previous ones, creating an ever-rising edifice of human knowledge shaped something like an inverted pyramid. There’s also an idea that, in the semi-distant past of a few hundred years ago, one person could know all the scientific knowledge of his (or her) time, but that today the vast and ever-expanding amount of information available means that scientists must be much more specialized, and that as time passes they will become more specialized still. There is some truth to these ideas, but there are problems as well. When new knowledge is created, how is it added to the edifice? How do we make sure that future scholars will know about it and properly reference it in their own works? If a scientist must be incredibly specialized to advance knowledge, then what does he (or she) do when just starting out? How does one choose a field of research? And what happens when the funding for research into that area dries up? Contrary to what we learned in grade school, a scientist cannot simply choose to study, say, some obscure species of Peruvian moth and spend the next 40 years of summers in South America learning everything there is to know about it without also spending some time justifying that decision to colleagues and funding bodies.