In my day-to-day work I need to access a number of different intranet web services. Some of these are on my local work intranet, but I also need to reach things like temporary services hosted on Grid’5000. These websites are not available on the public Internet, so the standard procedure is to connect to a gateway that sits at the edge of the intranet and route traffic through that gateway using either a VPN or an SSH SOCKS proxy. This solution isn’t ideal, as it requires setting up the connection, and possibly reconfiguring my browser, whenever I want to access a site on one of these intranets. I also can’t access work and Grid’5000 systems at the same time, and, when traffic is being routed through Grid’5000, I can’t access sites on the public Internet (this was a big problem when my teammates and I were trying to troubleshoot a web application by screen sharing over Google Hangouts).
I had a great time yesterday at the Workshop for Research Software Engineers, put on by the Software Sustainability Institute and hosted at the Oxford e-Research Centre. While there, I had an interesting conversation with Ian Gent, author of “The Recomputation Manifesto”. You should head over to the site and read some of Ian’s articles, particularly this one, as he’s put quite a lot of thought into the topic. The gist of the idea is as follows (apologies to Ian for any misunderstandings; you should really visit his site after reading this): 1) computational experiments are only valuable if they can be verified and validated; 2) in theory, it should be fairly easy to make computational science experiments, particularly small-scale computer science experiments, perfectly repeatable for all time; 3) in practice, this is rarely if ever done, and reproduction and verification are really hard; 4) the best, or possibly the only, way to accomplish this goal is to make the entire environment reproducible by packaging it in a virtual machine.
After the break, I share a few thoughts of my own on how we can better accomplish these goals. Implementing these ideas should be fairly simple: it basically amounts to extending or developing a few system utilities and systematically archiving distributions and updates on a service like figshare. Even so, I believe the benefits to the cause of improving the reproducibility of computational experiments would be enormous.
In our department there’s a course that includes lab sections on programming fluid dynamics simulations in OpenFOAM, which isn’t supported by the University’s IT services. We’ve traditionally gotten around this by installing the required environment in a virtual machine image and running it with VMware under MS Windows, but this approach comes with infrastructural headaches: the image is several gigabytes in size, so distributing it to a large number of machines and importing it into VMware manually on each one is time-consuming and tedious. The research fellow who runs these lab sections asked me to help him find a solution, and I suggested creating minimal virtual machine images that users could download and import into VMware quickly, with the root file system shared from a server over NFS. I’ve decided to document the process of creating these images here, on the off chance it turns out to be useful to someone else. The commands documented below were run on an Ubuntu 12.10 system.