googleclusterdata

First Steps Exploring the Google Cluster dataset with iPython

A lot of my research is based around developing and testing heuristic algorithms for various computation problems, in particular, I try to develop and adapt vector-packing based heuristics for scheduling virtual machines in distributed heterogeneous environments. Toward this end, I've spent a fair amount of time investigating the large data trace released by google in 2011. The trace is available through Google Cloud Storage, and can be downloaded using the gsutil utility, which can be installed in a python virtual environment via pypi/pip.