From the post:
Google has a strong interest in promoting high quality systems research, and we believe that providing information about real-life workloads to the academic community can help.
In support of this we published a small (7-hour) sample of resource-usage information from a Google production cluster in 2010 (research blog on Google Cluster Data). Approximately a dozen researchers at UC Berkeley, CMU, Brown, NCSU, and elsewhere have made use of it.
Recently, we released a larger dataset. It covers a longer period of time (29 days) for a larger cell (about 11k machines) and includes significantly more information, including:
I remember Robert Barta describing the use of topic maps for systems administration. This data set could give some insight into the design of a topic map for cluster management.
What subjects and relationships would you recognize, how and why?
If you are looking for employment, this might be a good way to attract Google’s attention. (Hint to Google: Releasing interesting data sets could be a way to vet potential applicants in realistic situations.)