Google BigQuery Public Datasets

An amazing set of public datasets, from the post:

  • A Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
  • Data collected by the NYC Taxi and Limousine Commission (TLC) that includes trip records from all trips completed in yellow and green taxis in NYC from 2009 to 2015.
  • A dataset that contains all stories and comments from Hacker News since its launch in 2006.
  • A dataset published by the US Department of Health and Human Services that includes all weekly surveillance reports of nationally notifiable diseases for all U.S. cities and states published between 1888 and 2013.
  • GDELT Book Corpus: A dataset that contains 3.5 million digitized books stretching back two centuries, encompassing the complete English-language public domain collections of the Internet Archive (1.3 million volumes) and HathiTrust (2.2 million volumes).
  • This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and 2016, collected from over 9,000 stations.

I can readily see myself losing serious time in the GDELT Book Corpus!

