I was rummaging around on the Opendata.dc.gov website today when I encountered Taxicab Trips (2.2 GB), described as:
DC Taxicab trip data from April 2015 to August 2016. Pick up and drop off locations are assigned to block locations with times rounded to the nearest hour. Detailed metadata included in download. The Department of For-Hire Vehicles (DFHV) provided OCTO with a taxicab trip text file representing trips from May 2015 to August 2016. OCTO processed the data to assign block locations to pick up and drop off locations.
For your convenience, I extracted README_DC_Taxicab_trip.txt, which gives the structure of the “|”-separated data files as follows:
TABLE STRUCTURE:

COLUMN_NAME              DATA_TYPE      DEFINITION
OBJECTID                 NUMBER(9)      Table Unique Identifier
TRIPTYPE                 VARCHAR2(255)  Type of Taxi Trip
PROVIDER                 VARCHAR2(255)  Taxi Company that Provided trip
METERFARE                VARCHAR2(255)  Meter Fare
TIP                      VARCHAR2(255)  Tip amount
SURCHARGE                VARCHAR2(255)  Surcharge fee
EXTRAS                   VARCHAR2(255)  Extra fees
TOLLS                    VARCHAR2(255)  Toll amount
TOTALAMOUNT              VARCHAR2(255)  Total amount from Meter fare, tip, surcharge, extras, and tolls.
PAYMENTTYPE              VARCHAR2(255)  Payment type
PAYMENTCARDPROVIDER      VARCHAR2(255)  Payment card provider
PICKUPCITY               VARCHAR2(255)  Pick up location city
PICKUPSTATE              VARCHAR2(255)  Pick up location state
PICKUPZIP                VARCHAR2(255)  Pick up location zip
DROPOFFCITY              VARCHAR2(255)  Drop off location city
DROPOFFSTATE             VARCHAR2(255)  Drop off location state
DROPOFFZIP               VARCHAR2(255)  Drop off location zip
TRIPMILEAGE              VARCHAR2(255)  Trip mileage
TRIPTIME                 VARCHAR2(255)  Trip time
PICKUP_BLOCK_LATITUDE    NUMBER         Pick up location latitude
PICKUP_BLOCK_LONGITUDE   NUMBER         Pick up location longitude
PICKUP_BLOCKNAME         VARCHAR2(255)  Pick up location street block name
DROPOFF_BLOCK_LATITUDE   NUMBER         Drop off location latitude
DROPOFF_BLOCK_LONGITUDE  NUMBER         Drop off location longitude
DROPOFF_BLOCKNAME        VARCHAR2(255)  Drop off location street block name
AIRPORT                  CHAR(1)        Pick up or drop off location is a local airport (Y/N)
PICKUPDATETIME_TR        DATE           Pick up location city
DROPOFFDATETIME_TR       DATE           Drop off location city
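Here is a minimal Python sketch of reading those “|”-separated rows into dictionaries keyed by the README column names. It assumes every data line carries all 28 fields in the README order with no header row (if the extracted file does have a header, drop its first line). `COLUMNS` and `parse_rows` are hypothetical names, not part of the download.

```python
import csv
from typing import Dict, Iterable, Iterator, List

# Column names in the order listed in README_DC_Taxicab_trip.txt.
# Assumption: the data files use this same field order and have no header row.
COLUMNS: List[str] = [
    "OBJECTID", "TRIPTYPE", "PROVIDER", "METERFARE", "TIP", "SURCHARGE",
    "EXTRAS", "TOLLS", "TOTALAMOUNT", "PAYMENTTYPE", "PAYMENTCARDPROVIDER",
    "PICKUPCITY", "PICKUPSTATE", "PICKUPZIP", "DROPOFFCITY", "DROPOFFSTATE",
    "DROPOFFZIP", "TRIPMILEAGE", "TRIPTIME", "PICKUP_BLOCK_LATITUDE",
    "PICKUP_BLOCK_LONGITUDE", "PICKUP_BLOCKNAME", "DROPOFF_BLOCK_LATITUDE",
    "DROPOFF_BLOCK_LONGITUDE", "DROPOFF_BLOCKNAME", "AIRPORT",
    "PICKUPDATETIME_TR", "DROPOFFDATETIME_TR",
]


def parse_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per "|"-separated line, keyed by README column name."""
    # QUOTE_NONE: treat any double quotes in block names as ordinary characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Skip blank lines and rows whose field count doesn't match the README.
        if len(fields) != len(COLUMNS):
            continue
        yield dict(zip(COLUMNS, fields))
```

Values stay as strings, which matches the README declaring most fields as VARCHAR2; converting fares or coordinates to numbers is left to the caller.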
The taxi data files are zipped by month, inside a single archive:
Archive:  taxitrip2015_2016.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
107968907  2016-11-29 14:27   taxi_201511.zip
117252084  2016-11-29 14:20   taxi_201512.zip
 99545739  2016-11-30 11:15   taxi_201601.zip
129755310  2016-11-30 11:24   taxi_201602.zip
152793046  2016-11-30 11:31   taxi_201603.zip
148835360  2016-11-30 11:20   taxi_201604.zip
143734132  2016-11-30 11:19   taxi_201605.zip
139396173  2016-11-30 11:13   taxi_201606.zip
121112859  2016-11-30 11:08   taxi_201607.zip
104015666  2016-11-30 12:04   taxi_201608.zip
154623796  2016-11-30 11:03   taxi_201505.zip
161666797  2016-11-29 14:15   taxi_201506.zip
153483725  2016-11-29 14:32   taxi_201507.zip
121135328  2016-11-29 14:06   taxi_201508.zip
142098999  2016-11-30 10:55   taxi_201509.zip
160977058  2016-11-30 10:35   taxi_201510.zip
     3694  2016-12-09 16:43   README_DC_Taxicab_trip.txt
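The listing is not in chronological order (November 2015 comes first). A short sketch that lists the monthly members oldest first using Python's standard `zipfile` module, assuming every monthly file follows the `taxi_YYYYMM.zip` pattern shown above; `monthly_members` is a hypothetical helper name.

```python
import re
import zipfile
from typing import List, Tuple

# Assumption: monthly members are named taxi_YYYYMM.zip, as in the listing.
MONTH_RE = re.compile(r"^taxi_(\d{6})\.zip$")


def monthly_members(archive_path) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size) for each monthly zip, oldest first.

    archive_path may be a filesystem path or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(archive_path) as outer:
        for info in outer.infolist():
            match = MONTH_RE.match(info.filename)
            if match:  # the README and anything else is ignored
                found.append((match.group(1), info.filename, info.file_size))
    found.sort()  # YYYYMM strings sort chronologically
    return [(name, size) for _, name, size in found]
```

Called as `monthly_members("taxitrip2015_2016.zip")`, it should start with `taxi_201505.zip` and end with `taxi_201608.zip`.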
I extracted taxi_201601.zip, decompressed it, and created a 10,000-line sample named taxi-201601-10k.ods.
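The extract-and-sample step can be sketched in Python without unpacking the whole 2.2 GB download. This is a sketch under assumptions, not necessarily how the sample above was made: it assumes each monthly zip holds the trip data as its largest member, copies raw bytes so no text encoding has to be guessed, and writes a plain text sample (converting that to an .ods spreadsheet is left to LibreOffice or similar). `sample_month` is a hypothetical name.

```python
import io
import zipfile
from itertools import islice


def sample_month(outer_path, inner_name: str, out_path: str, n_lines: int = 10_000) -> int:
    """Copy the first n_lines of a monthly trip file nested inside the outer zip.

    Returns the number of lines actually written.
    """
    with zipfile.ZipFile(outer_path) as outer:
        # Per the listing, inner monthly archives are roughly 100 to 160 MB,
        # so reading one into memory is usually acceptable.
        inner_bytes = io.BytesIO(outer.read(inner_name))
    with zipfile.ZipFile(inner_bytes) as inner:
        # Assumption: the trip data is the largest non-directory member.
        files = [i for i in inner.infolist() if not i.is_dir()]
        member = max(files, key=lambda i: i.file_size)
        # Copy raw byte lines so no text encoding has to be guessed.
        with inner.open(member) as raw, open(out_path, "wb") as out:
            written = 0
            for line in islice(raw, n_lines):
                out.write(line)
                written += 1
    return written
```

For example, `sample_month("taxitrip2015_2016.zip", "taxi_201601.zip", "taxi-201601-10k.txt")` writes up to 10,000 lines of January 2016 trips.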
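I was hopeful that taxi trip times might allow inference of traffic conditions, but with rare exceptions, columns AA and AB (PICKUPDATETIME_TR and DROPOFFDATETIME_TR) record the same time. Then again, the description does say times are rounded to the nearest hour.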
Rats!
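To put a number on “rare exceptions,” here is a sketch that tallies how many rows have matching pick up and drop off times. It assumes the README field order, so PICKUPDATETIME_TR and DROPOFFDATETIME_TR are the 27th and 28th fields (spreadsheet columns AA and AB), and it compares them as plain strings; `compare_times` is a hypothetical name. Because times are rounded to the nearest hour, a "same" result only says both ends rounded to the same hour, not that the trip took no time.

```python
import csv
from collections import Counter
from typing import Iterable

# Assumption: fields follow the README order, so PICKUPDATETIME_TR and
# DROPOFFDATETIME_TR are the 27th and 28th fields (0-based 26 and 27).
PICKUP_IDX, DROPOFF_IDX = 26, 27


def compare_times(lines: Iterable[str]) -> Counter:
    """Count rows whose pick up and drop off timestamps match or differ."""
    counts: Counter = Counter()
    for fields in csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE):
        if len(fields) <= DROPOFF_IDX:
            counts["short_row"] += 1  # blank or truncated line
            continue
        same = fields[PICKUP_IDX].strip() == fields[DROPOFF_IDX].strip()
        counts["same" if same else "different"] += 1
    return counts
```

A header row, if the file has one, would be counted once under "different".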
I’m sure there are other patterns you can extract from the data, but inferring traffic conditions doesn’t appear to be one of them.
Or am I missing something obvious?
More posts about Opendata.dc.gov coming as I look for blockade information.
PS: I didn’t explore any month other than January 2016, but it’s late and I will tend to that tomorrow.