Washington Taxi Data 2015 – 2016 (Caution: 2.2 GB File Size)

I was rummaging around on the Opendata.dc.gov website today when I encountered Taxicab Trips (2.2 GB), described as:

DC Taxicab trip data from April 2015 to August 2016. Pick up and drop off locations are assigned to block locations with times rounded to the nearest hour. Detailed metadata included in download. The Department of For-Hire Vehicles (DFHV) provided OCTO with a taxicab trip text file representing trips from May 2015 to August 2016. OCTO processed the data to assign block locations to pick up and drop off locations.

For your convenience, I extracted README_DC_Taxicab_trip.txt, which gives the data structure of the files (“|” separated) as follows:


OBJECTID	NUMBER(9)	Table Unique Identifier	   
TRIPTYPE	VARCHAR2(255)	Type of Taxi Trip	   
PROVIDER	VARCHAR2(255)	Taxi Company that Provided trip	   
METERFARE	VARCHAR2(255)	Meter Fare	   
TIP	VARCHAR2(255)	Tip amount	   
SURCHARGE	VARCHAR2(255)	Surcharge fee	   
EXTRAS	VARCHAR2(255)	Extra fees	   
TOLLS	VARCHAR2(255)	Toll amount	   
TOTALAMOUNT	VARCHAR2(255)	Total amount from Meter fare, tip, surcharge, extras, and tolls.	   
PAYMENTTYPE	VARCHAR2(255)	Payment type	   
PAYMENTCARDPROVIDER	VARCHAR2(255)	Payment card provider	   
PICKUPCITY	VARCHAR2(255)	Pick up location city	   
PICKUPSTATE	VARCHAR2(255)	Pick up location state	   
PICKUPZIP	VARCHAR2(255)	Pick up location zip	   
DROPOFFCITY	VARCHAR2(255)	Drop off location city	   
DROPOFFSTATE	VARCHAR2(255)	Drop off location state	   
DROPOFFZIP	VARCHAR2(255)	Drop off location zip	   
TRIPMILEAGE	VARCHAR2(255)	Trip mileage	   
TRIPTIME	VARCHAR2(255)	Trip time	   
PICKUP_BLOCK_LATITUDE	NUMBER	Pick up location latitude	   
PICKUP_BLOCK_LONGITUDE	NUMBER	Pick up location longitude	   
PICKUP_BLOCKNAME	VARCHAR2(255)	Pick up location street block name	   
DROPOFF_BLOCK_LATITUDE	NUMBER	Drop off location latitude	   
DROPOFF_BLOCK_LONGITUDE	NUMBER	Drop off location longitude	   
DROPOFF_BLOCKNAME	VARCHAR2(255)	Drop off location street block name	   
AIRPORT	CHAR(1)	Pick up or drop off location is a local airport (Y/N)	   
PICKUPDATETIME_TR	DATE	Pick up location city	   
DROPOFFDATETIME_TR	DATE	Drop off location city	 
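
If you would rather read the files programmatically than in a spreadsheet, the layout above can be sketched in Python roughly as follows. The column names come straight from the README; everything else is my assumption, in particular that each file begins with a header row (the `has_header` flag) and that fields are not quoted in any unusual way. The `read_trips` helper is hypothetical, not something shipped with the data.

```python
import csv

# Column names in README order (copied from the listing above).
COLUMNS = [
    "OBJECTID", "TRIPTYPE", "PROVIDER", "METERFARE", "TIP", "SURCHARGE",
    "EXTRAS", "TOLLS", "TOTALAMOUNT", "PAYMENTTYPE", "PAYMENTCARDPROVIDER",
    "PICKUPCITY", "PICKUPSTATE", "PICKUPZIP", "DROPOFFCITY", "DROPOFFSTATE",
    "DROPOFFZIP", "TRIPMILEAGE", "TRIPTIME", "PICKUP_BLOCK_LATITUDE",
    "PICKUP_BLOCK_LONGITUDE", "PICKUP_BLOCKNAME", "DROPOFF_BLOCK_LATITUDE",
    "DROPOFF_BLOCK_LONGITUDE", "DROPOFF_BLOCKNAME", "AIRPORT",
    "PICKUPDATETIME_TR", "DROPOFFDATETIME_TR",
]


def read_trips(lines, has_header=True):
    """Yield one dict per trip from an iterable of '|'-separated lines.

    has_header is an assumption: set it to False if the first line of
    your file is already a data row.
    """
    reader = csv.reader(lines, delimiter="|")
    if has_header:
        next(reader, None)  # discard the header row
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the README layout
        yield dict(zip(COLUMNS, row))
```

Every value comes back as a string, which matches the README's habit of declaring even the fare columns as VARCHAR2(255); convert to numbers only after checking what the fields actually contain.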

The taxi data files are zipped by the month:

Archive:  taxitrip2015_2016.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
107968907  2016-11-29 14:27   taxi_201511.zip
117252084  2016-11-29 14:20   taxi_201512.zip
 99545739  2016-11-30 11:15   taxi_201601.zip
129755310  2016-11-30 11:24   taxi_201602.zip
152793046  2016-11-30 11:31   taxi_201603.zip
148835360  2016-11-30 11:20   taxi_201604.zip
143734132  2016-11-30 11:19   taxi_201605.zip
139396173  2016-11-30 11:13   taxi_201606.zip
121112859  2016-11-30 11:08   taxi_201607.zip
104015666  2016-11-30 12:04   taxi_201608.zip
154623796  2016-11-30 11:03   taxi_201505.zip
161666797  2016-11-29 14:15   taxi_201506.zip
153483725  2016-11-29 14:32   taxi_201507.zip
121135328  2016-11-29 14:06   taxi_201508.zip
142098999  2016-11-30 10:55   taxi_201509.zip
160977058  2016-11-30 10:35   taxi_201510.zip
     3694  2016-12-09 16:43   README_DC_Taxicab_trip.txt

I extracted taxi_201601.zip, decompressed it, and created a 10,000-line sample named taxi-201601-10k.ods.
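
For anyone who wants a similar sample without a spreadsheet, here is a minimal Python sketch. It is not how I produced the .ods file. It assumes that each monthly zip holds a single text file and that the file is UTF-8 (undecodable bytes are replaced rather than raising an error); `sample_lines` is a hypothetical helper name.

```python
import io
import itertools
import zipfile


def sample_lines(zip_path, n=10_000, encoding="utf-8"):
    """Return the first n lines of the first member of a monthly zip.

    Assumption: the archive contains exactly one data file, so the
    first name in the listing is the one we want.
    """
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, errors="replace")
            # islice stops after n lines, so the rest is never decompressed.
            return list(itertools.islice(text, n))
```

After pulling taxi_201601.zip out of the main archive, `sample_lines("taxi_201601.zip")` would return the first 10,000 lines as a list of strings, ready to write to a smaller file.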

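I was hopeful that taxi trip times might allow inference of traffic conditions, but with rare exceptions, columns AA and AB (PICKUPDATETIME_TR and DROPOFFDATETIME_TR, the last two fields) record the same time, which is what you would expect when times are rounded to the nearest hour.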
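
To put a number on "rare exceptions," a check along these lines would count how many rows carry identical pick up and drop off timestamps. It is a sketch under the same assumptions as before: "|" separated fields, an optional header row, and the two timestamp columns in positions 27 and 28 as the README lists them. `count_same_hour` is a hypothetical helper, and I am not reporting any counts here.

```python
import csv

PICKUP_INDEX = 26   # column AA, PICKUPDATETIME_TR (0-based index)
DROPOFF_INDEX = 27  # column AB, DROPOFFDATETIME_TR (0-based index)


def count_same_hour(lines, has_header=True):
    """Return (same, total): rows with identical timestamps, and rows checked."""
    reader = csv.reader(lines, delimiter="|")
    if has_header:
        next(reader, None)  # assumption: first line is a header row
    same = total = 0
    for row in reader:
        if len(row) <= DROPOFF_INDEX:
            continue  # too few fields to hold both timestamps
        total += 1
        # Compare the raw strings; no date parsing is needed for equality.
        if row[PICKUP_INDEX].strip() == row[DROPOFF_INDEX].strip():
            same += 1
    return same, total
```

Comparing the raw strings sidesteps any guess about the date format used in the files.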


I’m sure there are other patterns you can extract from the data, but traffic conditions don’t appear to be one of them.

Or am I missing something obvious?

More posts about Opendata.dc.gov coming as I look for blockade information.

PS: I didn’t explore any month other than January of 2016, but it’s late and I will tend to that tomorrow.
