Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 25, 2016

Cops Driving Cabs – Not Just Moonlighting (Awk)

Filed under: Awk,Government,Politics — Patrick Durusau @ 4:57 pm

NYPD has at least five undercover ‘Cop Cabs’ by Matthew Guariglia.

Matthew walks you through the process of inferring the New York Police department has at least five (5) vehicles that look like taxi cabs.

Or at least they have taxi cab emblems.

A patrol car with a taxi cab emblem would look out of place.

A good lesson in persistence, asking more than one source and collating information.

Just for grins, I downloaded the Medallion Vehicles – Authorized file as a CSV file, said to contain 14265362 lines and as of today, runs a little over 2 GB.

I was curious about was under what name did the TLC issue cop medallions? Unlikely to have added them to a third-party account because of property tax issues. Would they have made up different owners for each of the five medallions? Or would they use a common owner for all five medallions?

Possible that they created the five medallions “off the books,” but that seems unlikely as well. They would want to tie them to license plates.

First observation on the data: The “name” field appears variously with enclosing quotes and no quotes at all.

For example:

License Number,Name,Expiration Date,Current Status,DMV License Plate Number,
Vehicle VIN Number,Vehicle Type,Model Year,Medallion Type,Agent Number,
Agent Name,Agent Telephone Number,Agent Website Address,Agent Address,
Last Date Updated,Last Time Updated

...

8E94,"SINGH,BAGICHA",MEDALLION,CUR,8E94H,1FMCU4K35BKA45650,HYB,2011,OWNER 
MUST DRIVE,000,,,,,03/12/2014,13:20
7A19,"SKLAVOUNAKIS, IOANNIS",MEDALLION,CUR,7A19A,JTDKN3DU4A0234023,HYB,2010,
NAMED DRIVER,000,,,,,03/03/2014,13:20
5E85,FIRST KD INC.,MEDALLION,CUR,5E85H,1N4CL21E39C141012,HYB,2009,OWNER 
MUST DRIVE,000,,,,,05/24/2014,13:20
8V19,GABBI CAB CORP,MEDALLION,CUR,8V19B,5TDZK3DC4CS218712,WAV,2012,NAMED 
DRIVER,234,WOODSIDE MANAGEMENT INC.,(718)8999369,,4913 ROOSEVELT AVENUE 
WOODSIDE NY 11377,01/21/2014,13:20
2V31,TIGER TAXI LLC,MEDALLION,CUR,2V31A,4T1BD1FK0EU127547,HYB,2014,NAMED 
DRIVER,000,,,,,02/25/2015,13:20
2V31,TIGER TAXI LLC,MEDALLION,CUR,2V31A,4T1BD1FK0EU127547,HYB,2014,NAMED 
DRIVER,000,,,,,02/25/2015,13:20
5J36,KFAR TAXI INC,MEDALLION,CUR,5J36B,4T1BD1FK9CU029209,HYB,2012,NAMED 
DRIVER,202,,,,,09/21/2013,13:20
3E13,"BABAEV, MAXIM",MEDALLION,CUR,3E13A,1FMCU49H38KC93552,HYB,2008,OWNER 
MUST DRIVE,0,,,,,07/19/2013,13:20

This data snippet has no significance other than the variation in the name field and the fields of the CSV file.

I used awk to extract the name field to a separate file:

awk 'BEGIN { FS = "," }; { print $2 }' < Medallion__Vehicles_-_Authorized.csv > taxi-names

Then I sorted that file and used uniq plus -c (for count), to create a sorted list of the names with the number of times they occur.

sort < tax-names | uniq -c > taxi-unique-names

You will pickup a lot of data entry errors in this view, extra space in a name, etc.

Then because I am interested in names that occur only five (5) times, I re-sort the file to list names by the number of time they occur (this loses the view that reviews data entry errors):

sort -bn < taxi-unique-names > taxi-by-number

The -bn switches tell sort to ignore leading spaces and to sort in numeric order.

I appreciate New York making this available as “open data” but the interface has a number of limitations.

Another way to approach Matthew’s question is to sort on the addresses, assuming TLC is billing a cop address and not 1060 West Addison. 😉

I haven’t tried this but checking the property tax rolls against the TLC records might be way to ferret out the cop driven taxis. Unless the city has someone paying the taxes for them. Along with the usual graft, who would know?

Other ideas or suggestions to help Matthew flush out these cop driven taxis?

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress