FCC Net Neutrality Plan – 800,000 Comments

What can we learn from 800,000 public comments on the FCC’s net neutrality plan? by Bob Lannon and Andrew Pendleton.

From the post:

On Aug. 5, the Federal Communications Commission announced the bulk release of the comments from its largest-ever public comment collection. We’ve spent the last three weeks cleaning and preparing the data and leveraging our experience in machine learning and natural language processing to try and make sense of the hundreds-of-thousands of comments in the docket. Here is a high-level overview, as well as our cleaned version of the full corpus which is available for download in the hopes of making further research easier.

A great story of cleaning dirty data. Beyond eliminating both Les Misérables and War and Peace as comments, the authors detected statements by experts, form letters, etc.

On the cleaned data, the post continues:

If you’re interested in doing your own analysis with this data, you can download our cleaned-up versions below. We’ve taken the six XML files released by the FCC and split them out into individual files in JSON format, one per comment, then compressed them into archives, one for each XML file. Additionally, we’ve taken several individual records from the FCC data that represented multiple submissions grouped together, and split them out into individual files (these JSON files will have hyphens in their filenames, where the value before the hyphen represents the original record ID). This includes email messages to openinternet@fcc.gov, which had been aggregated into bulk submissions, as well as mass submissions from CREDO Mobile, Sen. Bernie Sanders’ office and others. We would be happy to answer any questions you may have about how these files were generated, or how to use them.
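
The filename convention described above (the value before the hyphen is the original record ID) suggests a simple way to regroup split-out submissions. What follows is a minimal sketch, not the authors' code: the helper names `original_record_id` and `group_by_original_record` are hypothetical, the `.json` extension is assumed, the sample filenames are made up, and I assume the record ID itself contains no hyphen.

```python
from collections import defaultdict
from pathlib import Path


def original_record_id(filename):
    """Return the part of the file stem before the first hyphen.

    Assumption: the record ID itself contains no hyphen. For a file
    without a hyphen, the whole stem is returned.
    """
    stem = Path(filename).stem
    return stem.split("-", 1)[0]


def group_by_original_record(filenames):
    """Map each original record ID to the files derived from it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[original_record_id(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Made-up filenames for illustration only; real IDs will differ.
    sample = ["1001.json", "1002-1.json", "1002-2.json"]
    for record_id, files in group_by_original_record(sample).items():
        print(record_id, len(files))
```

Any record ID that maps to more than one file would then be one of the bulk submissions the authors split apart, which may be worth treating separately when counting distinct commenters.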

All the code used in the project is available at: https://github.com/sunlightlabs/fcc-net-neutrality-comments

I first saw this in a tweet by Scott Chamberlain.
