Chris Albon’s collection of 885,222 tweets (IDs only) from the third presidential debate of 2016 proves that bad design decisions aren’t made only inside the Capital Beltway.
Under Twitter’s terms of service, Chris could not post his tweet collection itself, only the tweet IDs.
The terms of service reference the Developer Policy, and under that policy you will find:
…
F. Be a Good Partner to Twitter
1. Follow the guidelines for using Tweets in broadcast if you display Tweets offline.
2. If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will *only distribute or allow download of Tweet IDs and/or User IDs*.
a. You may, however, provide export via non-automated means (e.g., download of spreadsheets or PDF files, or use of a “save as” button) of up to 50,000 public Tweets and/or User Objects per user of your Service, per day.
b. Any Content provided to third parties via non-automated file download remains subject to this Policy.
…(emphasis added)
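For scale: even the non-automated export allowance in 2.a tops out at 50,000 tweets per user per day, so exporting all 885,222 tweets that way would take a single user roughly 18 days (885,222 ÷ 50,000 ≈ 17.7).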
Just to be clear, I find Twitter extremely useful for staying current on CS research topics and think developers should be “…good partners to Twitter.”
However, Chris is prohibited from posting the data set of 885,222 tweets on GitHub, where users could download it with no impact on Twitter. Instead, every user who wants to explore that data set must submit 885,222 requests to Twitter’s servers to rehydrate the IDs into tweets.
Having one hit on GitHub for 885,222 tweets, versus 885,222 hits on Twitter’s servers, sounds like being a “good partner” to me.
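To make the cost concrete, here is a minimal hydration sketch in Python. It assumes Twitter’s v1.1 statuses/show endpoint, a hypothetical bearer token, and a hypothetical ID file; a batch lookup endpoint can reduce the raw request count, but every one of the 885,222 tweets still has to be served back out of Twitter’s infrastructure.

```python
import requests

# Hypothetical credentials and file names, for illustration only.
# A real bearer token comes from Twitter's developer portal.
BEARER_TOKEN = "YOUR_BEARER_TOKEN"
SHOW_URL = "https://api.twitter.com/1.1/statuses/show.json"


def hydrate(tweet_ids):
    """Turn bare tweet IDs back into full tweet objects,
    one API request per ID."""
    headers = {"Authorization": "Bearer " + BEARER_TOKEN}
    for tweet_id in tweet_ids:
        resp = requests.get(SHOW_URL, params={"id": tweet_id}, headers=headers)
        if resp.status_code == 200:
            yield resp.json()
        # Deleted or protected tweets come back 403/404; skip them.


# debate_tweet_ids.txt: a hypothetical file with one tweet ID per line,
# which is all the policy allows Chris to publish.
with open("debate_tweet_ids.txt") as f:
    ids = [line.strip() for line in f if line.strip()]

print("Hydrating %d IDs means %d requests to Twitter." % (len(ids), len(ids)))
tweets = list(hydrate(ids))
```

And that is the bill for one researcher; every additional person who wants the data set pays it again.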
Multiply that by all the researchers who are building Twitter data sets, and the drain on Twitter’s resources grows without any benefit to Twitter.
It’s true that someday Twitter might be able to monetize references to its data collections, but server and bandwidth expenses are line items in its budget today.
Enabling the distribution of full tweet datasets is one step toward improving that bottom line.
PS: Please share this with anyone you know at Twitter. Thanks!