Hungry for more political tweets?
Barometer of congressional mood?
George Lakoff tweeted:
Here’s an example of a “strategic” tweet by Trump.
Donald J. Trump tweets:
Terrible! Just found out that Obama had my “wires tapped” in Trump Tower just before the victory. Nothing found. This is McCarthyism!
For testing purposes, how would you characterize this sample of tweets, a small part of the 35K replies to Trump’s tweet?
pourmecoffee (verified account) @pourmecoffee
@realDonaldTrump Correct. Making allegations without evidence is the literal definition of McCarthyism.
FFT-Obama for Prison @FemalesForTrump
when will the liars learn. Trump ALWAYS does his homework! The truth will support his tweet in 3, 2, 1 …
@FemalesForTrump @pourmecoffee Yes, I remember that proof that Obama was born in Kenya. And the Bowling Green Massacre.
FFT-Obama for Prison @FemalesForTrump
@ignatzz @pourmecoffee he WAS born in Kenya. Hawaii b/c is a fake. #fact
He didn’t make the bowling green statement. Now go away
Lisa Armstrong (verified account) @LisaArmstrong
@FemalesForTrump You people are still stuck on the lie that Obama was born in Kenya? Why? Where is the proof? #alternativefacts
Jet Black @jetd69
@LisaArmstrong @FemalesForTrump There’s little point in arguing with her. She’s as off her chops as he is. Females for Trump indeed!
Lisa Armstrong (verified account) @LisaArmstrong
@jetd69 @FemalesForTrump I know you’re right. It’s just that the willingness of #Trump supporters to believe flat out lies astounds me.
@LisaArmstrong @jetd69 @FemalesForTrump this goes both ways. Dems want Trump on treason. Based on what facts? What verifiable sources?
Lisa Armstrong (verified account) @LisaArmstrong
@AngieStrader The difference is there’s a long list of shady things Trump has actually done. These are facts. Obama being Kenyan is a lie.
Do you see any strategic tweets in that list or in the other 37K responses (as of Saturday afternoon, 4 March 2017)?
If the point of Trump’s tweet was diversion, I would have to say it succeeded beautifully.
The strategic response to a Trump tweet is to ignore it in favor of propagating your theme.
From the post:
In an apparent first for any American news outlet, the Washington Post released a Chrome plug-in on Friday designed to fact-check posts from a single Twitter account. Can you guess which one?
The new “RealDonaldContext” plug-in for the Google Chrome browser, released by WaPo reporter Philip Bump, adds fact-check summaries to selected posts by President-elect Donald Trump. Users will need to click a post in The Donald’s Twitter feed to see any fact-check information from the Washington Post, which appears as a gray text box beneath the tweet.
I differ with the Washington Post on its slavish reporting of unsubstantiated claims of the US intelligence community, but high marks for the “RealDonaldContext” plug-in for the Google Chrome browser!
What a great alternative to censoring “fake news” on Twitter! Fact check it!
Pointers to source code for similar plug-ins?
From the post:
We wanted to get a better idea of where President-elect Donald Trump gets his information. So we analyzed everything he has tweeted since he launched his campaign to take a look at the links he has shared and the news sources they came from.
Step-by-step guide to the software and analysis of Trump’s tweets!
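As a rough illustration of the kind of analysis the post describes (which news sources the shared links come from), here is a minimal sketch. It is not the method used in the post. It assumes each tweet is a dict with a "urls" list of already-expanded links, which is a hypothetical shape chosen for the example, and the sample links are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def count_link_domains(tweets):
    """Count how often each domain appears among the links in `tweets`.

    `tweets` is assumed to be an iterable of dicts with a "urls" list of
    already-expanded links; that shape is hypothetical, not Twitter's.
    """
    counts = Counter()
    for tweet in tweets:
        for url in tweet.get("urls", []):
            host = urlparse(url).netloc.lower()
            # Treat www.example.com and example.com as the same source.
            if host.startswith("www."):
                host = host[4:]
            if host:
                counts[host] += 1
    return counts


# Placeholder data, not real tweets.
sample = [
    {"text": "first", "urls": ["https://www.example.com/story/1"]},
    {"text": "second", "urls": ["http://example.com/story/2",
                                "https://news.example.org/a"]},
    {"text": "third", "urls": []},
]

for domain, n in count_link_domains(sample).most_common():
    print(domain, n)
```

Shortened links would need to be expanded before counting, otherwise the shortener's domain is what gets tallied.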
Which public figure’s tweets are you going to track/analyze?
I mentioned yesterday the distinction between muting an account versus the new muting by word or #hashtag at Twitter.
Take a moment to check my sources at Twitter support to make sure I have the rules correctly stated. I’ll wait.
(I’m not a journalist, but readers should be able to satisfy themselves that the claims I make are at least plausible.)
No feedback from Twitter on the don’t appear in your timeline vs. do appear in your timeline distinction.
Why would I want to only block notifications of what I think of as hate speech and still have those tweets in my timeline?
Then it occurred to me:
If you can block tweets from appearing in your timeline by word or hashtag, you can block advertising tweets from appearing in your timeline.
You cannot effectively mute hate speech @Twitter because you could also mute advertising.
What about it Twitter?
Must feminists, people of color, minorities of all types be subjected to hate speech in order to preserve your revenue streams?
Not that I object to Twitter having revenue streams from advertising but it needs to be more sophisticated than the Nigerian spammer model now in use. Charge a higher price for targeted advertising that users are unlikely to block.
For example, I would be highly unlikely to block ads for cs theory/semantic integration tomes. On the other hand, I would follow a mute list that blocked histories of famous cricket matches. (Apologies to any cricket players in the audience.)
In my post: Twitter Almost Enables Personal Muting + Roving Citizen-Censors I offer a solution that requires only minor changes based on data Twitter already collects plus regexes for muting. It puts what you see entirely in the hands of users.
That enables Twitter to get out of the censorship business altogether, something it doesn’t do well anyway, and puts users in charge of what they see. A win-win from my perspective.
Alex Hern‘s post: Twitter users to get ability to mute words and conversations prompted this search because I found:
After nine years, Twitter users will finally be able to mute specific conversations on the site, as well as filter out all tweets with a particular word or phrase from their notifications.
The much requested features are being rolled out today, according to the company. Muting conversations serves two obvious purposes: users who have a tweet go viral will no longer have to deal with thousands of replies from strangers, while users stuck in an interminable conversation between people they don’t know will be able to silently drop out of the discussion.
A broader mute filter serves some clear general uses as well. Users will now be able to mute the names of popular TV shows, for instance, or the teams playing in a match they intend to watch later in the day, from showing up in their notifications, although the mute will not affect a user’s main timeline. “This is a feature we’ve heard many of you ask for, and we’re going to keep listening to make it better and more comprehensive over time,” says Twitter in a blogpost.
to be too vague to be useful.
Starting with Advanced muting options on Twitter, you don’t have to read far to find:
Note: Muting words and hashtags only applies to your notifications. You will still see these Tweets in your timeline and via search. The muted words and hashtags are applied to replies and mentions, including all interactions on those replies and mentions: likes, Retweets, additional replies, and Quote Tweets.
That’s the second paragraph, displayed with a highlighted background.
So, “muting” of words and hashtags only stops notifications.
“Muted” offensive or inappropriate content is still visible “in your timeline and search.”
Perhaps really muting based on words and hashtags will be a paid subscription feature?
The other curious aspect is that “muting” an account carries an entirely different meaning.
The first sentence in Muting accounts on Twitter reads:
Mute is a feature that allows you to remove an account’s Tweets from your timeline without unfollowing or blocking that account.
How lame is that?
Solution That Avoids Censorship
The solution to Twitter’s “hate speech,” which means different things to different people, isn’t hard to imagine:
Which means that if I trust N’s judgment on “hate speech,” I can follow their mute list. That saves me the effort of constructing my own mute list and perhaps even encourages the construction of public mute lists.
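A minimal sketch of how followed mute lists might work, assuming a mute list is just a list of regex strings and a timeline is a list of tweet texts. The function names, the data shapes, and the sample patterns are all hypothetical; nothing here reflects how Twitter actually stores or applies mutes.

```python
import re


def compile_mute_list(patterns):
    """Compile a mute list (regex strings) into case-insensitive patterns."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def visible_tweets(timeline, own_patterns, followed_lists=()):
    """Return the tweets that match none of the active mute patterns.

    Active patterns are the user's own list plus every mute list the
    user has chosen to follow.
    """
    active = compile_mute_list(own_patterns)
    for patterns in followed_lists:
        active.extend(compile_mute_list(patterns))
    return [t for t in timeline if not any(p.search(t) for p in active)]


# Placeholder data: N publishes a mute list, I follow it and add my own.
n_mute_list = [r"\bcricket\b"]
my_mute_list = [r"cheap followers"]
timeline = [
    "Great paper on semantic integration",
    "History of famous cricket matches, part 3",
    "Buy cheap followers now",
]

for tweet in visible_tweets(timeline, my_mute_list, [n_mute_list]):
    print(tweet)
```

The filtering happens entirely on the reader's side: nothing is removed from Twitter, and another user with different lists sees a different timeline.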
Twitter has the technical capability to produce such a solution in short order so you have to wonder why they haven’t? I have no delusion of being the first person to have imagined such a solution. Twitter? Comments?
The Alternative Solution – Roving Citizen-Censors
The alternative to a clean and non-censoring solution is covered in the USA Today report Twitter suspends alt-right accounts:
Twitter suspended a number of accounts associated with the alt-right movement, the same day the social media service said it would crack down on hate speech.
Among those suspended was Richard Spencer, who runs an alt-right think tank and had a verified account on Twitter.
The alt-right, a loosely organized group that espouses white nationalism, emerged as a counterpoint to mainstream conservatism and has flourished online. Spencer has said he wants blacks, Asians, Hispanics and Jews removed from the U.S.
[I personally find Richard Spencer’s views abhorrent and report them here only by way of example.]
From the report, Twitter didn’t go gunning for Richard Spencer’s account but the Southern Poverty Law Center (SPLC) did.
The SPLC didn’t follow more than 100 white supremacists to counter their outlandish claims or to offer a counter-narrative. They followed to gather evidence of alleged violations of Twitter’s terms of service and to request removal of those accounts.
Government censorship of free speech is bad enough; enabling roving bands of self-righteous citizen-censors to do the same is even worse.
The counter-claim that Twitter isn’t the government, it’s not censorship, etc., is intellectually and morally dishonest. Technically true in the U.S. constitutional law sense, but suppression of speech is the goal and that’s censorship, whatever fig leaf the SPLC wants to put on it. They should be honest enough to claim and defend the right to censor the speech of others.
I would not vote in their favor, that is, to say that they have a right to censor the speech of others. They are free to block speech they don’t care to hear, which is what my solution to “hate speech” on Twitter enables.
Support muting, not censorship or roving bands of citizen-censors.
When I say short on facts, I don’t deny any of the anecdotal accounts of abuse on Twitter and other social media.
Here’s the data problem with abuse at Twitter:
As of May of 2016, Twitter had 310 million active monthly users over 1.3 billion accounts.
Number of Twitter users who are abusive (trolls): unknown
Number of Twitter users who are victims: unknown
Number of abusive tweets, daily/weekly/monthly: unknown
Type/frequency of abusive tweets, language, images, disclosure: unknown
Costs to effectively control trolls: unknown
Trolls and abuse should be opposed both at Twitter and elsewhere, but without supporting data, creating corporate priorities and revenues to effectively block (not end, block) abuse isn’t possible.
Since troll hunting at present is a drain on the bottom line with no return for Twitter, what if Twitter were to monetize its trolls?
That is, create a mechanism whereby trolls become the drivers of a revenue stream for Twitter.
One such approach would be to throw off all the filtering that Twitter does as part of its basic service. If you have Twitter basic service, you will see posts from everyone from committed jihadists to the Federal Reserve. No blocked accounts, no deleted accounts, etc.
Twitter removes material under direct court order only. Put the burden and expense of going to court for every tweet on both individuals and governments. No exceptions.
Next, Twitter creates the Twitter+ account, where for an annual fee, users can access advanced filtering that includes blocking people, language, image analysis of images posted to them, etc.
Price point experiments should set the fees for Twitter+ accounts. Filtering will be a decision based on real revenue numbers. Not flights of fancy by the Guardian or Salesforce.
BTW, the open Twitter I suggest creates more eyes for ads, which should also improve the bottom line at Twitter.
An “open” Twitter will attract more trolls and drive more users to Twitter+ accounts.
Twitter trolls generate the revenue to fight them.
I rather like that.
Chris could not post his tweet collection, only the tweet ids under Twitter’s terms of service.
F. Be a Good Partner to Twitter
1. Follow the guidelines for using Tweets in broadcast if you display Tweets offline.
2. If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs.
a. You may, however, provide export via non-automated means (e.g., download of spreadsheets or PDF files, or use of a “save as” button) of up to 50,000 public Tweets and/or User Objects per user of your Service, per day.
b. Any Content provided to third parties via non-automated file download remains subject to this Policy.
Just to be clear, I find Twitter extremely useful for staying current on CS research topics and think developers should be “…good partners to Twitter.”
However, Chris is prohibited from posting a data set of 885,222 tweets on GitHub, where users could download it with no impact on Twitter, whereas every user who wants to explore that data set must submit 885,222 requests to Twitter servers.
Having one hit on GitHub for 885,222 tweets versus 885,222 on Twitter servers sounds like being a “good partner” to me.
Multiply that by all the researchers who are building Twitter data sets and the drain on Twitter resources grows without any benefit to Twitter.
It’s true that someday Twitter might be able to monetize references to its data collections, but server and bandwidth expenses are present line items in their budget.
Enabling the distribution of full tweet datasets is one step towards improving their bottom line.
PS: Please share this with anyone you know at Twitter. Thanks!
As you can see from the transcript, it wasn’t a “debate” in any meaningful sense of the term.
The quality of tweets about that debate is equally questionable.
However, the people behind those tweets vote, buy products, click on ads, etc., so despite my title description as “political noise data,” it is important political noise data.
To conform to Twitter terms of service, Chris provides the relevant tweet ids and a script to enable construction of your own data set.
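This is not Chris's script, but the general shape of rebuilding a data set from tweet ids can be sketched as follows. The `fetch_batch` callable is hypothetical and stands in for a real API client, and `batch_size` is an assumption: how many ids a single request may carry depends on the endpoint used.

```python
def batches(ids, size):
    """Yield consecutive slices of `ids` containing at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def rebuild_dataset(tweet_ids, fetch_batch, batch_size=1):
    """Fetch every tweet in `tweet_ids` and count the requests made.

    `fetch_batch` is a hypothetical callable that takes a list of IDs and
    returns the matching tweets; it stands in for a real API client.
    """
    tweets, requests_made = [], 0
    for chunk in batches(tweet_ids, batch_size):
        tweets.extend(fetch_batch(chunk))
        requests_made += 1
    return tweets, requests_made


def fake_fetch(ids):
    """Stand-in for a network call: returns a placeholder tweet per ID."""
    return [{"id": i, "text": f"tweet {i}"} for i in ids]


ids = list(range(1, 11))
_, one_by_one = rebuild_dataset(ids, fake_fetch)
_, batched = rebuild_dataset(ids, fake_fetch, batch_size=4)
print(one_by_one, batched)
```

Either way, every person who rebuilds the data set repeats the same requests, which is the cost a single shared download would avoid.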
BTW, Chris includes his Twitter mining scripts.
From the post:
Twitter Inc. is working on a keyword-based tool that will let people filter the posts they see, giving users a more effective way to block out harassing and offensive tweets, according to people familiar with the matter.
The San Francisco-based company has been discussing how to implement the tool for about a year as it seeks to stem abuse on the site, said the people, who asked not to be identified because the initiative isn’t public. By using keywords, users could block swear words or racial slurs, for example, to screen out offenders.
Nice to have good news to report about Twitter!
Suggestions before the code gets set in stone:
A big step towards empowering users!
From the post:
For nearly its entire existence, Twitter has not just tolerated abuse and hate speech, it’s virtually been optimized to accommodate it. With public backlash at an all-time high and growth stagnating, what is the platform that declared itself “the free speech wing of the free speech party” to do? BuzzFeed News talks to the people who’ve been trying to figure this out for a decade.
Warzel’s 6,000 word (5966 by my count) ramble uses “abuse” without ever defining the term. Nor do any of the people quoted in his post. But, like Justice Stewart, they “know it when they see it.”
One of the dangers of Warzel’s post is that every reader will insert their own definition of “abuse.” Hard to find people who disagree that “abuse as they define it” should be blocked by Twitter.
All of Warzel’s examples are “abuse” (IMHO) but even so, I don’t support Twitter blocking that content from being posted. I emphasize posted because being posted on Twitter doesn’t obligate any user to read the content.
I don’t support Twitter censorship of any account, for any reason. Four Horsemen Of Internet Censorship + One.
If Twitter doesn’t block content, then how to deal with “abuse?”
Why not monetize the blocking of assholes and abuse?
Imagine a Twitter client/app that:
Every user can have a customized definition of “abuse” for their own feed. Without impinging on the definitions of “abuse” of other users.
Twitter clients to support such filtering options are already in place. TweetDeck Versus Hootsuite – The Essential Guide discusses two popular clients. There are hundreds of others, both web and smart phone based.
Circling the question: “Why isn’t Twitter using my personal definition of “abuse” to protect me for free?” generates a lot of discussion, but no viable solutions.
Monetizing filtering of assholes and abuse, resources available in vast quantities, protects both free speech and freedom from unwanted speech.
The only useful question on Twitter abuse is the price point to set for avoiding X amount of abuse?
The link Post Coup Censorship takes you to a list of twenty-three (23) journalist/publicist accounts verified as withheld by Twitter in Turkey.
I have tweeted to Efe Kerem Sözeri about this issue and was advised the censorship is based on IP addresses. Sözeri points out that use of a VPN is one easy means of avoiding the censorship.
Hopefully that was more productive than a rant about Twitter’s toadyism and self-anointed role to prevent abuse (as opposed to empowering Twitter users to avoid abuse on their own).
From my point of view, the facts are fairly simple:
Milo Yiannopoulos (formerly @Nero) has been banned from Twitter on the basis of his speech and the speech of others who agree with him.
What more needs to be said?
I have not followed, read, reposted or retweeted any tweets by Milo Yiannopoulos (formerly @Nero). And would not even if someone sent them to me.
I choose to not read that sort of material and so can anyone else. Including the people who complain in Aja’s post.
The Twitter Nanny becomes censor in insisting that no one be able to read tweets from Milo Yiannopoulos (formerly @Nero).
I’ve heard the argument that the First Amendment doesn’t apply to Twitter, which is true, but irrelevant. Only one country in the world has the First Amendment as stated in the US Constitution but that doesn’t stop critics from decrying censorship by other governments.
Or is it only censorship if you agree with the speech being suppressed?
Censorship of speech that I find disturbing, sexist, racist, misogynistic, dehumanizing, transphobic, homophobic, supporting terrorism, is still censorship.
And it is still wrong.
We only have ourselves to blame for empowering Twitter to act as a social media censor. Central point of failure and all that jazz.
Suggestions on a free speech alternative to Twitter?
From the post:
By its very nature, breaking news happens unexpectedly. Simply waiting for something to start trending on Twitter is not an option for journalists – you’ll have to actively seek it out.
The most important rule is to switch perspectives with the eyewitness and ask yourself, “What would I tweet if I were an eyewitness to an accident or disaster?”
To find breaking news on Twitter you have to think like a person who’s experiencing something out of the ordinary. Eyewitnesses tend to share what they see unfiltered and directly on social media, usually by expressing their first impressions and feelings. Eyewitness media can include very raw language that reflects the shock felt as a result of the situation. These posts often include misspellings.
In this article, we’ll outline some search terms you can use in order to find breaking news. The list is not intended as exhaustive, but a starting point on which to build and refine searches on Twitter to find the latest information.
Great collections of starter search terms but those are going to vary depending on your domain of “breaking” news.
Good illustration of use of Twitter search operators.
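For building and refining searches of your own, a small helper can assemble a query string from a list of terms. This is a minimal sketch: it relies only on the OR operator, quoted phrases, parentheses for grouping, and a leading minus sign for exclusion. The sample terms are placeholders, not taken from the article.

```python
def build_query(phrases, exclude=()):
    """Build a search string: (phrase OR "two words" ...) -excluded.

    Phrases containing a space are wrapped in double quotes so they are
    searched as exact phrases.
    """
    parts = [f'"{p}"' if " " in p else p for p in phrases]
    query = "(" + " OR ".join(parts) + ")"
    for term in exclude:
        query += f" -{term}"
    return query


# Placeholder terms for an imagined "breaking news" search.
print(build_query(["explosion", "just saw", "omg fire"], exclude=["movie"]))
```

Keeping the terms in a plain list makes it easy to maintain one list per news domain and regenerate the query as the list grows.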
Other collections of Twitter search terms?
Twitter Suspension Tracker by Lee Johnstone.
From the about page:
This site (Twitter Suspension Monitor) was created to do one purpose, log and track suspended twitter accounts.
The system periodically checks marked suspended accounts for possible reactivation and remarks them accordingly. This allows the system to start tracking how many hours, days or even weeks and months a users twitter account got suspended for. Ontop of site submitted entrys Twitter Suspension Monitor also scrapes data directly from twitter in hope to find many more suspended accounts.
Not transparency but some reflected light on the Twitter account suspension process.
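The tracking idea described on the about page (periodic checks, then working out how long an account stayed suspended) can be sketched as below. This is not the site's code. It assumes the check log is a time-ordered list of (timestamp, is_suspended) pairs, and the sample log is invented.

```python
from datetime import datetime


def suspension_spans(checks):
    """Turn time-ordered (timestamp, is_suspended) checks into spans.

    Returns a list of (start, end) pairs; `end` is None when the account
    was still suspended at the last check.
    """
    spans = []
    start = None
    for when, suspended in checks:
        if suspended and start is None:
            start = when                 # suspension first observed
        elif not suspended and start is not None:
            spans.append((start, when))  # reactivation observed
            start = None
    if start is not None:
        spans.append((start, None))
    return spans


# Invented check log for one account.
log = [
    (datetime(2016, 2, 1, 0, 0), False),
    (datetime(2016, 2, 1, 6, 0), True),
    (datetime(2016, 2, 1, 12, 0), True),
    (datetime(2016, 2, 2, 6, 0), False),
    (datetime(2016, 2, 3, 0, 0), True),
]

for start, end in suspension_spans(log):
    if end is None:
        print(start, "still suspended at last check")
    else:
        print(start, (end - start).total_seconds() / 3600, "hours")
```

The measured durations are only as precise as the interval between checks, since a suspension or reactivation is noticed at the next check, not when it happens.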
Tweets from suspended accounts disappear.
Stalin would have felt right at home with Twitter’s methods if not its ideology.
Here’s a photo of Stalin for the webpage of the Twitter Trust & Safety Council:
Members of the Twitter Trust & Safety Council should use it as their twitter profile image. Enable all of us to identify Twitter censorship collaborators.
However urgent current hysteria, censors are judged only one way in history.
Is that what you want for your legacy? Twitter, same question.
From the post:
For years, Twitter has positioned itself as a “global town square” that is open to discourse from all. And for years, extremist groups like the Islamic State have taken advantage of that stance, using Twitter as a place to spread their messages.
Twitter on Friday made clear that it was stepping up its fight to stem that tide. The social media company said it had suspended 125,000 Twitter accounts associated with extremism since the middle of 2015, the first time it has publicized the number of accounts it has suspended. Twitter also said it had expanded the teams that review reports of accounts connected to extremism, to remove the accounts more quickly.
“As the nature of the terrorist threat has changed, so has our ongoing work in this area,” Twitter said in a statement, adding that it “condemns the use of Twitter to promote terrorism.” The company said its collective moves had already produced results, “including an increase in account suspensions and this type of activity shifting off Twitter.”
The disclosure follows intensifying pressure on Twitter and other technology companies from the White House, presidential candidates like Hillary Clinton and government agencies to take more action to combat the digital practices of terrorist groups. The scrutiny has grown after mass shootings in Paris and San Bernardino, Calif., last year, because of concerns that radicalizations can be accelerated by extremist postings on the web and social media.
Just so you know what the Twitter rule is:
Violent threats (direct or indirect): You may not make threats of violence or promote violence, including threatening or promoting terrorism. (The Twitter Rules)
Here’s your chance to engage in real data science and help decide the question of whether Twitter has changed from global town hall to global town censor.
Here’s the data gathering project:
Monitor all the Twitter streams for Republican and Democratic candidates for the U.S. presidency for tweets advocating violence/terrorism.
File requests with Twitter for those accounts to be removed.
FYI: When you report a message (Reporting a Tweet or Direct Message for violations), it will disappear from Messages inbox.
You must copy every tweet you report (accounts disappear as well) if you want to keep a record of your report.
Keep track of your reports and the tweet you copied before reporting.
Post the record of your reports and the tweets reported, plus any response from Twitter.
Suggestions on how to format these reports?
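One possible format, offered only as a starting point: one JSON object per line (JSON Lines), holding the copied tweet, when it was reported, and any response. Every field name here is a suggestion rather than an established schema, and the sample values are placeholders.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ReportRecord:
    """One reported tweet. All field names are suggestions, not a standard."""
    tweet_id: str
    account: str
    tweet_text: str                      # copied before reporting
    reported_at: str                     # ISO 8601 timestamp
    reason: str
    twitter_response: Optional[str] = None


def to_json_line(record):
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Placeholder values, not a real report.
example = ReportRecord(
    tweet_id="0000000000",
    account="@example_candidate",
    tweet_text="(text copied before reporting)",
    reported_at="2016-02-06T12:00:00Z",
    reason="promotes violence",
)
print(to_json_line(example))
```

One record per line keeps the log append-only and easy to merge when several people are filing reports.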
Or would you rather not know what Twitter is deciding for you?
How much data needs to be collected to move onto part 2 of the project – data analysis?
Suggestions on who at Twitter to contact for a listing of the 125,000 accounts that were silenced along with the Twitter history for each one? (Or the entire history of silenced accounts at Twitter? Who gets censored by topic, race, gender, location, etc., are all open questions.)
That could change the Twitter process from a black box to having marginally more transparency. You would have to guess at why any particular account was silenced.
If Twitter wants to take credit for censoring public discourse then the least it can do is be honest about who was censored and what they were saying to be censored.
I encountered a color-coded map of a single Tweet today:
Either select the image to see it full-size or follow the original link: http://online.wsj.com/public/resources/documents/TweetMetadata.pdf.
I haven’t done a detailed comparison against the Twitter API documentation but suffice it to say this map should not be cited and should be used only with caution.
I don’t think anything in the map is wrong, but it isn’t complete, missing for example, possibly_sensitive, quoted_status_id, quoted_status_id_str, quoted_status and others.
Suggestions for an updated map of a single Tweet?
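One way to start on an updated map is to list every field path in a tweet's JSON and compare that list against the fields the map covers. A minimal sketch follows. The sample tweet is a hand-made fragment using a few of the field names mentioned above, not real API output, and the `mapped` set is a hypothetical stand-in for the fields on the WSJ map.

```python
def field_paths(obj, prefix=""):
    """Return sorted dotted paths for every key in a nested JSON object."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths.update(field_paths(value, path))
    elif isinstance(obj, list):
        # List items share the path of the list that contains them.
        for item in obj:
            paths.update(field_paths(item, prefix))
    return sorted(paths)


# Hand-made fragment for illustration only.
tweet = {
    "id_str": "1",
    "text": "hello",
    "possibly_sensitive": False,
    "quoted_status_id_str": "2",
    "quoted_status": {"id_str": "2", "text": "quoted"},
    "entities": {"urls": [{"expanded_url": "https://example.com"}]},
}

# Hypothetical set of top-level fields already shown on the map.
mapped = {"id_str", "text", "entities"}
missing = [p for p in field_paths(tweet) if p.split(".")[0] not in mapped]
print(missing)
```

Run over a large sample of tweets, the union of the paths gives a field inventory that can be checked against the API documentation.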
Even the out-dated map gives you a good idea of the richness of information that can be transmitted by a single tweet.
Makes me wonder who is using the 140 characters and/or additional data for open but secure communication?
From the post:
Twitter has challenged Turkey in an Ankara court seeking to cancel a $50,000 fine for not removing content from its website, the social media site’s lawyer told Al Jazeera on Thursday.
Turkey temporarily banned access to Twitter several times in the past for failing to comply with requests to remove content. But the 150,000 lira ($50,000) fine imposed by the Information and Communication Technologies Authority (BTK) was the first of its kind imposed by Turkish authorities on Twitter.
A Turkish official told Reuters news agency on Thursday that much of the material in question was related to the Kurdistan Workers Party (PKK), which Ankara called “terrorist propaganda”.
Twitter, in its lawsuit, is arguing the fine goes against Turkish law and should be annulled, the official told Reuters.
Reading about Twitter opposing censorship is like seeing a news account about a man biting a dog. That really is news!
I say that because only a few months ago in Secretive Twitter Censorship Fairy Strikes Again!, I pointed to reports of Twitter silencing 10,000 Islamic State accounts on April 2nd of 2015. More censorship of Islamic State accounts followed but that’s an impressive total for one day.
From all reports, entirely at Twitter’s own initiative. Why Twitter decided to single out accounts that favor the Islamic State over those that favor the U.S. military isn’t clear. The U.S. military is carrying out daily bombing attacks in Iraq and Syria, something you can’t say about the Islamic State.
Now Twitter finds itself in the unhappy position of being an inadequate censor, a censor that violates the fundamental premise of being a common carrier, that is it is open to all opinions, fair and foul, and a censor that has failed a state that is even less tolerant of free speech than Twitter.
Despised by one side for censorship and loathed by the other for being an inadequate toady.
Not an enviable position.
Just my suggestion but Twitter needs to reach out to the telcos and others who provide international connectivity for phones and other services to Turkey.
A 24 to 72 hour black-out of all telecommunications, for banks, media, phone, internet, should give the Turkish government a taste of the economic disruption, to say nothing of disruption of government, that will follow future attempts to censor, fine or block any international common carrier.
The telcos and others have the power to bring outlandish actors such as the Turkish government to a rapid heel.
It’s time that power was put to use.
You see, no bombs, no boots on the ground, no lengthy and tiresome exchanges of blustering speeches, just a quick trip back to the 19th century to remind Turkey’s leaders how painful a longer visit could be.
From the post:
We’ll forgive you if you missed the news, since it was announced on New Year’s Eve: Politwoops, the service which tracks politicians’ deleted tweets, is coming back after Twitter agreed to let it access the service’s API once again.
On Tuesday, the Open State Foundation, the Dutch nonprofit that runs the international editions of Politwoops, said it was functioning again in 25 countries, including the United Kingdom, the Netherlands, Ireland, and Turkey. The American version of Politwoops, operated by the Sunlight Foundation, isn’t back up yet, but the foundation said in a statement that “in the coming days and weeks, we’ll be working behind the scenes to get Politwoops up and running.”
Politwoops will be reporting tweets that politicians send and then suddenly regret.
I don’t disagree with Twitter that any user can delete their tweets but strongly disagree that I can’t capture the original tweet and at a minimum, point to its absence from the “now” Twitter archive.
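The underlying idea (capture tweets when they appear, then notice which ones have gone missing) reduces to a comparison between an archive and what is currently visible. A minimal sketch, not Politwoops' implementation: the archive is assumed to be a dict mapping tweet id to text, and the ids and texts are placeholders.

```python
def find_deleted(archive, visible_ids):
    """Return archived tweets whose IDs no longer appear in `visible_ids`."""
    visible = set(visible_ids)
    return {tid: text for tid, text in archive.items() if tid not in visible}


# Placeholder archive captured earlier, and the IDs visible "now".
archive = {
    "101": "first statement",
    "102": "something regretted",
    "103": "third statement",
}
visible_now = ["101", "103"]

for tid, text in find_deleted(archive, visible_now).items():
    print(tid, text)
```

The comparison only points to an absence. The text of a deleted tweet survives only if it was captured before the deletion.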
Politicians should not be allowed to hide from their sporadic truthful tweets.
Five videos on effective use of Twitter for journalism.
The videos are:
The times shown are minutes followed by seconds.
Labeled for journalism but anyone searching Twitter, librarians, authors, researchers, even “fans” (shudder), will find useful information in these videos.
If you don’t know FirstDraftNews, you need to get acquainted.
From the webpage:
Find journalists by what they tweet
Powered by all the tweets since 2006 from more than 1 million journalist & media outlets.
Search for relevant journalists
Search through 1 billion+ real-time and historical tweets (since 2006, when Twitter was born) from 1 million+ journalists and media outlets, to find out all the relevant media contacts that have talked about your product, your business, your competitors, or any other keywords in your industry.
Searches can be limited to tweets, journalists and outlets.
The advanced search interface looks useful:
If you are mining twitter for news sources, this could prove to be very useful.
With the caveat that news sources tend to be highly repetitive. If the New York Times says the OPM hack originated in China, a large number of news lemmings will repeat that without a word of doubt or criticism. Still amounts to one unknown source cited by the New York Times. No matter how many times it is repeated.
From the post:
Five years ago, this column looked into scholarly potential of the Twitter archive the Library of Congress had recently acquired. That potential was by no means self-evident. The incensed “my tax dollars are being used for this?” comments practically wrote themselves, even without the help of Twitter bots.
For what — after all — is the value of a dead tweet? Why would anyone study 140-character messages, for the most part concerning mundane and hyperephemeral topics, with many of them written as if to document the lowest possible levels of functional literacy?
As I wrote at the time, papers by those actually doing the research treated Twitter as one more form of human communication and interaction. The focus was not on the content of any specific message, but on the patterns that emerged when they were analyzed in the aggregate. Gather enough raw data, apply suitable methods, and the results could be interesting. (For more detail, see the original discussion.)
The key thing was to have enough tweets on hand to grind up and analyze. So, yes, an archive. In the meantime, the case for tweet preservation seems easier to make now that elected officials, religious leaders and major media outlets use Twitter. A recent volume called Twitter and Society (Peter Lang, 2014) collects papers on how politics, journalism, the marketplace and (of course) academe itself have absorbed the impact of this high-volume, low-word-count medium.
As far as the Library of Congress archive, Scott reports:
The Library of Congress finds itself in the position of someone who has agreed to store the Atlantic Ocean in his basement. The embarrassment is palpable. No report on the status of the archive has been issued in more than two years, and my effort to extract one elicited nothing but a statement of facts that were never in doubt.
“The library continues to collect and preserve tweets,” said Gayle Osterberg, the library’s director of communications, in reply to my inquiry. “It was very important for the library to focus initially on those first two aspects — collection and preservation. If you don’t get those two right, the question of access is a moot point. So that’s where our efforts were initially focused and we are pleased with where we are in that regard.”
That’s as helpful as the responses I get about the secret ACM committee that determines the fate of feature requests for the ACM digital library. You can’t contact them directly nor can you find any record of their discussions/decisions.
Let’s hope greater attention and funding can move the Library of Congress Twitter Archive towards public access, for all the reasons enumerated by Scott.
One does have to wonder, given the role of the U.S. government in pushing for censorship of Twitter accounts, will the Library of Congress archive be complete and free from censorship? Or will it have dark spots depending upon the whims and caprices of the current regime?
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
The questions addressed by the paper are:
RQ1 How robust are state-of-the-art named entity recognition and linking methods on short and noisy microblog texts?
RQ2 What problem areas are there in recognising named entities in microblog posts, and what are the major causes of false negatives and false positives?
RQ3 Which problems need to be solved in order to further the state-of-the-art in NER and NEL on this difficult text genre?
The ultimate conclusion is that entity recognition in microblog posts falls short of what has been achieved for newswire text but if you need results now or at least by tomorrow, this is a good guide to what is possible and where improvements can be made.
Making the most detailed tweet map ever by Eric Fisher.
From the post:
I’ve been tracking geotagged tweets from Twitter’s public API for the last three and a half years. There are about 10 million public geotagged tweets every day, which is about 120 per second, up from about 3 million a day when I first started watching. The accumulated history adds up to nearly three terabytes of compressed JSON and is growing by four gigabytes a day. And here is what those 6,341,973,478 tweets look like on a map, at any scale you want.
[Static screenshot of a much cooler interactive map at original post.]
I’ve open sourced the tools I used to manipulate the data and did all the design work in Mapbox Studio. Here’s how you can make one like it yourself.
Eric gives a detailed account of how you can start tracking tweets on your own!
This rocks! If you use or adapt Eric’s code, be sure to give him a shout out in your code and/or documentation.
Thanks to the Clojure Community! by Alex Miller.
Today at the Clojure/conj, I gave thanks to many community members for their contributions. Any such list is inherently incomplete – I simply can’t capture everyone doing great work. If I missed someone important, please drop a comment and accept my apologies.
Alex has a list of people with GitHub, website and Twitter URLs.
I have extracted the Twitter URLs and, for your convenience with Twitter feed scripts, listed each Twitter handle followed by a Python comment marker and the user’s name:

timbaldridge # Tim Baldridge

bbatsov # Bozhidar Batsov

fbellomi # Francesco Bellomi

ambrosebs # Ambrose Bonnaire-Sergeant

reborg # Renzo Borgatti

reiddraper # Reid Draper

colinfleming # Colin Fleming

deepbluelambda # Daniel Solano Gomez

nonrecursive # Daniel Higginbotham

bridgethillyer # Bridget Hillyer

heyzk # Zachary Kim

aphyr # Kyle Kingsbury

alanmalloy # Alan Malloy

gigasquid # Carin Meier

dmiller2718 # David Miller

bronsa_ # Nicola Mometto

ra # Ramsey Nasser

swannodette # David Nolen

ericnormand # Eric Normand

petitlaurent # Laurent Petit

tpope # Tim Pope

smashthepast # Ghadi Shayban

stuartsierra # Stuart Sierra

You will, of course, have to delete the blank lines, which I retained for ease of human reading. Any mistakes or errors in this listing are solely my responsibility.
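If you would rather not delete the blank lines by hand, a few lines of Python can do it while reading the list. This is only a minimal sketch: it assumes the list is saved one entry per line in the "handle # Name" form, and the function name `parse_handles` is my own invention, not part of any Twitter library.

```python
def parse_handles(text):
    """Return (handle, name) pairs from 'handle # Name' lines.

    Blank lines and lines without a handle are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop the blank lines kept for human readability
        handle, _, name = line.partition("#")
        handle = handle.strip()
        if not handle:
            continue  # a line that is only a comment has no handle
        pairs.append((handle, name.strip()))
    return pairs


# A three-entry excerpt of the list, blank lines included.
sample = """\
timbaldridge # Tim Baldridge

bbatsov # Bozhidar Batsov

aphyr # Kyle Kingsbury
"""

for handle, name in parse_handles(sample):
    print(f"@{handle}\t{name}")
```

The pairs can then be handed to whatever feed script you use; nothing here talks to Twitter itself.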
Twitter Now Lets You Search For Any Tweet Ever Sent by Cade Metz.
From the post:
This morning, Twitter began rolling out a search service that lets you search for any tweet in its archive.
Though the new Twitter search engine is limited to rather rudimentary keyword searches today, the company plans to expand into more complex queries in the months and years to come. And the foundational search infrastructure laid down by the company will help drive other Twitter tools as well. “It lets us power a lot more things down the road—not just search,” says Gilad Mishne, the Twitter engineering director who helped oversee the project.
Well, that’s both good news and better news!
Good news because of being able to search and link to the full corpus of tweets.
Better news because of the search market gap that Cade reports, which is quite similar to Google’s.
You can search for anything you want, but the results, semantically speaking, are going to be a crap shoot.
Do users really have time for hit or miss search results? Some do, some don’t.
If yours don’t, let’s talk.
neo4apis by Brian Underwood.
From the post:
I’ve been reading a few interesting analyses of Twitter data recently such as this #gamergate analysis by Andy Baio. I thought it would be nice to have a mechanism for people to quickly and easily import data from Twitter to Neo4j for research purposes. Like a good programmer I had to go up at least one level of abstraction. Thus was born the ruby gems neo4apis and neo4apis-twitter (and, incidentally, neo4apis-github just to prove it was repeatable).
The neo4apis-twitter gem is easy to use, either in your ruby code or from the command line.
neo4apis takes care of loading your data efficiently as well as creating database indexes so that you can query it effectively.
Just doing rough numbers, 7,271,955,000 / 228,000,000 ≈ 32.
So if you captured a tweet from every active Twitter user, you would have heard from about 1/32 of the world’s population.
Not saying you shouldn’t capture tweets or analyze them in Neo4j. I am saying that you should be mindful of the lack of representation in such tweets.
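For anyone who wants to rerun the rough numbers, here is a small sketch. The two figures are the ones used above; reading them as world population and active Twitter users is my labeling, taken from the sentence that follows the division.

```python
# Figures from the rough calculation above; the variable names are my own labels.
world_population = 7_271_955_000
active_twitter_users = 228_000_000

# How many people there are for every active Twitter user.
ratio = world_population / active_twitter_users

# The same comparison expressed as a share of the world's population.
share = active_twitter_users / world_population

print(f"people per active user: {ratio:.1f}")
print(f"share of world population: {share:.1%}")
```

Either way you slice it, active users are a small slice of humanity, which is the point about representation.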
Analysis of Named Entity Recognition and Linking for Tweets by Leon Derczynski, et al.
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
A detailed review of existing solutions for mining tweets, where they fail and why.
A comparison to spur tweet research:
|Tweets Per Day|> 500,000,000|Derczynski, p. 2|
|Annotated Tweets|< 10,000|Derczynski, p. 27|
Let’s see: 500,000,000 / 10,000 = 50,000.
The number of tweets per day is more than 50,000 times the number of tweets annotated with named entity types.
It may just be me but that sounds like the sort of statement you would see in a grant proposal to increase the number of annotated tweets.
I first saw this in a tweet by Diana Maynard.
Tweet NLP (Carnegie Mellon)
From the webpage:
We provide a tokenizer, a part-of-speech tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools.
See the website for further details.
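To give a feel for why tweets need their own tokenizer, here is a minimal regex sketch in Python. It is not the Carnegie Mellon tokenizer and makes no claim to match its behavior; the pattern and the function name `tokenize` are my own assumptions, chosen only to show that @mentions, hashtags and URLs have to survive as single tokens.

```python
import re

# Order matters: URLs, mentions and hashtags are tried before plain words.
TOKEN_RE = re.compile(
    r"""
    https?://\S+          # URLs, kept whole
    | @\w+                # @mentions
    | \#\w+               # hashtags
    | \w+(?:['’]\w+)*     # words, including contractions such as you're
    | [^\w\s]             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(tweet):
    """Split a tweet into tokens, keeping Twitter-specific items intact."""
    return TOKEN_RE.findall(tweet)


example = "@jetd69 I know you're right. #Trump http://example.com/x"
print(tokenize(example))
```

A sketch like this breaks quickly on real data (emoticons, a URL followed directly by punctuation, creative spelling), which is a fair hint at why trained tools and annotated corpora are worth having.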
I can understand vendors mining tweets and trying to react to every twitch in some social stream, but the U.S. military is interested as well.
“Customer targeting” in their case has a whole different meaning.
Assuming you can identify one or more classes of tweets, would it be possible to mimic those patterns, albeit with some deviation in the content of the tweets? That is, what tweet content is weighted more heavily than other tweet content?
I first saw this in a tweet by Peter Skomoroch.