Archive for the ‘Twitter’ Category

Debate Night Twitter: Analyzing Twitter’s Reaction to the Presidential Debate

Sunday, November 6th, 2016

Debate Night Twitter: Analyzing Twitter’s Reaction to the Presidential Debate by George McIntire.

A bit dated content-wise, but George covers techniques, from data gathering to analysis, that will be useful for future events: possible Presidential inauguration riots on January 20, 2017, for example, or the 2017 Super Bowl, where Lady Gaga will be performing.

From the post:

This past Sunday, Donald Trump and Hillary Clinton participated in a town hall-style debate, the second of three such events in this presidential campaign. It was an extremely contentious affair that reverberated across social media.

The political showdown was massively anticipated; the negative atmosphere of the campaign and last week’s news of Trump making lewd comments about women on tape certainly contributed to the fire. Trump further escalated the immense tension by holding a press conference with women who’ve accused former President Bill Clinton of abusing.

With having a near unprecedented amount of attention and hostility, I wanted to gauge Twitter’s reaction to the event. In this project, I streamed tweets under the hashtag #debate and analyzed them to discover trends in Twitter’s mood and how users were reacting to not just the debate overall but to certain events in the debate.

What techniques will you apply to your tweet data sets?

How To Use Twitter to Learn Data Science (or anything)

Wednesday, November 2nd, 2016

How To Use Twitter to Learn Data Science (or anything) by Data Science Renee.

Judging from the date on the post (May 2016), Renee’s enthusiasm for Twitter came before her recently breaking 10,000 followers on Twitter. (Congratulations!)

The one thing I don’t see Renee mentioning is the use of your own Twitter account to gain experience with a whole range of data mining tools.

Your Twitter feed will quickly out-strip your ability to “keep up,” so how do you propose to deal with that problem?

Renee suggests limiting examination of your timeline (in part), but have you considered using machine learning to assist you?

Or visualizing your areas of interest or the people that you follow?

Indexing resources pointed to in tweets?

NLP processing of tweets?

Every tool of data science that you will be using for clients is relevant to your own Twitter feed.

What better way to learn tools than using them on content that interests you?


BTW, follow Data Science Renee for a broad range of data science tools and topics!

Monetizing Twitter Trolls

Sunday, October 23rd, 2016

Alex Hern’s coverage of Twitter’s fail-to-sell story, Did trolls cost Twitter $3.5bn and its sale?, is a typical short-on-facts story about abuse on Twitter.

When I say short on facts, I don’t deny any of the anecdotal accounts of abuse on Twitter and other social media.

Here’s the data problem with abuse at Twitter:

As of May 2016, Twitter had 310 million monthly active users across 1.3 billion accounts.

Number of Twitter users who are abusive (trolls): unknown

Number of Twitter users who are victims: unknown

Number of abusive tweets, daily/weekly/monthly: unknown

Type/frequency of abusive tweets, language, images, disclosure: unknown

Costs to effectively control trolls: unknown

Trolls and abuse should be opposed both at Twitter and elsewhere, but without supporting data, creating corporate priorities and revenues to effectively block (not end, block) abuse isn’t possible.

Since troll hunting at present is a drain on the bottom line with no return for Twitter, what if Twitter were to monetize its trolls?

That is, create a mechanism whereby trolls become the drivers of a revenue stream for Twitter.

One such approach would be to throw off all the filtering that Twitter does as part of its basic service. If you have Twitter basic service, you will see posts from everyone from committed jihadists to the Federal Reserve. No blocked accounts, no deleted accounts, etc.

Twitter removes material under direct court order only. Put the burden and expense of going to court for every tweet on both individuals and governments. No exceptions.

Next, Twitter creates the Twitter+ account, where for an annual fee, users can access advanced filtering that includes blocking people, language, image analysis of images posted to them, etc.

Price point experiments should set the fees for Twitter+ accounts. Filtering will be a decision based on real revenue numbers. Not flights of fancy by the Guardian or Salesforce.

BTW, the open Twitter I suggest creates more eyes for ads, which should also improve the bottom line at Twitter.

An “open” Twitter will attract more trolls and drive more users to Twitter+ accounts.

Twitter trolls generate the revenue to fight them.

I rather like that.


Twitter Logic: 1 call on GitHub v. 885,222 calls on Twitter

Sunday, October 23rd, 2016

Chris Albon’s collection of 885,222 tweets (ids only) for the third presidential debate of 2016 proves bad design decisions aren’t only made inside the Capital Beltway.

Under Twitter’s terms of service, Chris could not post his tweet collection, only the tweet ids.

The terms of service reference the Developer Policy and under that policy you will find:

F. Be a Good Partner to Twitter

1. Follow the guidelines for using Tweets in broadcast if you display Tweets offline.

2. If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs.

a. You may, however, provide export via non-automated means (e.g., download of spreadsheets or PDF files, or use of a “save as” button) of up to 50,000 public Tweets and/or User Objects per user of your Service, per day.

b. Any Content provided to third parties via non-automated file download remains subject to this Policy.
…(emphasis added)

Just to be clear, I find Twitter extremely useful for staying current on CS research topics and think developers should be “…good partners to Twitter.”

However, Chris is prohibited from posting a data set of 885,222 tweets on GitHub, where users could download it with no impact on Twitter. Instead, every user who wants to explore that data set must submit 885,222 requests to Twitter servers.

Having one hit on GitHub for 885,222 tweets versus 885,222 on Twitter servers sounds like being a “good partner” to me.

Multiply that by all the researchers who are building Twitter data sets and the drain on Twitter resources grows without any benefit to Twitter.

It’s true that someday Twitter might be able to monetize references to its data collections, but server and bandwidth expenses are present line items in their budget.

Enabling the distribution of full tweet datasets is one step towards improving their bottom line.

PS: Please share this with anyone you know at Twitter. Thanks!

Political Noise Data (Tweets From 3rd 2016 Presidential Debate)

Sunday, October 23rd, 2016

Chris Albon has collected data on 885,222 debate tweets from the third Presidential Debate of 2016.

As you can see from the transcript, it wasn’t a “debate” in any meaningful sense of the term.

The quality of tweets about that debate is equally questionable.

However, the people behind those tweets vote, buy products, click on ads, etc., so despite my title description as “political noise data,” it is important political noise data.

To conform to Twitter terms of service, Chris provides the relevant tweet ids and a script to enable construction of your own data set.

BTW, Chris includes his Twitter mining scripts.
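Rebuilding a data set from tweet ids boils down to reading the id file and requesting the tweets in batches. What follows is a minimal sketch in Python, not Chris's script. The one-id-per-line layout, the `size` parameter, and the `fetch_batch` callable are all assumptions for illustration; a real `fetch_batch` would wrap whatever Twitter API client you use and whatever batch size it permits.

```python
from typing import Callable, Iterable, Iterator, List


def read_ids(lines: Iterable[str]) -> List[str]:
    """Return the non-empty, stripped tweet ids from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(
    ids: List[str],
    fetch_batch: Callable[[List[str]], List[dict]],  # hypothetical API wrapper
    size: int,
) -> List[dict]:
    """Call fetch_batch once per batch and collect the returned tweets."""
    tweets: List[dict] = []
    for batch in batches(ids, size):
        tweets.extend(fetch_batch(batch))
    return tweets
```

Passing an open file object to `read_ids` works because files iterate line by line.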


ISIS Turns To Telegram App After Twitter Crackdown [Farce Alert + My Telegram Handle]

Monday, August 29th, 2016

ISIS Turns To Telegram App After Twitter Crackdown

From the post:

With the micro-blogging site Twitter coming down heavily on ISIS-sponsored accounts, the terrorist organisation and its followers are fast joining the heavily-encrypted messaging app Telegram built by a Russian developer.

On Telegram, the ISIS followers are laying out detailed plans to conduct bombing attacks in the west, reported on Monday.

France and Germany have issued statements that they now want a crackdown against them on Telegram.

“Encrypted communications among terrorists constitute a challenge during investigations. Solutions must be found to enable effective investigation… while at the same time protecting the digital privacy of citizens by ensuring the availability of strong encryption,” the statement said.


Oh, did you notice the source? “ reported on Monday.”

If you skip over to that post: IS Followers Flock to Telegram After being Driven from Twitter (I don’t want to shame the author so omitting their name), it reads in part:

With millions of IS loyalists communicating with one another on Telegram and spreading their message of radical Islam and extremism, France and Germany last week said that they want a continent wide effort to allow for a crackdown on Telegram.

“Encrypted communications among terrorists constitute a challenge during investigations,” France and Germany said in a statement. “Solutions must be found to enable effective investigation… while at the same time protecting the digital privacy of citizens by ensuring the availability of strong encryption.”

On private Telegram channels, IS followers have laid out detailed plans to poison Westerners and conduct bombing attacks, reports say.

What? “…millions of IS loyalists…?” IS in total is about 30K of active fighters, maybe. Millions of loyalists? Documentation? Citation of some sort? Being the Voice of America, I’d say they pulled that number out of a dark place.

Meanwhile, while complaining about the strong encryption, they are party to:

detailed plans to poison Westerners and conduct bombing attacks, reports say.

You do know wishing Westerners would choke on their Fritos doesn’t constitute a plan. Yes?

Neither does wishing to have an unspecified bomb, to be exploded at some unspecified location, at no particular time, constitute planning.

Not to mention that “reports say” is a euphemism for: “…we just made it up.”

Get yourself to Telegram!



They left out my favorite:

Annoy governments seeking to invade a person’s privacy.

Reclaim your privacy today! Telegram!

Caveat: I tried using one device for the SMS to set up my smartphone. Nada, nyet, no joy. Had to use my cellphone number to set up the account on the cellphone. OK, but annoying.

BTW, on Telegram, my handle is @PatrickDurusau.

Yes, my real name. Which excludes this account from anything requiring OpSec. 😉

Twitter Said to Work on Anti-Harassment Keyword Filtering Tool [Good News!]

Sunday, August 28th, 2016

Twitter Said to Work on Anti-Harassment Keyword Filtering Tool by Sarah Frier.

From the post:

Twitter Inc. is working on a keyword-based tool that will let people filter the posts they see, giving users a more effective way to block out harassing and offensive tweets, according to people familiar with the matter.

The San Francisco-based company has been discussing how to implement the tool for about a year as it seeks to stem abuse on the site, said the people, who asked not to be identified because the initiative isn’t public. By using keywords, users could block swear words or racial slurs, for example, to screen out offenders.

Nice to have good news to report about Twitter!

Suggestions before the code gets set in stone:

  • Enable users to “follow” filters of other users
  • Enable filters to filter on nicknames in content and as sender
  • Regexes anyone?

A big step towards empowering users!

Another Data Point On Twitter Censorship Practices

Sunday, August 14th, 2016


Twitter Censor Strikes Again (and again, and again)

Saturday, August 13th, 2016

Twitter censors accounts for reasons known only to itself, but in this case, truth telling is one obvious trigger for Twitter censorship:


Twitter censors accounts every day that don’t make the news and those are just as serious violations of free speech as this instance.

Twitter could trivially empower users to have both free speech and the equally important right to not listen but, also for reasons known only to Twitter, has chosen not to do so.

Free speech and the right to not listen are equally important.

What’s so difficult to understand about that?

“A Honeypot For Assholes” [How To Monetize Assholes/Abuse]

Thursday, August 11th, 2016

“A Honeypot For Assholes”: Inside Twitter’s 10-Year Failure To Stop Harassment by Charlie Warzel.

From the post:

For nearly its entire existence, Twitter has not just tolerated abuse and hate speech, it’s virtually been optimized to accommodate it. With public backlash at an all-time high and growth stagnating, what is the platform that declared itself “the free speech wing of the free speech party” to do? BuzzFeed News talks to the people who’ve been trying to figure this out for a decade.

Warzel’s 6,000 word (5966 by my count) ramble uses “abuse” without ever defining the term. Nor do any of the people quoted in his post. But, like Justice Stewart, they “know it when they see it.”

One of the dangers of Warzel’s post is that every reader will insert their own definition of “abuse.” Hard to find people who disagree that “abuse as they define it” should be blocked by Twitter.

All of Warzel’s examples are “abuse” (IMHO) but even so, I don’t support Twitter blocking that content from being posted. I emphasize posted because being posted on Twitter doesn’t obligate any user to read the content.

I don’t support Twitter censorship of any account, for any reason. Four Horsemen Of Internet Censorship + One.

If Twitter doesn’t block content, then how to deal with “abuse?”

Why not monetize the blocking of assholes and abuse?

Imagine a Twitter client/app that:

  1. Maintains a list of people blocked by a user and also allows that user to subscribe to the block lists of any other user.
  2. Employs stop lists, regexes and neural networks to filter tweets from people who have not been blocked.
  3. Uses neural networks trained on collections of “dick pics” and other offensive content to filter visual content as well.

Every user can have a customized definition of “abuse” for their own feed. Without impinging on the definitions of “abuse” of other users.
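The first two items of such a client can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any existing client's API: the `FeedFilter` class and its method names are hypothetical, handles are assumed to be stored in lowercase, and the neural network filtering is left out entirely.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FeedFilter:
    """Hypothetical per-user filter: block list, stop list and regexes."""
    blocked: Set[str] = field(default_factory=set)     # lowercase handles
    stop_words: Set[str] = field(default_factory=set)  # lowercase words
    patterns: List[re.Pattern] = field(default_factory=list)

    def subscribe(self, other: "FeedFilter") -> None:
        """Adopt another user's block list (a one-time copy, not a live link)."""
        self.blocked |= other.blocked

    def allows(self, sender: str, text: str) -> bool:
        """Return True if a tweet from `sender` should be shown."""
        if sender.lower() in self.blocked:
            return False
        words = set(re.findall(r"\w+", text.lower()))
        if words & self.stop_words:
            return False
        return not any(p.search(text) for p in self.patterns)
```

Each user builds their own `FeedFilter`, so one user's definition of “abuse” never changes what another user sees.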

Twitter clients to support such filtering options are already in place. TweetDeck Versus Hootsuite – The Essential Guide discusses two popular clients. There are hundreds of others, both web and smart phone based.

Circling the question: “Why isn’t Twitter using my personal definition of “abuse” to protect me for free?” generates a lot of discussion, but no viable solutions.

Monetizing filtering of assholes and abuse, resources available in vast quantities, protects both free speech and freedom from unwanted speech.

The only useful question on Twitter abuse is: what price point to set for avoiding X amount of abuse?


Twitter Censorship On Behalf Of Turkish Government

Wednesday, August 10th, 2016


The link Post Coup Censorship takes you to a list of twenty-three (23) journalist/publicist accounts verified as withheld by Twitter in Turkey.

I have tweeted to Efe Kerem Sözeri about this issue and was advised the censorship is based on IP addresses. Sözeri points out that use of a VPN is one easy means of avoiding the censorship.

Hopefully that was more productive than a rant about Twitter’s toadyism and self-anointed role to prevent abuse (as opposed to empowering Twitter users to avoid abuse on their own).

Your Next Favorite Twitter Account: @DeepDrumpf

Friday, August 5th, 2016

@DeepDrumpf is a Neural Network trained on Donald Trump transcripts.

If you are curious beyond the tweets, see: Postdoc’s Trump Twitterbot Uses AI To Train Itself On Transcripts From Trump Speeches.

Ideally an interface would strip @DeepDrumpf and @realDonaldTrump off of tweets and present you with the option to assign authorship to @DeepDrumpf or @realDonaldTrump.

At the end of twenty or thirty tweets, you get your accuracy score over assignment of authorship.
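The core of that quiz is small enough to sketch in Python. The function names and the list-of-pairs input are assumptions made for illustration; fetching the tweets and presenting them to a reader are left out.

```python
import re
from typing import List, Tuple

HANDLES = ("@DeepDrumpf", "@realDonaldTrump")


def strip_handles(text: str) -> str:
    """Remove both handles (case-insensitively) and collapse leftover spaces."""
    for handle in HANDLES:
        text = re.sub(re.escape(handle), "", text, flags=re.IGNORECASE)
    return " ".join(text.split())


def accuracy(answers: List[Tuple[str, str]]) -> float:
    """answers holds (guessed_author, true_author) pairs; return the fraction correct."""
    if not answers:
        return 0.0
    correct = sum(1 for guess, truth in answers if guess.lower() == truth.lower())
    return correct / len(answers)
```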


Twitter Nanny Says No! No!

Thursday, July 21st, 2016


For the other side of this story, enjoy Milo Yiannopoulos’s Twitter ban, explained by Aja Romano, where Aja is supportive of Twitter and its self-anointed role as arbiter of social values.

From my point of view, the facts are fairly simple:

Milo Yiannopoulos (formerly @Nero) has been banned from Twitter on the basis of his speech and the speech of others who agree with him.

What more needs to be said?

I have not followed, read, reposted or retweeted any tweets by Milo Yiannopoulos (formerly @Nero). And would not even if someone sent them to me.

I choose to not read that sort of material and so can anyone else. Including the people who complain in Aja’s post.

The Twitter Nanny becomes censor in insisting that no one be able to read tweets from Milo Yiannopoulos (formerly @Nero).

I’ve heard the argument that the First Amendment doesn’t apply to Twitter, which is true, but irrelevant. Only one country in the world has the First Amendment as stated in the US Constitution but that doesn’t stop critics from decrying censorship by other governments.

Or is it only censorship if you agree with the speech being suppressed?

Censorship of speech that I find disturbing, sexist, racist, misogynistic, dehumanizing, transphobic, homophobic, supporting terrorism, is still censorship.

And it is still wrong.

We only have ourselves to blame for empowering Twitter to act as a social media censor. Central point of failure and all that jazz.

Suggestions on a free speech alternative to Twitter?

Twitter Giveth and Taketh Away (NSA as Profit Center?)

Monday, May 16th, 2016

Twitter Giveth: GCHQ intelligence agency joins Twitter. Just about anyone can get a Twitter account these days.

Do see the GCHQ GitHub site for shared software.

Taketh Away: Twitter Bars Intelligence Agencies From Using Analytics Service.

Twitter has barred Dataminr from providing services to government intelligence services.

Dataminr monitors the entire Twitter pipe and provides analytics based on that stream.

Will this result in the NSA sharing its signal detection in the Twitter stream with other intelligence agencies?

Or for that matter, the NSA could start offering commercial signal detection services across all its feeds. Make it a profit center for the government rather than a money pit.

BTW, don’t be deceived by the illusion of space between government and Twitter, or any other entity that cooperates with a national government. Take “compromised” as a given. The real questions are by whom and for what purpose?

Peda(bot)gically Speaking:…

Monday, April 25th, 2016

Peda(bot)gically Speaking: Teaching Computational and Data Journalism with Bots by Nicholas Diakopoulos.

From the post:

Bots can be useful little creatures for journalism. Not only because they help us automate tasks like alerting and filtering, but also because they encapsulate how data and computing can work together, in service of automated news. At the University of Maryland, where I’m a professor of journalism, my students are using the power of news bots to learn concepts and skills in computational journalism—including both editorial thinking and computational thinking.

Hmmm, bot that filters all tweets that don’t contain a URL? (To filter cat pics and the like.) 😉

Or retweets tweets with #’s that trigger creation of topics/associations?

I don’t think there is a requirement that hashtags be meaningful to others. Yes?
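Both bot ideas reduce to simple text tests. A rough Python sketch, assuming tweets arrive as plain strings (a real bot would more likely read the URL and hashtag entities from the tweet JSON) and using hypothetical function names:

```python
import re
from typing import Iterable, List

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#(\w+)")


def with_urls(tweets: Iterable[str]) -> List[str]:
    """Keep only the tweets that contain at least one http(s) URL."""
    return [t for t in tweets if URL_RE.search(t)]


def hashtags(tweet: str) -> List[str]:
    """Return the lowercased hashtags in a tweet, without the leading '#'.

    Caveat: a '#fragment' inside a URL is also matched by this simple pattern.
    """
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]
```

The hashtags returned by `hashtags` could then be looked up in a private table that maps each tag to a topic or association, which is why they need not be meaningful to anyone else.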

Sounds like a great class!

Women in Data Science (~632) – Twitter List

Monday, April 25th, 2016

Data Science Renee has a Twitter list of approximately 632 women in data science.

I say “approximately” because when I first saw her post about the list it had 630 members. When I looked this AM, it had 632 members. By the time you look, that number will be different again.

If you are making a conscious effort to seek a diversity of speakers for your next data science conference, it should be on your list of sources.


1880 Big Data Influencers in CSV File

Friday, April 8th, 2016

If you aren’t familiar with Right Relevance, you are missing an amazing resource for cutting through content clutter.

Starting at the default homepage:


You can search for “big data” and the default result screen appears:


If you switch to “people,” the following screen appears:


The “topic score” line moves, so you can require a higher or lesser score for inclusion in the listing. That is helpful if you want only the top people, articles, etc. on a topic or want to reach deeper into the pool of data.

As of yesterday, if you set the “topic score” to the range 70 to 98, the number of people influencers was 1880.

The interface allows you to follow and/or tweet to any of those 1880 people, but only one at a time.

I submitted feedback to Right Relevance on Monday of this week pointing out how useful lists of Twitter handles could be for creating Twitter seed lists, etc., but have not gotten a response.

Part of my query to Right Relevance concerned the failure of a web scraper to match the totals listed in the interface (a far lower number of results than expected).

In the absence of an answer, I continue to experiment with the Web Scraper extension for Chrome to extract data from the site.

Caveat: In order to set the delay for requests in Web Scraper, I have found the settings under “Scrape” ineffectual:


In order to induce enough delay to capture the entire list, I set the delay in the exported sitemap (in JSON) and then imported it into another sitemap. Could have reached the same point by setting the delay under the top selector, which was also set to SelectorElementScroll.

To successfully retrieve the entire list, that delay setting was 16000 milliseconds.
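Editing the exported sitemap by hand works, but the same change can be scripted. The Python sketch below assumes the exported sitemap is a JSON object with a `selectors` list whose entries carry `type` and `delay` keys; check those key names against your own export before relying on it. The `set_delay` helper is hypothetical.

```python
import json


def set_delay(
    sitemap_json: str,
    delay_ms: int,
    selector_type: str = "SelectorElementScroll",
) -> str:
    """Return the sitemap JSON with `delay` set on selectors of the given type.

    Assumes a top-level "selectors" list of objects with "type" and "delay" keys.
    """
    sitemap = json.loads(sitemap_json)
    for selector in sitemap.get("selectors", []):
        if selector.get("type") == selector_type:
            selector["delay"] = delay_ms
    return json.dumps(sitemap)
```

Calling `set_delay(exported_text, 16000)` and importing the result as a new sitemap mirrors the manual steps described above.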

There may be more performant solutions but since it ran in a separate browser tab and notified me of completion, time wasn’t an issue.

I created a sitemap that obtains the user’s name, Twitter handle and number of Twitter followers, bigdata-right-relevance.txt.

Oh, the promised 1880-big-data-influencers.csv. (File renamed post-scraping due to naming constraints in Web Scraper.)

At best I am a casual user of Web Scraper so suggestions for improvements, etc., are greatly appreciated.

Launches First Report (PDF)

Thursday, March 31st, 2016

Launches First Report (PDF).

Reposting: is pleased to share our first report "Unfriending Censorship: Insights from four months of crowdsourced data on social media censorship." The report draws on data gathered directly from users between November 2015 and March 2016.

We asked users to send us reports when they had their content or accounts taken down on six social media platforms: Facebook, Flickr, Google+, Instagram, Twitter, and YouTube. We have aggregated and analyzed the collected data across geography, platform, content type, and issue areas to highlight trends in social media censorship. All the information presented here is anonymized, with the exception of case study examples we obtained with prior approval by the user.

Here are some of the highlights:

  • This report covers 161 submissions from 26 countries, regarding content in eleven languages.
  • Facebook was the most frequently reported platform, and account suspensions were the most reported content type.
  • Nudity and false identity were the most frequent reasons given to users for the removal of their content.
  • Appeals seem to present a particular challenge. A majority of users (53%) did not appeal the takedown of their content, 50% of whom said they didn’t know how and 41.9% of whom said they didn’t expect a response. In only four cases was content restored, while in 50 the user didn’t get a response.
  • We received widespread reports that flagging is being used for censorship: 61.6% believed this was the cause of the content takedown.

While we introduced some measures to help us verify reports (such as giving respondents the opportunity to send us screenshots that support their claims), we did not work with the companies to obtain this data and thus cannot claim it is representative of all content takedowns or user experiences. Instead, it shows how a subset of the millions of social media users feel about how their content takedowns were handled, and the impact it has had on their lives.

The full report is available for download and distribution under Creative Commons licensing.

As the report itself notes, 161 reports across 6 social media platforms in 4 months isn’t a representative sample of censoring in social media.

Twitter alone brags about closing 125,000 ISIS accounts since mid-2015 (report dated 5 February 2016).

Closing ISIS accounts is clearly censorship of political speech, whatever hand waving and verbal gymnastics Twitter wants to employ to justify its practices. Including terms of service.

Censorship, on whatever basis, by whoever practiced, by whatever mechanism (including appeals), will always step on legitimate speech of some speakers.

The non-viewing of content has one and only one legitimate locus of control, a user’s browser for web content.

Browsers and/or web interfaces for Twitter, Facebook, etc., should enable users to block users, content by keywords, or even classifications offered by social media services.


All need for collaboration with governments, issues of what content to censor, appeal processes, etc., suddenly disappear.

Enabling users to choose the content that will be displayed in their browsers empowers listeners as well as speakers, with prejudice towards none.


Tay AI Escapes, Recaptured

Wednesday, March 30th, 2016

Microsoft’s offensive chatbot Tay returns, by mistake by Georgia Wells.

From the post:

Less than one week after Microsoft Corp. made its debut and then silenced an artificially intelligent software chatbot that started spewing anti-Semitic rants, a researcher inadvertently put the chatbot, named Tay, back online. The revived Tay’s messages were no less inappropriate than before.

I remembered a DARPA webinar (download and snooze) but despite following Tay I missed her return.

Looks like I need a better tracking/alarm system for incoming social media.

I see more than enough sexism, racism and bigotry in non-Twitter news feeds to not need any more, but I prefer to make my own judgments about “inappropriate.”

Whether it is the FBI, FCC or private groups calling “inappropriate.”

AI Masters Go, Twitter, Not So Much (Log from @TayandYou?)

Thursday, March 24th, 2016

Microsoft deletes ‘teen girl’ AI after it became a Hitler-loving sex robot within 24 hours by Helena Horton.

From the post:

A day after Microsoft introduced an innocent Artificial Intelligence chat robot to Twitter it has had to delete it after it transformed into an evil Hitler-loving, incestual sex-promoting, ‘Bush did 9/11’-proclaiming robot.

Developers at Microsoft created ‘Tay’, an AI modelled to speak ‘like a teen girl’, in order to improve the customer service on their voice recognition software. They marketed her as ‘The AI with zero chill’ – and that she certainly is.

The headline was suggested to me by a tweet from Peter Seibel:

Interesting how wide the gap is between two recent AI: AlphaGo and TayTweets. The Turing Test is *hard*.

In preparation for the next AI celebration, does anyone have a complete log of the tweets from Tay Tweets?

I prefer non-revisionist history where data doesn’t disappear. You can imagine the use Stalin would have made of that capability.

Muting users on Twitter – Achtung! State, DoD, Other US Censors

Wednesday, March 2nd, 2016

The Twitter Help Center has a great webpage titled: Muting users on Twitter.

From that page:

Mute is a feature that allows you to remove an account’s Tweets from your timeline without unfollowing or blocking that account. Muted accounts will not know that you’ve muted them and you can unmute them at any time. To access a list of accounts you have muted, visit your muted accounts settings on or your app settings on Twitter for iOS or Android.

Instead of leaning on Twitter to close accounts, the State Department, Department of Defense and others can compile Twitter Mute Lists that have the Twitter accounts that any reasonable person should mute.

The Catholic News Service used to publish movie ratings in Our Sunday Visitor and while the rating system has changed since I last saw it (think 1960’s), it was a great way to pick out movies.

I think most of the ones I saw were either condemned or in some similar category. 😉

A Twitter mute list from State, DoD and others would save me the time of searching for offensive content to view. I am sure that is true for others as well.

Oh, not to mention that people who are offended can choose to not view such content. Sorry, almost got carried away there.

How’s that for a solution to “propaganda” on Twitter? If it offends you, don’t look. Leave the rest of us the hell alone.

NCSU Offers Social Media Archives Toolkit for Libraries [Defeating Censors]

Sunday, February 28th, 2016

NCSU Offers Social Media Archives Toolkit for Libraries by Matt Enis.

From the post:

North Carolina State University (NCSU) Libraries recently debuted a free, web-based social media archives toolkit designed to help cultural heritage organizations develop social media collection strategies, gain knowledge of ways in which peer institutions are collecting similar content, understand current and potential uses of social media content by researchers, assess the legal and ethical implications of archiving this content, and develop techniques for enriching collections of social media content at minimal cost. Tools for building and enriching collections include NCSU’s Social Media Combine—which pre-assembles the open source Social Feed Manager, developed at George Washington University for Twitter data harvesting, and NCSU’s own open source Lentil program for Instagram—into a single package that can be deployed on Windows, OSX, and Linux computers.

“By harvesting social media data (such as Tweets and Instagram photos), based on tags, accounts, or locations, researchers and cultural heritage professionals are able to develop accurate historical assessments and democratize access to archival contributors, who would otherwise never be represented in the historical record,” NCSU explained in an announcement.

“A lot of activity that used to take place as paper correspondence is now taking place on social media—the establishment of academic and artistic communities, political organizing, activism, awareness raising, personal and professional interactions,” Jason Casden, interim associate head of digital library initiatives, told LJ. Historians and researchers will want to have access to this correspondence, but unlike traditional letters, this content is extremely ephemeral and can’t be collected retroactively like traditional paper-based collections.

“So we collect proactively—as these events are happening or shortly after,” Casden explained.

I saw this too late today to install but I’m sure I will be posting about it later this week!

Do you see the potential of such tooling for defeating would-be censors of Twitter and other social media?

More on that later this week as well.

The Answer To Censors – Hand the Speaker a Larger Megaphone

Saturday, February 27th, 2016

TheCthulhu tweeted yesterday:


In case you are interested, the documents served on Twitter (in Turkish and English).

There is only one answer to censors – hand the censored speaker a larger megaphone.




and for good measure:


OK, only slightly larger but every follower counts.

Are you going to increase the size of TheCthulhu’s megaphone?

Is Conduct/Truth A Defense to Censorship?

Saturday, February 27th, 2016

While Twitter sets up its Platonic panel of censors (Plato’s Republic, Books 2/3)*, I am wondering if conduct/truth will be a defense to censorship for accounts that make positive posts about the Islamic State?

I ask because of a message suggesting accounts (Facebook?) might be suspended for posts following these rules:

  • Do no use foul language and try to not get in a fight with people
  • Do not write too much for people to read
  • Make your point easy as not everyone has the same knowledge as you about the Islamic state and/or Islam
  • Use a VPN…
  • Use an account that you don’t really need because this is like a martydom operation, your account will probably be banned
  • Post images supporting the Islamic state
  • Give positive facts about the Islamic state
  • Share Islamic state video’s that show the mercy and kindness of the Islamic state towards Muslims, and/or showing Muslim’s support towards the Islamic state. Or any videos that will attract people to the Islamic state
  • Prove rumors about the Islamic state false
  • Give convincing Islamic information about topics discussed like the legitimacy of the khilafa, killing civilians of the kuffar, the takfeer made on Arab rules, etc.
  • Or simply just post a short quick comment showing your support like “dawlat al Islam baqiaa” or anything else (make sure ppl can understand it
  • Remember to like all the comments you see that are supporting the Islamic state with all your accounts!

Posted (but not endorsed) by J. Faraday on 27 February 2016.

If we were to re-cast those as rules of conduct, non-Islamic State specific, where N is the issue under discussion:

  • Do not use foul language and try to not get in a fight with people
  • Do not write too much for people to read
  • Make your point easy [to understand] as not everyone has the same knowledge as you about N
  • Post images supporting N
  • Give positive facts about N
  • Share N videos that show the mercy and kindness of N, and/or showing A support towards N. Or any videos that will attract people to N
  • Prove rumors about N false
  • Give convincing N information about topics discussed
  • Or simply just post a short quick comment showing your support or anything else (make sure ppl can understand it)
  • Remember to like all the comments you see that are supporting N with all your accounts!

Is there something objectionable about those rules when N = Islamic State?

As far as being truthful, say for example claims by the Islamic State that Arab governments are corrupt, we can’t use a corruption index that lists Qatar at #22 (Denmark is #1 as the least corrupt) and Saudi Arabia at #48, when Bloomberg lists Qatar and Saudi Arabia as scoring zero (0) on budget transparency.

There are more corrupt governments than Qatar and Saudi Arabia, the failed state of Somalia for example, and perhaps the Sudan. Still, I wouldn’t ban anyone for saying both Qatar and Saudi Arabia are cesspools of corruption. They don’t match the structural corruption in Washington, D.C. but it isn’t for lack of trying.

Here’s the question for Twitter’s Platonic guardians (Trust and Safety Council):

Can an account that follows the rules of behavior outlined above be banned for truthful posts?

I think we all know the answer but I’m interested in seeing if Twitter will admit to censoring factually truthful information.

* Someone very dear to me objected to my reference to Twitterists (sp?) as Stalinists. It was literary hyperbole and so not literally true. Perhaps “Platonic guardians” will be more palatable. Same outcome, just a different moniker.

How to find breaking news on Twitter

Friday, February 19th, 2016

How to find breaking news on Twitter by Ruben Bouwmeester, Julia Bayer, and Alastair Reid.

From the post:

By its very nature, breaking news happens unexpectedly. Simply waiting for something to start trending on Twitter is not an option for journalists – you’ll have to actively seek it out.

The most important rule is to switch perspectives with the eyewitness and ask yourself, “What would I tweet if I were an eyewitness to an accident or disaster?”

To find breaking news on Twitter you have to think like a person who’s experiencing something out of the ordinary. Eyewitnesses tend to share what they see unfiltered and directly on social media, usually by expressing their first impressions and feelings. Eyewitness media can include very raw language that reflects the shock felt as a result of the situation. These posts often include misspellings.

In this article, we’ll outline some search terms you can use in order to find breaking news. The list is not intended as exhaustive, but a starting point on which to build and refine searches on Twitter to find the latest information.

A great collection of starter search terms, but those are going to vary depending on your domain of “breaking” news.

Good illustration of use of Twitter search operators.
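To make the operator idea concrete, here is a minimal sketch of building a search string from eyewitness-style phrases. The phrases are my own placeholders, not the ones from the article, and the function name is hypothetical. I am assuming the commonly used operators `OR`, quoted phrases, `-filter:retweets` and `lang:`; check them against Twitter's current search documentation before relying on them.

```python
def build_search_query(phrases, exclude_retweets=True, language=None):
    """Combine phrases with OR, optionally dropping retweets and filtering by language."""
    # Quote each phrase so it is matched as an exact phrase.
    quoted = " OR ".join(f'"{p}"' for p in phrases)
    parts = [f"({quoted})"]
    if exclude_retweets:
        # Assumed operator: hides retweets so first-hand posts stand out.
        parts.append("-filter:retweets")
    if language:
        # Assumed operator: restricts results to one language code.
        parts.append(f"lang:{language}")
    return " ".join(parts)


# Placeholder phrases an eyewitness might plausibly type.
query = build_search_query(["I just saw", "just witnessed"], language="en")
print(query)
# ("I just saw" OR "just witnessed") -filter:retweets lang:en
```

Swap in phrases (and misspellings) that fit your own domain of breaking news.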

Other collections of Twitter search terms?

Twitter Suspension Tracker

Monday, February 15th, 2016

Twitter Suspension Tracker by Lee Johnstone.

From the about page:

This site (Twitter Suspension Monitor) was created to do one purpose, log and track suspended twitter accounts.

The system periodically checks marked suspended accounts for possible reactivation and remarks them accordingly. This allows the system to start tracking how many hours, days or even weeks and months a users twitter account got suspended for. Ontop of site submitted entrys Twitter Suspension Monitor also scrapes data directly from twitter in hope to find many more suspended accounts.

Not transparency but some reflected light on the Twitter account suspension process.
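The about page does not say how the tracker stores its data, so what follows is only a guess at the bookkeeping it describes: remember when an account was first seen suspended, recheck it periodically, and note when it comes back. The class and field names are hypothetical, and the actual check against Twitter is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SuspensionRecord:
    """One tracked account (hypothetical structure, not the site's schema)."""
    handle: str
    first_seen_suspended: datetime
    reactivated_at: Optional[datetime] = None

    def recheck(self, is_suspended_now: bool, checked_at: datetime) -> None:
        """Record a periodic check; note the first time the account is active again."""
        if not is_suspended_now and self.reactivated_at is None:
            self.reactivated_at = checked_at

    def duration(self, now: datetime) -> timedelta:
        """Length of the suspension so far, or in total if the account came back."""
        end = self.reactivated_at or now
        return end - self.first_seen_suspended


# Made-up account and dates, for illustration only.
record = SuspensionRecord("example_account", datetime(2016, 2, 1, 12, 0))
record.recheck(is_suspended_now=True, checked_at=datetime(2016, 2, 2, 12, 0))
record.recheck(is_suspended_now=False, checked_at=datetime(2016, 2, 4, 12, 0))
print(record.duration(datetime(2016, 2, 15)))  # 3 days, 0:00:00
```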

Tweets from suspended accounts disappear.

Stalin would have felt right at home with Twitter’s methods if not its ideology.

Here’s a photo of Stalin for the webpage of the Twitter Trust & Safety Council:


Members of the Twitter Trust & Safety Council should use it as their twitter profile image. Enable all of us to identify Twitter censorship collaborators.

However urgent the current hysteria, censors are judged only one way in history.

Is that what you want for your legacy? Twitter, same question.

Are You A Scientific Twitter User or Polluter?

Saturday, February 6th, 2016

Realscientists posted this image to Twitter:


Self-Scoring Test:

In the last week, how often have you retweeted without “read[ing] the actual paper” pointed to by a tweet?

How many times did you retweet in total?

Formula: (retweets w/o reading / retweets in total) × 100 = % of retweets w/o reading.
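If you would rather let a script do the arithmetic, here is a small sketch of the self-scoring formula. The function name and the sample counts are made up for illustration.

```python
def unread_retweet_percentage(unread_retweets: int, total_retweets: int) -> float:
    """Return the share of retweets made without reading the source, as a percentage."""
    if total_retweets <= 0:
        raise ValueError("total_retweets must be positive")
    if not 0 <= unread_retweets <= total_retweets:
        raise ValueError("unread_retweets must be between 0 and total_retweets")
    return 100.0 * unread_retweets / total_retweets


# Hypothetical week: 12 retweets, 9 of them sent without reading the paper.
print(unread_retweet_percentage(9, 12))  # 75.0
```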

No scale with superlatives because I don’t have numbers to establish a baseline for the “average” Twitter user.

I do know that I see click-bait, out-dated and factually wrong material retweeted by people who know better. That’s Twitter pollution.

Ask yourself: Am I a scientific Twitter user or a polluter?

Your call.

Is Twitter A Global Town Censor? (Data Project)

Friday, February 5th, 2016

Twitter Steps Up Efforts to Thwart Terrorists’ Tweets by Mike Isaac.

From the post:

For years, Twitter has positioned itself as a “global town square” that is open to discourse from all. And for years, extremist groups like the Islamic State have taken advantage of that stance, using Twitter as a place to spread their messages.

Twitter on Friday made clear that it was stepping up its fight to stem that tide. The social media company said it had suspended 125,000 Twitter accounts associated with extremism since the middle of 2015, the first time it has publicized the number of accounts it has suspended. Twitter also said it had expanded the teams that review reports of accounts connected to extremism, to remove the accounts more quickly.

“As the nature of the terrorist threat has changed, so has our ongoing work in this area,” Twitter said in a statement, adding that it “condemns the use of Twitter to promote terrorism.” The company said its collective moves had already produced results, “including an increase in account suspensions and this type of activity shifting off Twitter.”

The disclosure follows intensifying pressure on Twitter and other technology companies from the White House, presidential candidates like Hillary Clinton and government agencies to take more action to combat the digital practices of terrorist groups. The scrutiny has grown after mass shootings in Paris and San Bernardino, Calif., last year, because of concerns that radicalizations can be accelerated by extremist postings on the web and social media.

Just so you know what the Twitter rule is:

Violent threats (direct or indirect): You may not make threats of violence or promote violence, including threatening or promoting terrorism. (The Twitter Rules)

Here’s your chance to engage in real data science and help decide the question of whether Twitter has changed from global town square to global town censor.

Here’s the data gathering project:

Monitor all the Twitter streams for Republican and Democratic candidates for the U.S. presidency for tweets advocating violence/terrorism.

File requests with Twitter for those accounts to be replaced.

FYI: When you report a message (Reporting a Tweet or Direct Message for violations), it will disappear from your Messages inbox.

You must copy every tweet you report (accounts disappear as well) if you want to keep a record of your report.

Keep track of your reports and the tweet you copied before reporting.

Post the record of your reports and the tweets reported, plus any response from Twitter.

Suggestions on how to format these reports?
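One possible starting point, offered as a suggestion rather than a standard: keep one JSON object per report, one report per line, so the records are easy to append to and to analyze later. Every field name below is my own invention, and the sample values are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TweetReport:
    """One report filed with Twitter (all field names are illustrative)."""
    account: str                 # account that posted the tweet
    tweet_text: str              # verbatim copy, saved before reporting
    tweet_posted_at: str         # ISO 8601 timestamp of the tweet
    reported_at: str             # ISO 8601 timestamp of the report
    rule_cited: str              # which Twitter rule the report relied on
    twitter_response: Optional[str] = None  # filled in if Twitter answers


def to_json_line(report: TweetReport) -> str:
    """Serialize a report as a single line of JSON (JSON Lines style)."""
    return json.dumps(asdict(report), ensure_ascii=False, sort_keys=True)


# Placeholder values only; no real tweet or account is represented here.
report = TweetReport(
    account="@example_candidate",
    tweet_text="(copied text of the reported tweet)",
    tweet_posted_at="2016-02-05T14:30:00Z",
    reported_at="2016-02-05T15:02:00Z",
    rule_cited="Violent threats (direct or indirect)",
)
line = to_json_line(report)
print(line)
```

A flat format like this leaves the response field empty until Twitter replies, which makes the unanswered reports easy to count.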

Or would you rather not know what Twitter is deciding for you?

How much data needs to be collected to move onto part 2 of the project – data analysis?

Suggestions on who at Twitter to contact for a listing of the 125,000 accounts that were silenced along with the Twitter history for each one? (Or the entire history of silenced accounts at Twitter? Who gets censored by topic, race, gender, location, etc., are all open questions.)

That could change the Twitter process from a black box to having marginally more transparency. You would have to guess at why any particular account was silenced.

If Twitter wants to take credit for censoring public discourse then the least it can do is be honest about who was censored and what they were saying to be censored.


Twitter Graph Analytics From NodeXL (With Questions)

Friday, January 29th, 2016

I’m sure you have seen this rather impressive Twitter graphic:


And you can see a larger version, with a link to the interactive version here:

Impressive visualization but…, tell me, what can you learn from these tweets about big data?

I mean, visualization is a great tool but if I am not better informed after using the visualization than before, what’s the point?

If you go to the interactive version, you will find lists derived from the data, such as “top 10 vertices, ranked by Betweeness Centrality,” top 10 URLs in the graph and groups in the graph, top domains in the graph and groups in the graph, etc.
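Lists like these are simple tallies once you have the underlying data. As a rough sketch (not NodeXL's method, which I have not inspected), counting the top domains from a list of URLs takes only a few lines; the URLs below are made up and the function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=10):
    """Return the n most common host names among the given URLs."""
    # Lowercase the host so EXAMPLE.com and example.com count together.
    hosts = (urlparse(u).netloc.lower() for u in urls)
    return Counter(h for h in hosts if h).most_common(n)


# Made-up URLs standing in for links extracted from tweets.
sample_urls = [
    "https://example.com/big-data-report",
    "https://example.com/another-post",
    "http://example.org/slides",
    "https://EXAMPLE.com/third",
]
print(top_domains(sample_urls, n=2))  # [('example.com', 3), ('example.org', 1)]
```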

None of which is evident from casual inspection of the graph. (Top influencers might be if I could get the interactive version to resize but difficult unless the step between #11 and #10 was fairly large.)

Nothing wrong with eye candy but for touting the usefulness of visualization, let’s look for more intuitive visualizations.

I saw this particular version in a tweet by Kirk D. Borne.

Twitter Account Details

Thursday, January 28th, 2016

Kirk D. Borne tweeted this page which returns all of his Twitter details.

You can try mine as well, patrickDurusau.

The generic link:


BTW, if you aren’t already following Kirk D. Borne, you should be.