How to Refine and Visualize Twitter Data by Cheryle Custer.
From the post:
He loves me, he loves me not… using daisies to figure out someone’s feelings is so last century. A much better way to determine whether someone likes you, your product or your company is to do some analysis on Twitter feeds to get better data on what the public is saying. But how do you take thousands of tweets and process them? We show you how in our video – Understand your customers’ sentiments with Social Media Data – that you can capture a Twitter stream to do Sentiment Analysis.
Now, when you boot up your Hortonworks Sandbox today, you’ll find Tutorial 13: Refining and Visualizing Sentiment Data as the companion step-by-step guide to the video. In this Hadoop tutorial, we will show you how you can take a Twitter stream and visualize it in Excel 2013 or you could use your own favorite visualization tool. Note you can use any version of Excel, but Excel 2013 allows you to plot the data on a map where other versions will limit you to the built-in charting function.
(…)
A great tutorial from Hortonworks as always!
My only reservation is the acceptance of Twitter data for sentiment analysis.
True, it is easy to obtain, not all that difficult to process, but that isn’t the same thing as having any connection with sentiment about a company or product.
Consider that a now somewhat dated report (2012) found that 51% of all Internet traffic is “non-human.”
If that is the case or has worsened since then, how do you account for that in your sentiment analysis?
Or if you are monitoring the Internet for Al-Qaeda threats, how do you distinguish threats from Al-Qaeda bots from threats by Al-Qaeda members?
What if threat levels are being gamed by Al-Qaeda bot networks, forcing the expenditure of resources on a global scale at a very small cost?
A new type of asymmetric warfare?
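One rough way to frame the non-human traffic question in code is to weight each tweet's sentiment by an estimate of how likely it is that a person wrote it. The sketch below is an illustration only: `ScoredTweet`, its fields, and `weighted_sentiment` are hypothetical names, and it simply assumes that a usable `human_probability` exists, which is the hard part.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ScoredTweet:
    # Hypothetical fields, not taken from the Hortonworks tutorial.
    sentiment: float          # assumed score in [-1.0, 1.0]
    human_probability: float  # assumed estimate in [0.0, 1.0] that a person wrote it


def weighted_sentiment(tweets: Iterable[ScoredTweet]) -> Optional[float]:
    """Average sentiment, weighting each tweet by its estimated human probability."""
    tweets = list(tweets)
    total_weight = sum(t.human_probability for t in tweets)
    if total_weight == 0:
        # Nothing we believe came from a person, so there is no sentiment to report.
        return None
    weighted_sum = sum(t.sentiment * t.human_probability for t in tweets)
    return weighted_sum / total_weight
```

A tweet judged certainly automated (probability 0.0) drops out of the average entirely, while an unweighted mean would count it the same as any other tweet.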
I think this is THE issue with internet data. In our love affair with Big (Internet) Data, we seem to completely ignore the quality of the data. Of the 49% of internet traffic that is from humans, how much of the “sentiment” is real and how much is paid for?
Years ago, there were rumors of Zappos using Mechanical Turk to generate good reviews for their company. Given that Amazon has turned that function into a product, it is not too hard to imagine that practice being widespread.
https://requester.mturk.com/applications/app/Content_Moderator
Then, there is a flood of stories in the media about Zappos’ use of Mechanical Turk for completely benign purposes.
http://www.teleread.com/ebooks/zappos-uses-mechanical-turk-to-proofread-five-million-product-reviews/
It might not have reached the sphere of terror networks yet but the battle for mindshare on the internet has been raging for a while now.
Comment by clemp — September 15, 2013 @ 6:26 am
Not to mention the issues in Big data and the “Big Lie”:…, http://tm.durusau.net/?p=45760.
I have heard rumors, only rumors mind you, that the IRS plants all the tax seizure horror stories in the news during tax season.
Curious, in a key/value store, how do you talk about the reliability of any data on the data side of that pair?
Not that you have to in all cases, for performance reasons, but having the ability to talk about its reliability seems crucial to me.
Comment by Patrick Durusau — September 27, 2013 @ 9:33 am
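The key/value question in the last comment can be made concrete with a small sketch: store each value together with optional reliability metadata, so the common read path stays cheap and the reliability is available only when asked for. This is a minimal in-memory illustration, not the API of any real key/value store; `Annotated`, `AnnotatedStore`, and the `reliability` and `source` fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Annotated:
    # Hypothetical wrapper: the stored value plus what we claim about it.
    value: Any
    reliability: Optional[float] = None  # assumed scale 0.0 to 1.0; None means "not assessed"
    source: Optional[str] = None         # where the value came from, if known


class AnnotatedStore:
    """In-memory key/value store whose values can carry reliability metadata."""

    def __init__(self) -> None:
        self._data: Dict[str, Annotated] = {}

    def put(self, key: str, value: Any,
            reliability: Optional[float] = None,
            source: Optional[str] = None) -> None:
        self._data[key] = Annotated(value, reliability, source)

    def get(self, key: str) -> Any:
        # Common path: return the bare value and ignore the metadata.
        return self._data[key].value

    def reliability(self, key: str) -> Optional[float]:
        # Opt-in path: ask what is claimed about the value's reliability.
        return self._data[key].reliability
```

Keeping `None` distinct from a low score separates "nobody has assessed this" from "assessed and found doubtful", which a single default number would blur.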