Analyzing Twitter Data with Hadoop by Jon Natkins
From the post:
Social media has gained immense popularity with marketing teams, and Twitter is an effective tool for a company to get people excited about its products. Twitter makes it easy to engage users and communicate directly with them, and in turn, users can provide word-of-mouth marketing for companies by discussing the products. Given limited resources, and knowing we may not be able to talk to everyone we want to target directly, marketing departments can be more efficient by being selective about whom we reach out to.
In this post, we’ll learn how we can use Apache Flume, Apache HDFS, Apache Oozie, and Apache Hive to design an end-to-end data pipeline that will enable us to analyze Twitter data. This will be the first post in a series. The posts to follow will describe, in more depth, how each component is involved and how the custom code operates. All the code and instructions necessary to reproduce this pipeline are available on the Cloudera GitHub.
Looking forward to more posts in this series!
Social media is a focus for marketing teams for obvious reasons.
Analysis of snaps, crackles and pops en masse.
What if you wanted to communicate securely with others using social media?
I am thinking of something more robust, and larger in scale, than two (or three) lovers agreeing on code words.
How would you hide in a public data stream?
Or, conversely, how would you hunt for someone in a public data stream?
How would you use topic maps to manage the semantic side of such a process?