Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 16, 2012

Analyzing Twitter Data with Hadoop, Part 2: Gathering Data with Flume

Filed under: Cloudera,Flume,Hadoop,Tweets — Patrick Durusau @ 9:15 am

Analyzing Twitter Data with Hadoop, Part 2: Gathering Data with Flume by Jon Natkins.

From the post:

This is the second article in a series about analyzing Twitter data using some of the components of the Hadoop ecosystem available in CDH, Cloudera’s open-source distribution of Hadoop and related projects. In the first article, you learned how to pull CDH components together into a single cohesive application, but to really appreciate the flexibility of each of these components, we need to dive deeper.

Every story has a beginning, and every data pipeline has a source. So, to build Hadoop applications, we need to get data from a source into HDFS.

Apache Flume is one way to bring data into HDFS using CDH. The Apache Flume website describes Flume as “a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.” At the most basic level, Flume enables applications to collect data from its origin and send it to a resting location, such as HDFS. At a slightly more detailed level, Flume achieves this goal by defining dataflows consisting of three primary structures: sources, channels and sinks. The pieces of data that flow through Flume are called events, and the processes that run the dataflow are called agents.
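To make the vocabulary in that excerpt concrete, here is a toy sketch of the source, channel, and sink model in Python. It is not the Flume API: Flume itself is a Java project driven by configuration files. Every class and function name below (`Event`, `MemoryChannel`, `ListSource`, `ListSink`, `run_agent`) is a hypothetical one made up for illustration, and a plain list stands in for HDFS.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A unit of data moving through the flow: a body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Buffers events between a source and a sink (a bounded FIFO queue)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event):
        # Refuse the event when the channel is full; the caller decides what to do.
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(event)
        return True

    def take(self):
        # Return the oldest buffered event, or None when the channel is empty.
        return self._queue.popleft() if self._queue else None


class ListSource:
    """Hypothetical source: turns an in-memory list of strings into events."""

    def __init__(self, lines, channel):
        self.lines = lines
        self.channel = channel

    def run(self):
        accepted = 0
        for line in self.lines:
            if self.channel.put(Event(body=line.encode("utf-8"))):
                accepted += 1
        return accepted


class ListSink:
    """Hypothetical sink: drains the channel into a list standing in for HDFS."""

    def __init__(self, channel, destination):
        self.channel = channel
        self.destination = destination

    def run(self):
        delivered = 0
        while (event := self.channel.take()) is not None:
            self.destination.append(event.body.decode("utf-8"))
            delivered += 1
        return delivered


def run_agent(lines, capacity=10):
    """The 'agent' is the process that wires source -> channel -> sink."""
    channel = MemoryChannel(capacity=capacity)
    stored = []
    ListSource(lines, channel).run()
    ListSink(channel, stored).run()
    return stored
```

Calling `run_agent(["first tweet", "second tweet"])` moves both strings through the channel and returns them in order. The channel is the piece that decouples the two ends: the source only needs to know how to `put`, the sink only how to `take`, which is why either end can be swapped without touching the other.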

A very good introduction to the use of Flume!

Does it seem to you that the number of examples using Twitter, not just for “big data” but in general, is on the rise?

Just a personal observation, and subject to all the flaws of such (“all the buses were going the other way”).

Judging from the state of my inbox, some people are still writing more than 140 characters at a time.

Will it make a difference in our tools and thinking if we focus on shorter strings rather than longer ones?
