Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 13, 2013

Analyzing Twitter: An End-to-End Data Pipeline Recap

Filed under: BigData,Cloudera,Mahout,Tweets — Patrick Durusau @ 10:32 am

Analyzing Twitter: An End-to-End Data Pipeline Recap by Jason Barbour.

Jason reviews presentations at a recent Data Science MD meeting:

Starting off the night, Joey Echeverria, a Principal Solutions Architect, first discussed a big data architecture and how key components of a relational data management system can be replaced with current big data technologies. With Twitter being increasingly popular with marketing teams, analyzing Twitter data becomes a perfect use case to demonstrate a complete big data pipeline.

(…)

Following Joey, Sean Busbey, a Solutions Architect at Cloudera, discussed working with Mahout, a scalable machine learning library for Hadoop. Sean first introduced the three C’s of machine learning: classification, clustering, and collaborative filtering. With classification, learning from a training set is supervised, and new examples can be categorized. Clustering allows examples to be grouped together with common features, while collaborative filtering allows new candidates to be suggested.
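For readers who want something concrete behind the three C's, here is a minimal sketch in plain Python. It is not Mahout's API and not taken from the slides: the function names, the nearest-centroid classifier, the naive k-means initialization, and the Jaccard-based recommender are all my own illustrative assumptions, chosen only because they fit in a few lines.

```python
from collections import defaultdict


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Classification: supervised, the training set carries labels.
def train_nearest_centroid(examples):
    """examples: list of (features, label). Returns {label: centroid}."""
    by_label = defaultdict(list)
    for features, label in examples:
        by_label[label].append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, features):
    """Assign the label whose centroid is closest to the new example."""
    return min(model, key=lambda label: sq_dist(model[label], features))


# Clustering: unsupervised, groups are found from the features alone.
def kmeans(points, k, iterations=10):
    """Toy k-means; the first k points serve as initial centers (an assumption)."""
    centers = list(points[:k])
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centers[i], p))
            groups[nearest].append(p)
        # Keep the previous center when a group ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


# Collaborative filtering: suggest new candidates from similar users' likes.
def recommend(likes, user):
    """likes: {user: set(items)}. Returns items the most similar other user
    likes that `user` has not seen, using Jaccard similarity."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    others = [u for u in likes if u != user]
    best = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return likes[best] - likes[user]
```

With a model trained on `[((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"), ((5.0, 5.0), "ham"), ((4.8, 5.2), "ham")]`, calling `classify(model, (4.5, 4.9))` picks `"ham"`; `kmeans` on two tight pairs of points separates them into two groups; and `recommend` hands a user whatever their closest neighbour likes that they have not yet seen. Mahout's versions of these run distributed on Hadoop and are considerably more sophisticated.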

Great summaries, links to additional resources and the complete slides.

Check the DC Data Community Events Calendar if you plan to visit the DC area. (I assume residents already do.)
