Analyzing Twitter: An End-to-End Data Pipeline Recap by Jason Barbour.
Jason reviews presentations at a recent Data Science MD meeting:
Starting off the night, Joey Echeverria, a Principal Solutions Architect, discussed a big data architecture and how key components of a relational data management system can be replaced with current big data technologies. With Twitter being increasingly popular with marketing teams, analyzing Twitter data becomes a perfect use case to demonstrate a complete big data pipeline.
(…)
Following Joey, Sean Busbey, a Solutions Architect at Cloudera, discussed working with Mahout, a scalable machine learning library for Hadoop. Sean first introduced the three C’s of machine learning: classification, clustering, and collaborative filtering. With classification, learning from a labeled training set is supervised, and new examples can then be categorized. Clustering groups examples that share common features, while collaborative filtering suggests new candidates based on what similar users have chosen.
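To make the three C's concrete, here is a minimal sketch in plain Python. It is not Mahout's API and not code from the talk; the function names (classify, cluster, recommend) and the specific techniques chosen (nearest-neighbor classification, basic k-means, overlap-weighted suggestions) are assumptions picked purely for illustration on toy data.

```python
from collections import Counter
from math import dist


def classify(training, point):
    """Classification: give a new point the label of its nearest
    labeled training example (1-nearest-neighbor)."""
    _, label = min(training, key=lambda example: dist(example[0], point))
    return label


def cluster(points, centers, rounds=10):
    """Clustering: basic k-means. Assign each point to its closest
    center, then move each center to the mean of its group."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(centers[i], p))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups


def recommend(likes, user):
    """Collaborative filtering: suggest items liked by other users,
    weighted by how many liked items they share with this user."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        for item in theirs - mine:
            scores[item] += overlap
    # Drop items that come only from users with no shared likes.
    return [item for item, score in scores.most_common() if score > 0]
```

Called on a handful of 2-D points or a small dictionary mapping users to sets of liked items, each function shows the shape of its task: classify needs labeled examples, cluster needs only the points plus starting centers, and recommend needs only user-to-item relationships. A library such as Mahout targets the same three tasks at Hadoop scale, with far more capable algorithms.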
The recap offers great summaries, links to additional resources, and the complete slides.
Check the DC Data Community Events Calendar if you plan to visit the DC area. (I assume residents already do.)