Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 3, 2013

Summingbird [Twitter open sources]

Filed under: Hadoop,Storm,Summingbird,Tweets — Patrick Durusau @ 5:59 pm

Twitter open sources Storm-Hadoop hybrid called Summingbird by Derrick Harris.

I look away for a few hours to review a specification and look what pops up:

Twitter has open sourced a system that aims to mitigate the tradeoffs between batch processing and stream processing by combining them into a hybrid system. In the case of Twitter, Hadoop handles batch processing, Storm handles stream processing, and the hybrid system is called Summingbird. It’s not a tool for every job, but it sounds pretty handy for those it’s designed to address.

Twitter’s blog post announcing Summingbird is pretty technical, but the problem is pretty easy to understand if you think about how Twitter works. Services like Trending Topics and search require real-time processing of data to be useful, but they eventually need to be accurate and probably analyzed a little more thoroughly. Storm is like a hospital’s triage unit, while Hadoop is like longer-term patient care.

This description of Summingbird from the project’s wiki does a pretty good job of explaining how it works at a high level.

(…)

While the Summingbird announcement is heavy sledding, it is well written. The projects spawned by Summingbird are rife with possibilities.

I appreciate Derrick’s comment:

It’s not a tool for every job, but it sounds pretty handy for those it’s designed to address.

I don’t know of any tools “for every job,” the opinions of some graph advocates notwithstanding. 😉

If Summingbird fits your problem set, spend some serious time seeing what it has to offer.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress