GearPump (GitHub)
From the wiki homepage:
GearPump is a lightweight, real-time, big data streaming engine. It is inspired by recent advances in the Akka framework and a desire to improve on existing streaming frameworks. GearPump draws from a number of existing frameworks including MillWheel, Apache Storm, Spark Streaming, Apache Samza, Apache Tez, and Hadoop YARN while leveraging Akka actors throughout its architecture.
What originally caught my attention was this passage on the GitHub page:
Per initial benchmarks we are able to process 11 million messages/second (100 bytes per message) with a 17ms latency on a 4-node cluster.
Think about that for a second.
Per initial benchmarks we are able to process 11 million messages/second (100 bytes per message) with a 17ms latency on a 4-node cluster.
The GitHub page features a word count example and pointers to the wiki with more examples.
What if every topic “knew” the index value of every topic that should merge with it on display to a user?
When a topic is added to a topic map, it broadcasts its merging property values, and any topic holding those values responds by transmitting its own index value.
When you retrieve a topic, it has all the IDs necessary to create a merged view of the topic on the fly and on the client side.
There would be redundancy in the map, but de-duplicating to save storage space went out with the preference for 7-bit character values to save memory. So long as every topic returns the same result, who cares?
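The idea above can be sketched in a few lines. This is a hypothetical illustration, not anything from GearPump or any topic map library: the `Topic` and `TopicMap` classes, their fields, and the set-intersection test for merging are all my own assumptions about how such a scheme might look.

```python
# Hypothetical sketch: each topic records the index values (IDs) of every
# topic it should merge with, so a client can assemble a merged view
# without a server-side merge step.

class Topic:
    def __init__(self, topic_id, merge_props, properties):
        self.topic_id = topic_id        # this topic's index value
        self.merge_props = merge_props  # set of values that trigger merging
        self.properties = properties    # dict of displayable properties
        self.merge_ids = set()          # IDs of topics to merge with

class TopicMap:
    def __init__(self):
        self.topics = {}

    def add(self, topic):
        # "Broadcast" the new topic's merging property values; any existing
        # topic sharing a value responds with its ID, and both sides record
        # the other's index value.
        for existing in self.topics.values():
            if topic.merge_props & existing.merge_props:
                topic.merge_ids.add(existing.topic_id)
                existing.merge_ids.add(topic.topic_id)
        self.topics[topic.topic_id] = topic

    def merged_view(self, topic_id):
        # Client-side merge on the fly: the topic already carries every ID
        # it needs, so no search is required at display time.
        topic = self.topics[topic_id]
        view = dict(topic.properties)
        for other_id in topic.merge_ids:
            view.update(self.topics[other_id].properties)
        return view

tm = TopicMap()
tm.add(Topic("t1", {"psi:subject-x"}, {"name": "X"}))
tm.add(Topic("t2", {"psi:subject-x"}, {"alt": "X-prime"}))
merged = tm.merged_view("t1")  # combines properties of t1 and t2
```

The redundancy mentioned above shows up as each topic in a merge set storing the IDs of all the others; the payoff is that retrieval needs no lookup beyond the IDs the topic already carries.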
Well, it might make a difference when the CIA wants to give every contractor full access to its datastores 24×7 via their cellphones. But, until that is an actual requirement, I would not worry about the storage space overmuch.
I first saw this in a tweet from Suneel Marthi.