Spark and Elasticsearch by Barnaby Gray.
From the post:
If you work in the Hadoop world and have not yet heard of Spark, drop everything and go check it out. It’s a really powerful, intuitive and fast map/reduce system (and some).
Where it beats Hadoop/Pig/Hive hands down is it’s not a massive stack of quirky DSLs built on top of layers of clunky Java abstractions – it’s a simple, pure Scala functional DSL with all the flexibility and succinctness of Scala. And it’s fast, and properly interactive – query, bam, response snappiness – not query, twiddle fingers, wait a bit… response.
And if you’re into search, you’ll no doubt have heard of Elasticsearch – a distributed RESTful search engine built upon Lucene.
They’re perfect bedfellows – crunch your raw data and spit it out into a search index ready for serving to your frontend. At the company I work for, we’ve built the google-analytics-esque part of our product around this combination.
It’s so fast, it flies – we can process raw event logs at 250,000 events/s without breaking a sweat on a meagre EC2 m1.large instance. (bold emphasis added)
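The "crunch raw data, spit it out into a search index" handoff the post describes is typically wired up with the elasticsearch-hadoop connector, which lets a Spark RDD be written straight to an index. A minimal sketch – the index name, field names, log path, and tab-separated event format here are illustrative assumptions, not details from the post:

```scala
// Sketch: parse raw event logs with Spark, index them into Elasticsearch.
// Assumes the elasticsearch-hadoop connector (elasticsearch-spark artifact)
// is on the classpath and an ES node is reachable at the configured address.
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs

object EventsToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("events-to-es")
      .set("es.nodes", "localhost:9200") // assumed Elasticsearch endpoint

    val sc = new SparkContext(conf)

    // Hypothetical raw event log, one "timestamp\tuser\taction" line per event.
    val events = sc.textFile("hdfs:///logs/events")
      .map(_.split("\t"))
      .collect { case Array(ts, user, action) =>
        Map("timestamp" -> ts, "user" -> user, "action" -> action)
      }

    // Each Map becomes one JSON document in the (assumed) "events/event" index.
    events.saveToEs("events/event")
  }
}
```

The appeal is that the indexing step is just one more call at the end of an ordinary Spark transformation chain, so the same succinct DSL covers both the crunching and the serving side.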
Don’t you just hate it when bloggers hold back? 😉
I’m not endorsing this solution but I do appreciate a post with attitude and useful information.
Enjoy!