Real-time Analytics with HBase
From the post:
Here are slides from another talk we gave at both Berlin Buzzwords and at HBaseCon in San Francisco last month. In this presentation Alex describes one approach to real-time analytics with HBase, which we use at Sematext via HBaseHUT. If you like these slides you will also like HBase Real-time Analytics Rollbacks via Append-based Updates.
The slides come in a long and a short version. Both are very good, but I suggest the long version.
I particularly liked the “Background: pre-aggregation” slide (8 in the short version, 9 in the long version).
Think of aggregation as a form of merging.
What information is lost as part of aggregation? (That assumes we know the aggregation process. Without that, we can’t say what is lost.)
What information (read: subjects/relationships) do we want to preserve through an aggregation process?
What properties should those subjects/relationships have?
(Those are topic map design/modeling questions.)
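To make the "what is lost" question concrete, here is a minimal sketch in plain Python. It is not HBase or HBaseHUT code, and the event fields (minute, page, user, latency) are made up for illustration. It pre-aggregates raw events into a count and a latency total per (minute, page), then merges two of those aggregates.

```python
from collections import defaultdict

# Hypothetical raw events: (minute, page, user, latency_ms).
events = [
    (0, "/home", "alice", 120),
    (0, "/home", "bob", 80),
    (0, "/docs", "alice", 200),
    (1, "/home", "carol", 100),
]


def pre_aggregate(rows):
    """Collapse raw events into (count, total latency) per (minute, page)."""
    agg = defaultdict(lambda: [0, 0])
    for minute, page, _user, latency in rows:
        cell = agg[(minute, page)]
        cell[0] += 1         # number of events merged into this cell
        cell[1] += latency   # sum of latencies; individual values are gone
    return {key: tuple(value) for key, value in agg.items()}


def merge(a, b):
    """Merge two aggregates: counts and totals simply add."""
    return (a[0] + b[0], a[1] + b[1])


aggregated = pre_aggregate(events)

# Merging the two "/home" cells gives a coarser aggregate across minutes.
home_total = merge(aggregated[(0, "/home")], aggregated[(1, "/home")])
```

In this sketch the minute and the page survive as the key, so they are the subjects we chose to preserve. The user and the individual latency values do not survive: from `aggregated` alone there is no way to say who visited or what the slowest request was. Those are exactly the choices the questions above ask us to make on purpose.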