First experiments with Apache Spark at Snowplow by Justin Courty.
From the post:
As we talked about in our May post on the Spark Example Project release, at Snowplow we are very interested in Apache Spark for three things:
- Data modeling, i.e. applying business rules to aggregate event-level data into a format suitable for ingesting into a business intelligence / reporting / OLAP tool
- Real-time aggregation of data for real-time dashboards
- Running machine-learning algorithms on event-level data
We’re just at the beginning of our journey getting familiar with Apache Spark. I’ve been using Spark for the first time over the past few weeks. In this post I’ll share back with the community what I’ve learnt, and will cover:
- Loading Snowplow data into Spark
- Performing simple aggregations on Snowplow data in Spark
- Performing funnel analysis on Snowplow data
I’ve tried to write the post in a way that’s easy to follow along for other people interested in climbing the Spark learning curve.
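Since the quoted post's own code isn't reproduced here, here is a minimal sketch of what the first two bullets (and a toy version of the third) might look like in Spark's Scala API: loading enriched events as tab-separated lines, counting events by name, and checking a two-step funnel per user. The input path, the field positions (event name at index 9, user id at index 12, timestamp at index 3), and the event names "view" and "purchase" are all illustrative assumptions, not Snowplow's actual enriched-event layout.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SnowplowSparkSketch {
  def main(args: Array[String]): Unit = {
    // Local mode for experimentation; on a cluster you would drop setMaster.
    val conf = new SparkConf().setAppName("snowplow-spark-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Snowplow enriched events arrive as tab-separated lines, one per event.
    // Path and field positions below are assumptions for illustration only --
    // check your own enriched-event layout before using them.
    val events = sc.textFile("hdfs:///snowplow/enriched/events/*")
      .map(_.split("\t", -1))

    // Hypothetical field positions.
    def eventName(e: Array[String]): String = e(9)   // assumed position
    def userId(e: Array[String]): String    = e(12)  // assumed position
    def tstamp(e: Array[String]): String    = e(3)   // assumed position

    // Simple aggregation: count events per event name.
    val countsByEvent = events
      .map(e => (eventName(e), 1L))
      .reduceByKey(_ + _)

    countsByEvent.collect().foreach { case (name, n) =>
      println(s"$name\t$n")
    }

    // Toy funnel: users who performed a "view" event before a "purchase".
    val funnelUsers = events
      .map(e => (userId(e), (tstamp(e), eventName(e))))
      .groupByKey()
      .filter { case (_, evs) =>
        val ordered = evs.toSeq.sortBy(_._1).map(_._2)
        val i = ordered.indexOf("view")
        i >= 0 && ordered.indexOf("purchase", i + 1) > i
      }
      .keys

    println(s"Users completing the funnel: ${funnelUsers.count()}")

    sc.stop()
  }
}
```

Note that `groupByKey` shuffles every event for a user to a single executor; for real funnel analysis on large event volumes, `aggregateByKey` or the DataFrame API would scale better.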
What a great post to find just before the weekend!
You will enjoy this one and others in this series.
Have you ever considered extending aggregation for a business dashboard to include what is known about particular subjects? We have all seen dashboards with rising counts, graphs, charts, etc., but what about non-tabular data?
A non-tabular dashboard?