Building an AirPair Dashboard Using Apache Spark and Clojure by Sébastien Arnaud.
From the post:
Have you ever wondered how much you should charge per hour for your skills as an expert on AirPair.com? How many opportunities are available on this platform? Which skills are in high demand? My name is Sébastien Arnaud and I am going to introduce you to the basics of how to gather, refine and analyze publicly available data using Apache Spark with Clojure, while attempting to generate basic insights about the AirPair platform.
1.1 Objectives
Here are some of the basic questions we will be trying to answer through this tutorial:
- What is the lowest and the highest hourly rate?
- What is the average hourly rate?
- What is the most sought out skill?
- What is the skill that pays the most on average?
1.2 Chosen tools
In order to make this journey a lot more interesting, we are going to step out of the usual comfort zone of using a standard iPython notebook using pandas with matplotlib. Instead we are going to explore one of the hottest data technologies of the year: Apache Spark, along with one of the up and coming functional languages of the moment: Clojure.
As it turns out, these two technologies can work relatively well together thanks to Flambo, a library developed and maintained by YieldBot's engineers who have improved upon clj-spark (an earlier attempt from the Climate Corporation). Finally, in order to share the results, we will attempt to build a small dashboard using DucksBoard, which makes pushing data to a public dashboard easy.
…
In addition to illustrating the use of Apache Spark and Clojure, Sébastien also covers harvesting data from Twitter and processing it into a useful format.
Definitely worth some time over a weekend!