I have been collecting pointers to this series of posts for some time now.
It is a good series on analysis/visualization, with lessons that you can transfer to other data sets.
Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather Related Delays
For this exercise, I combined the following four separate blog posts that I wrote on Big Data, R and SAP HANA. Historical airlines and weather data were used for the underlying analysis. The aggregated output of this analysis was written out as JSON and visualized with HTML5, D3 and Google Maps. The previous posts in this series are:
- Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 – Part II
- Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps
- Getting Historical Weather Data in R and SAP HANA
- Tracking SFO Airport’s Performance Using R, HANA and D3
In this blog, I wanted to mash up disparate data sources in R and HANA by combining airlines data with weather data to understand the reasons behind airport and airline delays. Why weather? Because weather is one of the most commonly cited reasons for flight delays in the airline industry. Fortunately, the airlines data breaks up the delay by weather, security, late aircraft etc., so weather-related delays can be isolated and the actual weather data can then be mashed up with them to validate the airlines' claims. However, I will not be doing that validation here; I will just be displaying the mashed-up data.
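To make the mash-up idea concrete, here is a minimal sketch of the join in plain Python. It is not the R/HANA code behind this series; the record layout, the field names (`weather_delay_min`, `precip_in`, `wind_mph`) and all the sample values are made up for illustration. It totals the weather-attributed delay per airport and day, then attaches that day's weather observation where one exists.

```python
import json
from collections import defaultdict

# Hypothetical flight records: (airport, date, weather delay in minutes).
flights = [
    ("SFO", "2008-01-04", 45),
    ("SFO", "2008-01-04", 0),
    ("SFO", "2008-01-05", 0),
    ("SJC", "2008-01-04", 20),
    ("OAK", "2008-01-04", 15),
]

# Hypothetical daily weather observations keyed by (airport, date).
# OAK is deliberately absent, mirroring the missing weather data for that airport.
weather = {
    ("SFO", "2008-01-04"): {"precip_in": 1.2, "wind_mph": 28},
    ("SFO", "2008-01-05"): {"precip_in": 0.0, "wind_mph": 6},
    ("SJC", "2008-01-04"): {"precip_in": 0.9, "wind_mph": 22},
}


def daily_weather_delay(flights):
    """Count flights and sum weather-attributed delay per (airport, date)."""
    totals = defaultdict(lambda: {"flights": 0, "weather_delay_min": 0})
    for airport, date, delay in flights:
        day = totals[(airport, date)]
        day["flights"] += 1
        day["weather_delay_min"] += delay
    return totals


def mash_up(flights, weather):
    """Left-join the daily delay totals with the weather observations."""
    rows = []
    for (airport, date), agg in sorted(daily_weather_delay(flights).items()):
        obs = weather.get((airport, date))  # None when there is no observation
        rows.append({"airport": airport, "date": date, **agg, "weather": obs})
    return rows


rows = mash_up(flights, weather)
print(json.dumps(rows, indent=2))  # one JSON record per airport and day
```

The join is a left join on purpose: days without a weather observation stay in the output with `weather` set to `null`, so a calendar view can still show the delay even when there is nothing to compare it against.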
I have intentionally focused on the three Bay Area airports and have used the last 4 years of historical data to visualize the airports' performance using an HTML5 calendar built from scratch with D3.js. One could extend this example to all 20 years of data and all the airports. I downloaded historical weather data for the same 2005-2008 period for the SFO and SJC airports, as shown in my previous blog post (for some strange reason, there is no weather data for OAK). Here is what the final result looks like in HTML5: