5 real-world uses of big data (?)
I ran across this post by David Smith that starts off well enough:
In the past year, big data has emerged as one of the most closely watched trends in IT. Organizations today are generating more data in a single day than the entire Internet generated as recently as 2000. The explosion of “big data”–much of it in complex and unstructured formats–has presented companies with a tremendous opportunity to leverage their data for better business insights through analytics.
Wal-Mart was one of the early pioneers in this field, using predictive analytics to better identify customer preferences on a regional basis and stock their branch locations accordingly. It was an incredibly effective tactic that yielded strong ROI and allowed them to separate themselves from the retail pack. Other industries took notice of Wal-Mart’s tactics — and the success they gleaned from processing and analyzing their data — and began to employ the same tactics.
But then none of the examples were Big Data:
- Afghanistan War Diaries (I don’t remember there being “terabytes” of data. Gigabytes, yes, but not terabytes.)
- Guatemala’s National Police records, some 80 million of them, used to document the genocide of people of Mayan descent (another smallish data set)
- Bill James (he of Moneyball fame) is a well-known figure in the world of both baseball and statistics at this point, but that has not always been the case. (But he went to work for the Boston Red Sox in 2003, a bit early for a Big Data story.)
- BP Oil Spill – “NIST used the open source R language to run an uncertainty analysis that harmonized the estimates from various sources to come up with actionable intelligence around which disaster response efforts could be coordinated.” (Big Disaster, yes, but not Big Data.)
- CardioDX – “researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.” (True, but not all at one time: the analysis ran R routines over a period of weeks, if not months. Time-consuming, but not a Big Data story.)
All are great examples of data analysis and should be celebrated as such. But let’s reserve Big Data for data sets that pose storage or processing challenges that are not routinely met by the average desktop machine.
A day’s output from the Large Hadron Collider, or from one of the all-sky survey telescopes, is by contrast undoubtedly Big Data.
Whether the data is “Big,” “small,” or “in-between,” the real key is useful analysis.