Eight (No, Nine!) Problems With Big Data by Gary Marcus and Ernest Davis.
From the post:
The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.
…
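The point in the quoted passage, that two series which merely trend in the same direction will correlate strongly whether or not they are related, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two series are synthetic, made-up numbers (not the murder-rate, browser, autism, or organic-food figures), and the names `pearson`, `differences`, `series_a`, and `series_b` are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def differences(xs):
    """Period-to-period changes, which strip out a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

# Synthetic data (an assumption, not real statistics): two quantities that
# both decline over ten periods, each with its own independent noise.
rng = random.Random(0)
periods = range(10)
series_a = [100.0 - 5.0 * t + rng.gauss(0, 2) for t in periods]
series_b = [60.0 - 4.0 * t + rng.gauss(0, 2) for t in periods]

# Correlation of the raw levels is driven almost entirely by the shared trend.
r_levels = pearson(series_a, series_b)

# Correlation of the changes reflects only the independent noise.
r_changes = pearson(differences(series_a), differences(series_b))

print(f"levels:  r = {r_levels:.3f}")
print(f"changes: r = {r_changes:.3f}")
```

The two series share nothing except a downward trend, yet the correlation of their levels comes out close to 1. Comparing period-to-period changes removes the trend, so whatever correlation remains comes from noise alone; with this few points it can still be sizable by chance, which is one more reason a correlation by itself says nothing about cause.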
If you or your manager is drinking the “big data” Kool-Aid, or if you stand to profit from the sale of “big data” appliances and/or services, you may want to skip this article.
No point in getting confused about issues your clients aren’t likely to raise.
On the other hand, if you are a government employee who is tired of seeing the public coffers robbed for less-than-useful technology, you probably need to print out this article by Marcus and Davis.
Don’t quote from it, but for any proposed “big data” project, ask questions drawn from each of the nine problem areas.
“Big data” and its tools have a lot of potential.
But consumers are responsible for keeping that potential from being realized at the expense of their pocketbooks.
Perhaps “caveat emptor” should now be written: “CAVEAT EMPTOR (Big Data).”
What do you think?