Rick Sherman writes in Big Data & The Wizard of Oz Syndrome:
An excellent article in the Wall Street Journal, “Big Data, Big Blunders,” discussed five mistakes commonly made by enterprises when initiating their first Big Data projects. The technology hype cycle, which reminds me a lot of The Wizard of Oz, is a contributing factor in these blunders. I’ll briefly summarize the WSJ’s points, and will suggest, based on my experience helping clients, why enterprises make these blunders.
Rick summarizes these points from the WSJ story:
- Data for Data’s Sake
- Talent Gap
- Data, Data Everywhere
- Aiming Too High
Rick says that advocates of new technologies promise to solve the problems left behind by prior technology advances, which leads to unrealistic expectations.
I agree, but there is also a persistent failure to recognize the uncertainty principle for data.
How would you know if data is clean and uniform?
By your use case for the data. Yes?
That would explain why data scientists estimate they spend 60-80% of their time munging data (cleaning, transforming, etc.).
They are making data clean and uniform for their individual use cases.
And they do that task over and over again.
The definition of clean and uniform data is like the uncertainty principle in physics.
You can have clean and uniform data for one purpose, but making it so makes it dirty and non-uniform for another purpose.
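A minimal sketch of that trade-off, assuming two hypothetical use cases for the same list of customer names: counting distinct customers, and auditing how many spelling variants were entered. The sample data and function names are made up for illustration.

```python
# Hypothetical sample data: three spellings of one customer, plus another customer.
raw = ["  ACME Corp. ", "Acme Corp", "acme corp", "Apex Ltd"]

def clean_for_dedup(names):
    """Use case 1: collapse spelling variants so each customer is counted once."""
    return [n.strip().lower().rstrip(".") for n in names]

def count_variants(names):
    """Use case 2: count distinct spellings to audit data-entry quality."""
    return len(set(names))

cleaned = clean_for_dedup(raw)

# Clean for use case 1: the variants collapse into two customers.
distinct_customers = len(set(cleaned))

# The same cleaning erased what use case 2 needs.
variants_in_raw = count_variants(raw)
variants_after_cleaning = count_variants(cleaned)

print(distinct_customers, variants_in_raw, variants_after_cleaning)
```

The cleaned list gives the right answer for the customer count, but the variant audit now sees only two spellings where the raw data held four. Each use case needs its own pass over the raw data.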
Unless a technology spells out how it obtains data that is clean and uniform from its own perspective, it has told you only part of the cost of using it.