Automated science, deep data and the paradox of information…
Bradley Voytek writes:
A lot of great pieces have been written about the relatively recent surge in interest in big data and data science, but in this piece I want to address the importance of deep data analysis: what we can learn from the statistical outliers by drilling down and asking, “What’s different here? What’s special about these outliers and what do they tell us about our models and assumptions?”
The reason that big data proponents are so excited about the burgeoning data revolution isn’t just because of the math. Don’t get me wrong, the math is fun, but we’re excited because we can begin to distill patterns that were previously invisible to us due to a lack of information.
That’s big data.
Of course, data are just a collection of facts; bits of information that are only given context — assigned meaning and importance — by human minds. It’s not until we do something with the data that any of it matters. You can have the best machine learning algorithms, the tightest statistics, and the smartest people working on them, but none of that means anything until someone makes a story out of the results.
And therein lies the rub.
Do all these data tell us a story about ourselves and the universe in which we live, or are we simply hallucinating patterns that we want to see?
I would reformulate Bradley’s question as a statement:
We use data to tell stories about ourselves and the universe in which we live.
That means his rules of statistical methods:
…
- The more advanced the statistical methods used, the fewer critics are available to be properly skeptical.
- The more advanced the statistical methods used, the more likely the data analyst will be to use math as a shield.
- Any sufficiently advanced statistics can trick people into believing the results reflect truth.
are sources of other stories “about ourselves and the universe in which we live.”
If you prefer Bradley’s original question:
Do all these data tell us a story about ourselves and the universe in which we live, or are we simply hallucinating patterns that we want to see?
I would answer: And the difference would be?