Big Data Garbage In, Even Bigger Garbage Out by Alex Woodie.
From the post:
People are doing some truly amazing things with big data sets and analytic tools. Tools like Hadoop have given us astounding capabilities to drive insights out of huge expanses of loosely structured data. And while the big data breakthroughs are expected to continue, don’t expect any progress to be made against that oldest of computer adages: “garbage in, garbage out.”
In fact, big data may even exacerbate the GIGO problem, according to Andrew Anderson, CEO of Celaton, a UK company that makes software designed to prevent bad data from being introduced into customer’s accounting systems.
“The ideal payoff for accumulating data is rapidly compounding returns,” Anderson writes in an essay on Economia, a publication of a UK accounting association. “By gaining more data on your own business, your clients, and your prospects, the idea is that you can make more informed decisions about your business and theirs based on clear insight. Too often however, these insights are based on invalid data, which can lead to a negative version of this payoff, to the power of ten.”
The problem may compound to the power of 100 if bad data is left to fester. Anderson calls this the “1-10-100 rule.” If a clerk makes a mistake entering data, it costs $1 to fix it immediately. After an hour–when the data has begun propagating across the system–the cost to fix it increases to $10.
Several months later, after the piece of data has become part of the company’s data reality and mailings have gone out to the wrong people and invoices have gone unpaid and new clients have not been contacted about new services, the cost of that single data error balloons to $100.
If you read the essay in Economia, you will find the 1-10-100 rule expressed in British pounds. With the current exchange rate, the cost would be higher here in the United States.
Still, the point is a valid one.
Decisions made on faulty data may be the correct decisions, but your odds worsen as the quality of the data goes down.