“Who asks the questions?” is a section header in Follow The Data Down The Rabbit Hole by Mark Gazit.
The question of human bias hangs like a shadow over the accuracy and efficiency of big data analytics, and thus the viability of answers obtained thereof. If different humans can look at the same data and come to different conclusions, just how reliable can those deductions be?
There is no question that using data science to extract knowledge from raw data provides tremendous value and opportunity to organizations in any sector, but the way it is analyzed has crucial bearing on that value.
In order to extract meaningful answers from big data, data scientists must decide which questions to ask of the algorithms. However, as long as humans are the ones asking the questions, they will forever introduce unintentional bias into the equation. Furthermore, the data scientists in charge of choosing the queries are often much less equipped to formulate the “right questions” than the organization’s specialized domain experts.
For example, a compliance manager would ask much better questions about her area than a scientist who has no idea what her day-to-day work entails. The same goes for a CISO or the executive in charge of insider threats. Does this mean that your data team will have to involve more people all the time? And what happens if one of those people leaves the company?
Data science is necessary and important, and as data grows, so does the need for experienced data scientists. But at the same time, leaving all the computational work to humans makes it slower, less scientific, and quick to degrade in quality because the human mind cannot keep up with the quantum leap that big data is undergoing. (emphasis added)
I find Big Data hype such as:
But at the same time, leaving all the computational work to humans makes it slower, less scientific, and quick to degrade in quality because the human mind cannot keep up with the quantum leap that big data is undergoing. (emphasis added)
deeply problematic.
The “human mind” is responsible for the creation of “big data,” and our biases and assumptions are built into the hardware and software that process it.
Why should that be any different for the questions we ask of “Big Data”? Who is there to pose questions other than the “human mind”? Or to set in motion a framework that asks questions within parameters that themselves originated with a “human mind”?
Claims that “…the human mind cannot keep up…” refer to “your” human mind, not the “human mind” of the person making the statement. That person is about to claim to have the correct interpretation of some data set, just as statisticians (or to be fair, people claiming to be statisticians) claimed for years that there was no link between smoking and lung cancer.
Claims about the human (your) brain are always made with an agenda, one that puts some “fact,” some policy, some principle beyond question. Identify that “fact,” policy, or principle, because that is where their evidence is weakest; otherwise they would not try to put it beyond question.