The myth of the aimless data explorer by Enrico Bertini.
From the post:
There is a sentence I have heard or read multiple times in my journey into (academic) visualization: visualization is a tool people use when they don’t know what question to ask to their data.
I have always taken this sentence as a given and accepted it as it is. Good, I thought, we have a tool to help people come up with questions when they have no idea what to do with their data. Isn’t that great? It sounded right or at least cool.
But as soon as I started working on more applied projects, with real people, real problems, real data they care about, I discovered this all excitement for data exploration is just not there. People working with data are not excited about “playing” with data, they are excited about solving problems. Real problems. And real problems have questions attached, not just curiosity. There’s simply nothing like undirected data exploration in the real world.
I think Enrico misses why people use (and like) the phrase: “visualization is a tool people use when they don’t know what question to ask to their data.”
Visualization privileges the “data” as the source of whatever result the visualization displays.
It’s not me! That’s what the data says!
Hardly. Someone collected the data, and not at random, stuffing whatever bits came along into a bag. Someone cleaned the data with some notion of what “clean” meant. Someone chose the data that is now being called upon for a visualization. And those are clumsy descriptions that collapse many distinct steps into only three.
To put it another way, data never exists without choices being made. And it is the sum of those choices that determines which visualizations are even possible from a given data set.
The short term for what Enrico overlooks is bias.
I would recast his title to read: The myth of the objective data explorer.
Having said that, I don’t mean that all bias is bad.
If I were collecting data on Ancient Near Eastern (ANE) languages, I would of necessity be excluding the language traditions of the entire Western Hemisphere. It could even be that data from the native cultures of the Western Hemisphere will be lost while I am preserving data from the ANE.
So we have bias, and an outcome that is bad from someone’s point of view because of that bias. Was that a bad thing? I would argue not.
It isn’t even possible to collect all the data that could potentially be collected. We all make value judgments about the data we choose to collect and the data we choose to ignore.
Rather than pretending that we possess objectivity in any meaningful sense, we are better off stating our biases to the extent we know them. At least others will be forewarned that we are just like them.