A Challenge to Data Scientists

A Challenge to Data Scientists by Renee Teate.

From the post:

As data scientists, we are aware that bias exists in the world. We read up on stories about how cognitive biases can affect decision-making. We know that, for instance, a resume with a white-sounding name will receive a different response than the same resume with a black-sounding name, and that writers of performance reviews use different language to describe contributions by women and men in the workplace. We read stories in the news about ageism in healthcare and racism in mortgage lending.

Data scientists are problem solvers at heart, and we love our data and our algorithms that sometimes seem to work like magic, so we may be inclined to try to solve these problems stemming from human bias by turning the decisions over to machines. Most people seem to believe that machines are less biased and more pure in their decision-making – that the data tells the truth, that the machines won’t discriminate.

Renee’s post summarizes a lot of information about bias, inside and outside of data science and issues this challenge:

Data scientists, I challenge you. I challenge you to figure out how to make the systems you design as fair as possible.

An admirable sentiment, but one hard part is defining “…as fair as possible.”

Professionally trained in a day-to-day “hermeneutic of suspicion” (as opposed to Paul Ricoeur‘s analysis of texts; see Paul Ricoeur and the Hermeneutics of Suspicion: A Brief Overview and Critique by G.D. Robinson), I have yet to encounter a definition of “fair” that does not define winners and losers.

Data science relies on classification, which has as its avowed purpose the separation of items into different categories. Some categories will be treated differently than others. Otherwise there would be no reason to perform the classification.

Another hard part is that employers of data scientists are more likely to say:

Analyze data X for market segments responding to ad campaign Y.

As opposed to:

What do you think about our ads targeting tweens with sexual content for our unhealthy product A?

Or change the questions to fit those asked of data scientists at any government intelligence agency.

The vast majority of data scientists are hired as data scientists, not amateur theologians.

Competence in data science has no demonstrable relationship to competence in ethics, fairness, morality, etc. Data scientists can have opinions on those subjects but shouldn’t presume to poach on other areas of expertise.

How would you feel if a competent user of spreadsheets decided to label themselves a “data scientist”?

Keep that in mind the next time someone starts to pontificate on “ethics” in data science.

PS: Renee is in the process of creating and assembling high quality resources for anyone interested in data science. Be sure to explore her blog and other links after reading her post.

2 Responses to “A Challenge to Data Scientists”

  1. Renee says:

    Hi Patrick,

    Got your pingback, and I appreciate your thoughtful response to my article and compliments about my site!

    I agree that “fairness” is not well defined, that it’s not only the data scientist’s job to keep an eye out for bias, and that often the goal of an analysis task isn’t going to include fairness.

    (There is some discussion on what fairness means to individuals vs groups in this article http://www.socialsciencespace.com/2015/08/beware-big-data-is-not-free-of-discrimination/ )

    My goal in writing that article was to get data scientists to at least think about the biases we can introduce in our work, and to provide some info about ways to avoid them. Any system that affects one group disproportionately will affect the outcome for the business question as well.

    For instance, suppose there is a business objective to minimize the false positives a model generates, and your overall confusion matrix shows one false positive rate for a classification algorithm. If you then look at just the Hispanic population subset and realize the false positives are much worse for that group, there might be something going on in your model that you need to understand before finalizing your results. It could go all the way back to how the data was collected. Maybe you can’t fix it, but when you present the outcome, you could mention that finding. What if your company was about to launch a new service to a Hispanic community? At least you now know that the general predictive model you built is not as accurate for that population, and you can suggest a new model be built for that particular application.
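    The subgroup check described above can be sketched in a few lines of Python. Everything here is illustrative: the labels, predictions, and group tags are made up, and groups “A” and “B” stand in for any population subsets you might slice on.

    ```python
    # Sketch: compare the overall false positive rate (FPR) of a classifier
    # against per-group FPRs. A "false positive" is a prediction of 1 when
    # the true label is 0.

    def false_positive_rate(y_true, y_pred):
        """FPR = false positives / actual negatives."""
        negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
        if not negatives:
            return 0.0
        false_positives = sum(1 for t, p in negatives if p == 1)
        return false_positives / len(negatives)

    # Hypothetical records: (true label, predicted label, group tag).
    records = [
        (0, 0, "A"), (0, 1, "A"), (0, 0, "A"), (0, 0, "A"), (1, 1, "A"),
        (0, 1, "B"), (0, 1, "B"), (0, 0, "B"), (1, 1, "B"), (1, 0, "B"),
    ]

    y_true = [t for t, _, _ in records]
    y_pred = [p for _, p, _ in records]
    print("overall FPR:", false_positive_rate(y_true, y_pred))  # 3/7

    # The overall number can hide a disparity: here group B's FPR is
    # far worse than group A's, even though the overall rate looks fine.
    for group in ("A", "B"):
        subset = [(t, p) for t, p, g in records if g == group]
        fpr = false_positive_rate([t for t, _ in subset],
                                  [p for _, p in subset])
        print(f"group {group} FPR:", fpr)  # A: 0.25, B: ~0.67
    ```

    The same slicing applies to any error metric, not just false positives; the point is simply to recompute the metric on each subset before trusting the aggregate.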

    I was in a cognitive systems engineering class, and a fellow engineering student (masters level) admitted that his classes had taught him to optimize for manufacturability, cost, strength, etc, but until that class he hadn’t ever thought about the human using whatever he was designing, and especially not the cognitive load an interface might impose (we were learning about reducing “human error”).

    I am challenging data scientists to at least think about the people in their data, especially when a system is being built that will make decisions that have a major impact on people’s lives, so hopefully they will at least consider these types of questions in addition to mathematical optimization.


  2. Patrick Durusau says:


    All valid points and I have touched on bias in algorithms that isn’t evident until you consider the mechanics of the algorithm and the incoming data. See “Non-News: Algorithms Are Biased” and “Watch Hilary Mason discredit the cult of the algorithm,” for instance.

    Your comment about “think about the people in their data,” reminds me of a contract law professor, who after a semester of drilling us on the technical side of the law, said “…to always remember the plaintiff had a face and the defendant had a name.”

    In the same context, however, I had long debates with natural law theorists who would argue for their version of natural law, which just so happened to coincide with their personal beliefs as well. I think they were being completely honest and could not see they were trapped by their own presumptions.

    That is not to say that I or anyone can stand outside of presumptions, although many do in fact make that claim. It is to say that thinking about “why” a person is advocating a particular position is at least as important as the position itself.

    Keep up the good work! I look forward to seeing more of it.