Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

December 6, 2015

Racist algorithms: how Big Data makes bias seem objective

Filed under: BigData,Ethics — Patrick Durusau @ 8:46 pm

Racist algorithms: how Big Data makes bias seem objective by Cory Doctorow.

From the post:

The Ford Foundation’s Michael Brennan discusses the many studies showing how algorithms can magnify bias — like the prevalence of police background check ads shown against searches for black names.

What’s worse is the way that machine learning magnifies these problems. If an employer only hires young applicants, a machine learning algorithm will learn to screen out all older applicants without anyone having to tell it to do so.

Worst of all is that the use of algorithms to accomplish this discrimination provides a veneer of objective respectability to racism, sexism and other forms of discrimination.

Cory has a good example of “hidden” bias in data analysis and suggestions for possible improvement.
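To see how little it takes for a learner to absorb that kind of bias, here is a minimal sketch with synthetic data (my own hypothetical, not an example from the post): a logistic regression fit to an employer's past decisions, where only young applicants were hired, ends up weighting age heavily and skill hardly at all.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, synthetic applicant pool: an age and a skill score.
rng = np.random.default_rng(0)
age = rng.integers(22, 65, size=1000)
skill = rng.normal(0.0, 1.0, size=1000)

# Historical labels reflect the employer's practice, not merit:
# only young applicants were ever hired.
hired = (age < 35).astype(int)

X = np.column_stack([age, skill])
model = LogisticRegression(max_iter=1000).fit(X, hired)

print(model.coef_)                 # strong negative weight on age, near zero on skill
print(model.predict([[50, 3.0]]))  # a highly skilled 50-year-old applicant is rejected: [0]

No one told the model to screen out older applicants; the rule was already in the labels, and the model simply recovered it.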

Although I applaud the notion of “algorithmic transparency,” the issue of bias in algorithms may be more subtle than you think.

Lauren J. Young reports in Computer Scientists Find Bias in Algorithms that the bias problem can be especially acute with self-improving algorithms. Algorithms, like users, have experiences, and those experiences can lead to bias.
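As a toy illustration of that point (my own sketch, not from Young's article), consider a system that always sends its next round of checks to whichever group has the larger recorded count so far. An initial difference that is pure chance never gets corrected and instead keeps growing, even though the two groups behave identically:

import random

random.seed(1)

# Two identical groups; the recorded counts start almost equal by chance.
hits = {"group_a": 51, "group_b": 49}
true_rate = 0.10   # the same underlying rate for both groups

for _ in range(20):
    # The system's "experience": focus all 100 checks on the group
    # with the larger record so far.
    target = max(hits, key=hits.get)
    hits[target] += sum(random.random() < true_rate for _ in range(100))

print(hits)   # group_a's record dwarfs group_b's, though the groups are identical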

Lauren’s article is a good introduction to the concept of bias in algorithms, but for the full monty, see: Certifying and removing disparate impact by Michael Feldman, et al.

Abstract:

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process.

When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses.

We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
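To make the legal notion concrete, here is a minimal sketch of my own (it computes the basic selection-rate ratio and the familiar four-fifths rule of thumb, not the paper's information-leakage test):

from collections import Counter

def disparate_impact(records):
    """records: iterable of (group, selected) pairs, e.g. ("A", True).
    Returns (ratio, per-group rates), where ratio is the lowest group's
    selection rate divided by the highest group's. Under the common
    four-fifths rule of thumb, a ratio below 0.8 is taken as evidence
    of disparate impact."""
    totals, chosen = Counter(), Counter()
    for group, selected in records:
        totals[group] += 1
        chosen[group] += bool(selected)
    rates = {g: chosen[g] / totals[g] for g in totals}
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical selection data: group A selected at 50%, group B at 30%.
data = ([("A", True)] * 50 + [("A", False)] * 50 +
        [("B", True)] * 30 + [("B", False)] * 70)
ratio, rates = disparate_impact(data)
print(rates)   # {'A': 0.5, 'B': 0.3}
print(ratio)   # 0.6, below the 0.8 threshold

The point of the paper is that you can reach a judgment like this, and repair the data, without ever seeing the algorithm itself.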

Bear in mind that disparate impact is only one form of bias, defined over a selected set of protected categories, and that bias can be introduced prior to any formal data analysis.

Rather than say that data or algorithms can be made unbiased, say that known biases can be reduced to acceptable levels, for some definition of acceptable.
