Bill Gates is naive, data is not objective by Cathy O’Neil.
From the post:
In his recent essay in the Wall Street Journal, Bill Gates proposed to “fix the world’s biggest problems” through “good measurement and a commitment to follow the data.” Sounds great!
Unfortunately it’s not so simple.
Gates describes a positive feedback loop when good data is collected and acted on. It’s hard to argue against this: given perfect data-collection procedures with relevant data, specific models do tend to improve, according to their chosen metrics of success. In fact this is almost tautological.
As I’ll explain, however, rather than focusing on how individual models improve with more data, we need to worry more about which models and which data have been chosen in the first place, why that process is successful when it is, and – most importantly – who gets to decide what data is collected and what models are trained.
Cathy makes a compelling case for data not being objective and concludes:
Don’t be fooled by the mathematical imprimatur: behind every model and every data set is a political process that chose that data and built that model and defined success for that model.
Sounds a lot like identifying subjects.
No identification is objective. All identifications occur as part of social processes and are bound by those processes.
No identification is “better” than another, although in some contexts particular identifications may be more useful than others.
I first saw this in Four short links: 4 February 2013 by Nat Torkington.