Deep-Fried Data by Maciej Ceglowski. (paper) (video of same presentation) Part of Collections as Data event at the Library of Congress.
If the “…money laundering for bias…” quote doesn’t capture your attention, try:
…
I find it helpful to think of algorithms as a dim-witted but extremely industrious graduate student, whom you don’t fully trust. You want a concordance made? An index? You want them to go through ten million photos and find every picture of a horse? Perfect.You want them to draw conclusions on gender based on word use patterns? Or infer social relationships from census data? Now you need some adult supervision in the room.
Besides these issues of bias, there’s also an opportunity cost in committing to computational tools. What irks me about the love affair with algorithms is that they remove a lot of the potential for surprise and serendipity that you get by working with people.
If you go searching for patterns in the data, you’ll find patterns in the data. Whoop-de-doo. But anything fresh and distinctive in your digital collections will not make it through the deep frier.
We’ve seen entire fields disappear down the numerical rabbit hole before. Economics came first, sociology and political science are still trying to get out, bioinformatics is down there somewhere and hasn’t been heard from in a while.
…
A great read and equally enjoyable presentation.
Enjoy!