The Elements of Statistical Learning (2nd ed.) by Trevor Hastie, Robert Tibshirani and Jerome Friedman. (PDF)
The authors note in the preface to the first edition:
The field of Statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data.
I’m sympathetic to that sentiment, but with the caveat that it is our semantic expectations that give data any meaning to be “learned.”
Data isn’t lurking outside our door with “meaning” captured separate and apart from us. Our fancy otherwise obscures our role in the origin of the “meaning” we attach to data, in part to bolster the claim that the “facts/data say….”
It is we who take up the gauntlet for our mute friends, facts/data, and make claims on their behalf.
If we recognized those as our claims, perhaps we would be more willing to listen to the claims of others. Perhaps.
I first saw this in a tweet by Michael Conover.