First complex, then simple by James D Malley and Jason H Moore. (BioData Mining 2014, 7:13)
Abstract:
At the start of a data analysis project it is often suggested that the researcher look first at multiple simple models. That is, always begin with simple, one variable at a time analyses, such as multiple single-variable tests for association or significance. Then, later, somehow (how?) pull all the separate pieces together into a single comprehensive framework, an inclusive data narrative. For detecting true compound effects with more than just marginal associations, this is easily defeated with simple examples. But more critically, it is looking through the data telescope from the wrong end.
I would have titled this article: “Data First, Models Later.”
That is, the authors start with no formal theories about what the data will prove and, upon finding signals in the data, then generate simple models to explain those signals.
I am sure their questions of the data are driven by a suspicion of what the data may prove, but that isn’t the same thing as asking questions designed to prove a model generated before the data is queried.
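To make the abstract's claim about "simple examples" concrete, here is a minimal sketch of my own (not taken from the paper). It assumes a hypothetical toy data set in which the outcome is the exclusive-or of two binary variables, a purely compound effect. The variable names and the data are invented for illustration.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data (an assumption, not from the paper): every combination
# of two binary variables, repeated 25 times, with outcome y = x1 XOR x2.
rows = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)] * 25

# One variable at a time: average outcome at each level of x1, then of x2.
mean_y_by_x1 = {v: mean(y for x1, _, y in rows if x1 == v) for v in (0, 1)}
mean_y_by_x2 = {v: mean(y for _, x2, y in rows if x2 == v) for v in (0, 1)}

# Both variables together: average outcome in each (x1, x2) cell.
mean_y_by_pair = {
    (a, b): mean(y for x1, x2, y in rows if (x1, x2) == (a, b))
    for a, b in product((0, 1), repeat=2)
}

print("by x1:  ", mean_y_by_x1)
print("by x2:  ", mean_y_by_x2)
print("by pair:", mean_y_by_pair)
```

In this constructed case the average outcome is 0.5 at every level of either variable taken alone, so a single-variable test has nothing to detect, while the joint table shows that the pair determines the outcome completely. Real data will rarely be this clean, but the sketch shows why stitching together one-variable analyses can miss a signal that only exists jointly.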