Banning p < .05 In Psychology [Null Hypothesis Significance Testing Procedure (NHSTP)]

The recent banning of the Null Hypothesis Significance Testing Procedure (NHSTP) in psychology should be a warning to would be data scientists that even “well established” statistical procedures may be deeply flawed.

Sorry, you may not have seen the news. In Basic and Applied Social Psychology (BASP), Banning Null Hypothesis Significance Testing Procedure (NHSTP) (2015) David Trafimow and Michael Marks write

The Basic and Applied Social Psychology (BASP) 2014 Editorial emphasized that the null hypothesis significance testing procedure (NHSTP) is invalid, and thus authors would be not required to perform it (Trafimow, 2014). However, to allow authors a grace period, the Editorial stopped short of actually banning the NHSTP. The purpose of the present Editorial is to announce that the grace period is over. From now on, BASP is banning the NHSTP.

You may be more familiar with seeing p < .05 rather than Null Hypothesis Significance Testing Procedure (NHSTP).

David Trafimow cites in the 2014 editorial warning about NHSTP his earlier work, Hypothesis Testing and Theory Evaluation at the Boundaries: Surprising Insights From Bayes’s Theorem (2003) as justifying non-use and the later ban of NHSTP.

His argument is summarized in the introduction:

Despite a variety of different criticisms, the standard nullhypothesis significance-testing procedure (NHSTP) has dominated psychology over the latter half of the past century. Although NHSTP has its defenders when used “properly” (e.g., Abelson, 1997; Chow, 1998; Hagen, 1997; Mulaik, Raju, & Harshman, 1997), it has also been subjected to virulent attacks (Bakan, 1966; Cohen, 1994; Rozeboom, 1960; Schmidt, 1996). For example, Schmidt and Hunter (1997) argue that NHSTP is “logically indefensible and retards the research enterprise by making it difficult to develop cumulative knowledge” (p. 38). According to Rozeboom (1997), “Null-hypothesis significance testing is surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students” (p. 336). The most important reason for these criticisms is that although one can calculate the probability of obtaining a finding given that the null hypothesis is true, this is not equivalent to calculating the probability that the null hypothesis is true given that one has obtained a finding. Thus, researchers are in the position of rejecting the null hypothesis even though they do not know its probability of being true (Cohen, 1994). One way around this problem is to use Bayes’s theorem to calculate the probability of the null hypothesis given that one has obtained a finding, but using Bayes’s theorem carries with it some problems of its own, including a lack of information necessary to make full use of the theorem. Nevertheless, by treating the unknown values as variables, it is possible to conduct some analyses that produce some interesting conclusions regarding NHSTP. These analyses clarify the relations between NHSTP and Bayesian theory and quantify exactly why the standard practice of rejecting the null hypothesis is, at times, a highly questionable procedure. In addition, some surprising findings come out of the analyses that bear on issues pertaining not only to hypothesis testing but also to the amount of information gained from findings and theory evaluation. It is possible that the implications of the following analyses for information gain and theory evaluation are as important as the NHSTP debate.

The most important lines for someone who was trained with the null hypothesis as an undergraduate many years ago:

The most important reason for these criticisms is that although one can calculate the probability of obtaining a finding given that the null hypothesis is true, this is not equivalent to calculating the probability that the null hypothesis is true given that one has obtained a finding. Thus, researchers are in the position of rejecting the null hypothesis even though they do not know its probability of being true (Cohen, 1994).

If you don’t know the probability of the null hypothesis, any conclusion you draw is on very shaky grounds.

Do you think any of the big data “shake-n-bake” mining/processing services are going to call that problem to your attention? True enough, such services may “empower” users but if “empowerment” means producing meaningless results, no thanks.

Trafimow cites Jacob Cohen’s The Earth is Round (p < .05) (1994) in his 2003 work. Cohen is angry and in full voice as only a senior scholar can afford to be.

Take the time to read both Trafimow and Cohen. Many errors are lurking outside your door but that will help you recognize this one.

Comments are closed.