Statistical Reporting Errors in Psychology (1985–2013) [1 in 8]

Do you remember your parents complaining about how far the latest psychology report departed from their reality?

Turns out there may be a scientific reason why those reports were as far off as your parents thought (or not).

The prevalence of statistical reporting errors in psychology (1985–2013) by Michèle B. Nuijten , Chris H. J. Hartgerink, Marcel A. L. M. van Assen, Sacha Epskamp, Jelte M. Wicherts, reports:

This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.

This is an open access article so dig in for all the details discovered by the authors.

The R package statcheck: Extract Statistics from Articles and Recompute P Values is quite amazing. The manual for statcheck should have you up and running in short order.

I did puzzle over the proposed solutions:

Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.

All of those are good suggestions but we already have the much valued process of “peer review” and the value-add of both non-profit and commercial publishers. Surely those weighty contributions to the process of review and publication should be enough to quell this “…systematic bias in favor of significant results.”

Unless, of course, dependence on “peer review” and the value-add of publishers for article quality is entirely misplaced. Yes?

What area with “p-values reported as significant” will fall to statcheck next?

One Response to “Statistical Reporting Errors in Psychology (1985–2013) [1 in 8]”

  1. […] article, Statistical Reporting Errors in Psychology (19852013) [1 in 8], first appeared on Another Word For […]