A recent commentary in Nature carries the provocative title “Retire Statistical Significance” (1, with a list of more than 800 signatories) and has been widely interpreted as a call to abandon the entire concept of statistical significance. Closer reading suggests that the main message of the paper is a call to stop using P values or confidence intervals in a categorical or binary way, that is, to declare absolutely whether a result supports or refutes a scientific hypothesis. This remains a radical proposal, but perhaps it does not signal the end of statistical tests in biomedical research just yet.
For pharmacologists, particularly those who wish to publish in the British Journal of Pharmacology (BJP), the proposals of Amrhein et al. (1) pose a problem. They appear to contradict directly the advice given in the journal’s publication guidelines, introduced by Curtis et al. (2), namely: “when comparing groups, a level of probability (P) deemed to constitute the threshold for statistical significance should be defined in Methods, and not varied later in Results (by presentation of multiple levels of significance).” In other words, statistical tests must produce a categorical outcome based on a P value at a single predefined threshold (normally P = 0.05, or equivalently a 95% confidence interval) for all data sets in the paper.
So, which is correct? How should prospective authors in BJP and elsewhere approach this? In the spirit of the Amrhein et al. (1) article, I do not propose to make a binary choice here. After all, in the wider sense, both approaches seek to address the same issues of reliability and reproducibility in scientific research, issues which are particularly problematic in biomedical science and thus in pharmacology. The BJP approach is built around objectivity and the removal of bias (unconscious or otherwise): decisions are largely taken away from the experimenter by a predefined statistical threshold coupled with a number of guidance statements on experimental design. There is much merit in this approach, and the journal does encourage authors to make appropriate caveats (3), but, inevitably, when such absolute, categorical decisions are made, P = 0.04 will take science in a different direction from P = 0.06. As Colquhoun (4) and others have shown, all too often this will be the wrong direction.
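To see why a hard P < 0.05 cut-off can point the wrong way so often, a back-of-envelope calculation in the spirit of Colquhoun (4) helps. The short Python sketch below uses illustrative numbers that are assumptions for this example rather than values quoted in his paper: a field in which 10% of tested hypotheses reflect real effects, experiments run at 80% power, and a threshold of α = 0.05.

```python
# A minimal sketch of the false-discovery argument made by Colquhoun (4).
# The prior (10% of hypotheses are real effects) and power (80%) are
# illustrative assumptions, not figures taken from this commentary.

def false_discovery_rate(prior_real: float, power: float, alpha: float) -> float:
    """Fraction of 'significant' results that are actually false positives.

    Of all tests run, alpha * (1 - prior_real) yield false positives and
    power * prior_real yield true positives; the false discovery rate is
    the false-positive share of everything declared significant.
    """
    false_pos = alpha * (1.0 - prior_real)
    true_pos = power * prior_real
    return false_pos / (false_pos + true_pos)

if __name__ == "__main__":
    fdr = false_discovery_rate(prior_real=0.1, power=0.8, alpha=0.05)
    # Prints ~36%: over a third of results crossing P < 0.05 are wrong
    # under these assumptions.
    print(f"False discovery rate at P < 0.05: {fdr:.0%}")
```

Under these (deliberately plausible) assumptions, more than a third of results declared “significant” are false positives, which is the sense in which a categorical verdict at P = 0.04 can so readily send science in the wrong direction.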
For this reason, I prefer the Amrhein et al. (1) proposals, but, to my mind, they come with at least two requirements. The first is data transparency and availability. If authors do not provide a statement about statistical significance, it is incumbent on them to make their data freely available, so that others, particularly researchers working closely in the field, can study the data in detail and support or refute the messages of the paper, ideally, perhaps, in the form of post-publication peer review. The second requirement is trust. In the absence of a statistical significance rule book or convention (however flawed), authors must provide a subjective narrative around the results, and readers must be able to trust that narrative to be both informed and unbiased. However transparent and available the underlying data, most readers will rely on the authors to guide their understanding and interpretation of the research. In an environment where “researchers’ careers depend more on publishing results with ‘impact’ than on publishing results that are correct” (5), this is surely the big challenge.
Comments by Alistair Mathie (@AlistairMathie), The Medway School of Pharmacy
(1) Amrhein V, Greenland S & McShane B. (2019). Scientists rise up against statistical significance. Nature, 567(7748):305-307. doi: 10.1038/d41586-019-00857-9. [PMID:30894741]
(2) Curtis MJ et al. (2015). Experimental design and analysis and their reporting: new guidance for publication in BJP. Br J Pharmacol, 172(14):3461-3471. doi: 10.1111/bph.12856. [PMID:26114403]
(3) Curtis MJ et al. (2018). Experimental design and analysis and their reporting II: updated and simplified guidance for authors and peer reviewers. Br J Pharmacol, 175(7):987-993. doi: 10.1111/bph.14153. [PMID:29520785]
(4) Colquhoun D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. R Soc Open Sci, 1(3):140216. doi: 10.1098/rsos.140216. [PMID:26064558]
(5) Casadevall A. (2019). Duke University’s huge misconduct fine is a reminder to reward rigour. Nature, 568:7. [World View article]