After moving to a system of having a statistician present at every meeting, none of the editorial team could imagine moving back to a system where they were not present.
Scientific claims in psychology often rely on a scaffold of statistical analyses that support inductive inferences from samples of data (Rosnow & Rosenthal, 1989). The appropriate selection, implementation, reporting, and interpretation of these analyses is necessary for the validity of the associated claims (Cook & Campbell, 1979; García-Pérez, 2012). Readers of the peer-reviewed literature may assume that reported statistical analyses have been closely scrutinized for quality. But serious concerns about the credibility of psychological research have been raised (Baker, 2016; Pashler & Wagenmakers, 2012), and the misunderstanding and misuse of statistical methods have been implicated as an important cause (Button et al., 2013; Gigerenzer, 2018; Munafò et al., 2017; Simmons, Nelson, & Simonsohn, 2011).
In this article, we explore whether psychology journals could ameliorate some of the field’s statistical ailments by adopting specialized statistical review.
Disclosures
Data, materials, and online resources
All data (https://osf.io/nquws/files/), survey materials (https://osf.io/tmah8/files/), and analysis scripts (https://osf.io/4zurk/files/) related to this study are publicly available on the Open Science Framework. To facilitate reproducibility, we wrote this manuscript by interleaving regular prose and analysis code, using knitr (Xie, 2018) and papaja (Aust & Barth, 2019), and have made the manuscript available in a software container (https://doi.org/10.24433/CO.8241121.v3) that re-creates the computational environment in which the original analyses were performed. Detailed methods and results for the survey of psychology editors are provided in the Supplemental Material (available online at http://journals.sagepub.com/doi/suppl/10.1177/2515245919858428).
Reporting
The survey data reported here represent the subsample of psychology journals included in a broader survey of statistical-reviewing policies at biomedical journals. The findings for biomedical journals will be reported elsewhere (Hardwicke & Goodman, 2019), and the findings for psychology journals are reported here for the first time. We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study.
Ethical approval
This study was approved by the institutional review board of the Stanford University School of Medicine.
What Do Psychology-Journal Editors Think About Statistical Review? Results of a Survey
To gauge the current use of statistical review, we surveyed a sample of high-impact psychology journals (full methods and results are provided in the Supplemental Material available online). We received responses from editors (all but one an editor-in-chief) at 39 of 118 psychology journals representing 13 subfields (Fig. S1 in the Supplemental Material). We asked respondents about the frequency of statistical review in their journal, the nature of their statistical reviewers and how they are chosen, the procedures and outcomes of statistical review, their ability and willingness to use statistical review, and their perception of the value of statistical review.
An unexpected observation both complicated interpretation of the data and motivated this commentary: 17 respondents (44%) stated that no additional specialized statistical review was warranted and that regular peer reviewers are both capable of evaluating and expected to evaluate the statistical aspects of submitted manuscripts (see Results in the Supplemental Material). These views contrast starkly with those of biomedical editors and statisticians (Wasserstein & Lazar, 2016), who almost universally accept the notion that statistical errors or suboptimal analyses can go undetected by regular peer review, and that specialized and targeted statistical review is required (Hardwicke & Goodman, 2019).
Does Psychology Need Statistical Review?
Researchers have highlighted a litany of statistical ailments that afflict the psychology literature, ranging from simple reporting errors to wholesale misunderstanding and misapplication of fundamental statistical concepts and techniques (see Table 1). One striking example is the pervasive problem of inadequate statistical power that persists in several domains of psychology. Many published psychology studies have such small sample sizes that statistical tests are unlikely to be sufficiently powered to detect plausible effects (e.g., Button et al., 2013; Cohen, 1962; Fraley & Vazire, 2014; Sedlmeier & Gigerenzer, 1989; Stanley, Carter, & Doucouliagos, 2018; Szucs & Ioannidis, 2017; Vankov, Bowers, & Munafò, 2014). Smaldino and McElreath (2016) examined 44 studies of statistical power in the social and behavioral sciences and found that the average power to detect small effects was low.
Statistical Ailments in the Published Psychology (and Related) Literature, With References Providing Further Detail and Empirical Evidence
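To make the power problem concrete, the following sketch (a hypothetical illustration using the standard normal approximation, not an analysis from the studies cited above) computes the approximate power of a two-sided, two-sample t test to detect a small standardized effect:

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t test via the
    normal approximation: power ~ Phi(d * sqrt(n/2) - z_{1 - alpha/2})."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value, e.g., 1.96 for alpha = .05
    noncentrality = d * sqrt(n_per_group / 2)  # expected standardized test statistic
    return z.cdf(noncentrality - z_crit)

# A small effect (Cohen's d = 0.2) with 50 participants per group:
print(round(approx_power_two_sample(0.2, 50), 2))
```

With 50 participants per group and d = 0.2, power is only about .17, far below the conventional .80 target; detecting such an effect with 80% power would require roughly 390 participants per group.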
The pervasiveness of statistical ailments in the published literature suggests that peer review in psychology journals is not sufficient to identify and minimize those problems. Quantitative training programs in psychology are typically slow to incorporate contemporary developments, avoid advanced topics, and provide only superficial treatment of fundamental statistical concepts (Aiken, West, & Millsap, 2008; Aiken, West, Sechrest, & Reno, 1990). Much quantitative training in psychological science neglects historical and philosophical foundations (Gigerenzer, 2004, 2018), propagating confusion about core statistical concepts and facilitating widespread adoption of suboptimal practices (Wasserstein & Lazar, 2016). Statistical misconceptions are prevalent among instructors and deeply embedded in mainstream research-methods curricula (Brewer, 1985; Haller & Krauss, 2002; for a review, see Gigerenzer, 2018). Some research practices taught to undergraduates are now recognized as questionable (Bem, 2004; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012).
Why Is Statistical Review Used in Medicine?
Leading biomedical journals have been adopting statistical review and refining their policies since the 1970s (Altman, 1982, 1994, 1998; Smith, 2005). Most biomedical-journal editors in our survey (Hardwicke & Goodman, 2019) indicated that they believed statistical review provides substantial incremental value beyond regular peer review and results in important changes to manuscripts around 60% of the time—even though many biomedical articles have Ph.D.-level methodologists among the authors. This view is supported by empirical work evaluating leading medical journals.
To our knowledge, there has been only one randomized controlled trial designed to evaluate the effectiveness of statistical review (Cobo et al., 2007); that study was conducted at a biomedical journal.
Providing statistical guidelines for authors makes a journal’s expectations transparent and may help to improve statistical practice (Bailar & Mosteller, 1988; Smith, 2005). Many psychology journals indicate that authors should adhere to statistical-reporting guidelines, such as those from the American Psychological Association (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008; Wilkinson & APA Task Force on Statistical Inference, 1999) and the Psychonomic Society (2019).
The evidence for the effectiveness of statistical guidelines in biomedical journals is mixed (Dexter & Shafer, 2017), and journal-specific statistical guidelines have also been introduced at some psychology journals.
In summary, there is a reasonable body of evidence to suggest that specialized statistical review in biomedicine has been effective in preventing many serious analytic and inferential errors from reaching the published literature. Could psychology journals improve the validity and reproducibility of their content by adopting a similar model?
How Would Statistical Review Work in Psychology?
In the biomedical domain, there is no single model for statistical review (Hardwicke & Goodman, 2019), and policies have evolved gradually (Altman, 1998; Cobo et al., 2007; Gardner & Bond, 1990; Goodman et al., 1994; Gore et al., 1992; Prescott & Civil, 2013; Schor & Karten, 1966; Smith, 2005). Drawing from that experience, we address four key logistical issues: Who should conduct statistical review, which manuscripts should undergo statistical review, at what stage should statistical review be performed, and how should statistical review be incorporated into the editorial process?
Who should conduct statistical review?
Psychology statistical reviewers do not necessarily need to be statisticians per se, but should have advanced (Ph.D.-level) quantitative training (Goodman et al., 1998; Hardwicke & Goodman, 2019). Ideally, they should understand the terminology, conventions, and practices of psychology research. The majority of psychology-journal editors responding to our survey reported that difficulty in finding appropriate reviewers affected their willingness to conduct statistical review (see Fig. S3 in the Supplemental Material). It is not clear whether this difficulty reflected a lack of potential reviewers or problems identifying them.
The number of statistical reviewers required for a journal will depend on the model of statistical review employed. Just over half of the biomedical journals we surveyed indicated that they typically relied on around two statistical experts on their internal editorial teams to conduct all of the statistical review (Hardwicke & Goodman, 2019), although the largest journals tended to have more. Just over a third relied on a pool of external reviewers, with a median size of 11 members. In our psychology survey, the majority of respondents who reported using statistical review indicated that their statistical reviewers were typically identified on an ad hoc basis (58%; see Fig. S4a in the Supplemental Material). Many relied on between 1 and 40 editorial-team members.
A starting point for psychology journals could be to recruit one statistical expert to serve on the editorial board or be retained as a regular consultant. If the expert is not someone who would see this as a professional service or a vehicle for career advancement, compensation might be required. Whereas about half of biomedical journals pay their statistical reviewers, only one respondent in our psychology survey did so (Fig. S4c in the Supplemental Material).
Which manuscripts should undergo statistical review?
Optimally, all likely-to-be-accepted manuscripts with relevant statistical content should undergo statistical review (George, 1985; Schor & Karten, 1966). In our psychology survey (see Fig. S2 in the Supplemental Material), 15 (38%) respondents indicated that statistical review was used for all relevant articles, and 15 (38%) respondents indicated that statistical review was rare (≤ 10% of articles). However, free-text comments indicated that at least 8 of the 15 who said all manuscripts received such review did not differentiate it from regular peer review; some of these editors indicated that peer reviewers had sufficient statistical training.
If not all articles can be statistically reviewed, editors have to prioritize. Smith (2005) noted that it took 5 to 10 years for his journal to achieve statistical review of all relevant manuscripts.
Targeting manuscripts with complex methods for statistical review makes some sense, but a number of commentators in the biomedical domain have noted that routine statistical analyses tend to be the most problematic (Schor & Karten, 1966; Smith, 2005). Sophisticated analyses may be conducted by individuals with more statistical expertise (Schor & Karten, 1966). Many of the statistical ailments in the psychology literature relate to foundational issues, not advanced techniques. Consequently, the most impactful contribution of statistical review might come from evaluating what appear to be routine analyses.
At what stage should statistical review be performed?
An important question for journals is, at what stage of the publication process should manuscripts undergo statistical review? In our survey of biomedical journals, the majority of respondents indicated that statistical review was either solicited at the same time as regular peer review (36%) or after regular peer review and before a provisional acceptance decision (27%). In our psychology survey, although the majority of respondents (71%) indicated that statistical review was solicited at the same time as regular peer review (Fig. S5c), many did not differentiate between regular and statistical review. The model will ultimately be journal-specific, dependent on the journal’s capacity for statistical review.
How should statistical review be incorporated into the editorial process?
How editors should incorporate the input of statistical reviewers is an important issue, particularly for journals unused to such review. Smith (2005) described the slow process of mutual education that had to occur at his journal: “We worried that the gulf between medical editors and statisticians with no knowledge of medical research would be unbridgeable. . . . In the early days we made the mistake of thinking that statistics was a much more exact science than clinical research and that we had to go along with exactly what the statisticians advised. Eventually we learnt that there was room for negotiation over what was acceptable . . . recognizing the inevitable trade-offs between statistical purity [and] what can actually be done in clinical research. . . .” (p. 2)
Smith’s observations illustrate that effective statistical review requires not only the addition of a statistical reviewer, but also “cross-cultural” education and communication, which take time. The reviewers need to understand and absorb the values of the research community they are serving, and that community, and its editors, need to understand how the changes requested by such reviewers improve the validity of its research. External statistical reviewers who are not part of the journal can make unrealistic requests, which must be adjudicated or modified by an internal editor. Statistical experts directly incorporated into the editorial process absorb journal and disciplinary norms and are also able to educate editors.
Statistical Review, Open Science, and Metaresearch
Psychological science is in the midst of a credibility revolution (Nelson et al., 2018; Vazire, 2018), and this is an opportune time for journal editors to consider adoption of statistical review. There is growing awareness that the credibility of scientific claims depends on transparent reporting (Klein et al., 2018; Munafò et al., 2017). Statistical review is likely to be most effective when reviewers have access to all of the raw research artifacts (materials, data, analysis scripts, and preregistered protocols when relevant), which enable a fully informed assessment. Having access to data, and ideally analysis scripts, enables verification of analytic reproducibility (Hardwicke et al., 2018; Sakaluk, Williams, & Biernat, 2014) and assessment of analytic robustness (LeBel, McCarthy, Earp, Elson, & Vanpaemel, 2018; Localio et al., 2018; Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016), and it can facilitate detection of fraud (Simonsohn, 2013; Smith, 2005). Research materials can convey statistically relevant information about data collection, and availability of survey instruments can help reviewers raise questions about psychometric issues (McPherson & Mohr, 2005). Preregistration of study protocols (Nosek, Ebersole, DeHaven, & Mellor, 2018) could facilitate identification of questionable research practices.
Statistical review might be enhanced by the use of computer algorithms that automatically screen for and flag potential errors in submitted manuscripts; free software already exists for this purpose.
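As an illustration of the kind of automated check such tools perform, the following simplified, hypothetical sketch recomputes a two-tailed p value from a reported t statistic and its degrees of freedom, and flags the report when the recomputed value is inconsistent with the reported one beyond rounding error (real screening software additionally parses manuscript text and handles many test types):

```python
from scipy import stats

def p_value_consistent(t_stat, df, reported_p, decimals=2):
    """Recompute the two-tailed p value implied by a reported t statistic
    and degrees of freedom, then check whether the reported p value is
    consistent with it at the reported rounding precision."""
    recomputed = 2 * stats.t.sf(abs(t_stat), df)
    # Consistent if rounding the recomputed p value to the reported
    # precision reproduces the reported value.
    return round(recomputed, decimals) == round(reported_p, decimals)

# t(30) = 2.50 implies p ~ .018, so a reported "p = .02" is consistent,
# whereas a reported "p = .01" would be flagged for closer inspection.
print(p_value_consistent(2.50, 30, 0.02))
print(p_value_consistent(2.50, 30, 0.01))
```

A reviewer (or an editorial workflow) could run such a check over every test reported in a manuscript and forward only the flagged results for human scrutiny, keeping the added burden on statistical reviewers small.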
Increasing input from quantitative experts before a study begins could be an especially impactful approach to improving the quality of statistical scaffolding, and a growing number of psychology journals have adopted formats that invite such input.
Finally, psychologists not only are driving the development of new reform initiatives, but also are conducting empirical investigations to evaluate the effectiveness of these initiatives in order to iteratively improve upon them (e.g., Hardwicke & Ioannidis, 2018; Hardwicke et al., 2018; Kidwell et al., 2016; Nuijten et al., 2017). These exercises in metaresearch (Hardwicke et al., 2019; Ioannidis, Fanelli, Dunne, & Goodman, 2015) should be extended to statistical review. A series of prospectively registered randomized controlled trials designed to evaluate various models of statistical review would be a valuable tool for gathering evidence relevant to this issue.
Conclusion
In this article, we have advocated that psychology journals consider adopting specialized statistical review to complement regular peer review. We have been partly informed by the results of a survey of psychology-journal editors; however, given the small number of respondents, likelihood of self-selection bias, and reliance on self-report, only tentative inferences can be drawn from these data. Our arguments are mainly based on the apparent benefits of statistical review in the biomedical domain and the documented statistical problems pervading the psychology research literature. We contend that there is sufficient evidence to support pilot testing expert statistical review in psychology journals, with concomitant monitoring and evaluation.
Statistical review will not cure all of psychology’s statistical ailments, just as it is no panacea in biomedicine. The most effective antidote is likely to involve efforts to improve statistical competence among psychology researchers (Aiken et al., 2008), and to promote more open science, which would enable more effective postpublication review. This will require nontrivial reforms in training curricula and normative structures surrounding design, analysis, and inference. If psychology is to break free of problematic statistical rituals (Salsburg, 1985) and make better use of the analysis toolbox (Gigerenzer, 2014, 2018), it will require an infusion of fresh thinking from well-trained quantitative experts at all stages of the teaching, research, funding, and publication pipeline.
Supplemental Material
Hardwicke_Open_Practices_Disclosure_Rev and Hardwicke_Rev_Supplemental_Material – Supplemental material for “Should Psychology Journals Adopt Specialized Statistical Review?” by Tom E. Hardwicke, Michael C. Frank, Simine Vazire, and Steven N. Goodman in Advances in Methods and Practices in Psychological Science