Abstract
The credibility of research findings hinges on reproduction and replication. By testing and verifying published research, reproducibility and replication initiatives play a vital role in shaping scientific knowledge: they allow us to evaluate the robustness of findings and turn science into a self-correcting system that identifies and rectifies inaccuracies, ultimately informing policy-making in significant ways.
When attempting to reproduce published results, researchers often face roadblocks (Colliard, 2021). Published assessment studies generally report reproducibility rates below 50%, and sometimes the success rate is in the single digits (Avelino et al., 2021; Gertler, Galiani and Romero, 2018; McCullough et al., 2006). This may be because the data are not publicly available owing to their nature: administrative, proprietary, and/or copyrighted (Christensen and Miguel, 2018). Furthermore, for many other studies, the required computer code is unavailable or incomplete (Chang and Li, 2022; Gertler et al., 2018).
A few large-scale replication projects have been conducted recently, including one in psychology (Open Science Collaboration, 2015), one in experimental economics (Camerer, 2016), and a social science replication project (Camerer, 2018). Replication here means that a new study tested the original study's main significant result using similar methods and tests on a fresh sample. Pooling the results of these large replication projects yields a replication rate of about 50%.
Beyond lab experiments, and especially for studies based on observational data, large-scale reproduction and replication projects have not been attempted. Instead, most reproductions and replications target the claims of a single original study, often provoking lengthy debates about the interpretation of results. Yet some recent reviews point to systematic problems, such as p-hacking, in studies based on observational data and suggest that these problems are worse than for experimental/RCT data (see, e.g., Brodeur et al., 2020; Young, 2017), which, if true, would translate into even lower replicability rates than for experimental studies. Experimental studies, in turn, are often said to be more prone to external and construct validity concerns, with obvious implications for their replicability in new settings (Esterling, 2023; Findley, 2021; Peters et al., 2018).
Low reproducibility and replicability rates may be due to many factors. First, many previous studies rely on small samples or examine small effects, implying low statistical power (Ioannidis et al., 2017). Arel-Bundock (2022) assesses statistical power for about 2,000 articles in political science and reports that the median analysis has about 10% power. Second, there are typically many ways of testing a hypothesis, giving researchers many "researcher degrees of freedom" in their analysis (Simmons et al., 2011). Specification searching (or "p-hacking") has been found to be a problem in political science and related disciplines (Brodeur, 2016; Gerber and Malhotra, 2008). Third, researchers might be tempted to select their hypotheses after the results are known ("HARKing") on the basis of whether they yield significant results (Kerr, 1998). All of these factors make it hard to disentangle true results from false positives and false negatives.
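To make concrete what low statistical power means in practice, the minimal Python sketch below computes the power of a two-sided, two-sample t-test for a few illustrative effect sizes and sample sizes. The numbers are hypothetical and are not drawn from Arel-Bundock (2022) or any other cited study.

```python
# Minimal sketch: how statistical power varies with effect size and sample size.
# Effect sizes and sample sizes are illustrative, not taken from any cited study.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided, two-sample t-test at alpha = 0.05
for d in (0.1, 0.2, 0.5):          # standardized effect sizes (Cohen's d)
    for n in (50, 200, 1000):      # observations per group
        p = analysis.power(effect_size=d, nobs1=n, alpha=0.05, ratio=1.0,
                           alternative="two-sided")
        print(f"d={d:.1f}, n per group={n:>4}: power={p:.2f}")

# Sample size per group needed to reach the conventional 80% power target
n_needed = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"n per group for 80% power at d=0.2: {n_needed:.0f}")
```

With a small effect (d = 0.2) and 50 observations per group, power is well below the conventional 80% target, which is the kind of gap that underlies the low power estimates reported in the literature.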
Reproduction and replication studies are an important part of changing incentives and improving the quality and credibility of original research. They may also generate new knowledge and findings in their own right. Other strategies to enhance transparency and credibility in scientific research include the pre-registration of hypotheses and data analysis plans. Reproduction and pre-registration are complements, not substitutes. Pre-analysis plans can address the problem of p-hacking and the false positive results it generates. Reproduction and replication, the focus of this paper, address the integrity and robustness of data and findings, and may allow for collecting additional data and testing new hypotheses. Recent efforts have sought to combine replication and pre-analysis plans, as exemplified by the Metaketa initiative (https://egap.org/our-work/the-metaketa-initiative/), in which research teams coordinate their efforts, pre-register their analyses, and often replicate the core treatments and outcomes in different settings.
In addition to the technical and logistical hurdles that prevent researchers from reproducing past evidence, current publication incentives remain unfavorable to reproductions (Coffman et al., 2017; Clemens, 2017). Publication outlets may tend to favor novel conceptual insights over new tests of a published idea, regardless of what these tests find. Furthermore, researchers aiming to publish reproductions as standalone projects may face incentives to engage in selective reporting, implying that reproduction efforts might themselves suffer from p-hacking and other questionable research practices (QRPs; see Bryan et al., 2019).
In this article, we first provide definitions for reproducibility and replicability. Next, we review data availability journal policies. We then present the results from a survey of editors of leading political science journals about their willingness to publish comments and replications. We discuss new initiatives that seek to promote and generate high-quality reproductions and replications. Last, we make the case for standards and practices that may help increase data availability, reproducibility, and replicability in political science.
Definitions of reproducibility and replicability in political science
Several definitions of reproducibility and replicability have been used and proposed (see, e.g., Clemens, 2017; Christensen and Miguel, 2018; Ankel-Peters et al., 2023a). Dreber and Johannesson (2023) recently proposed definitions and indicators for economics that we summarize here and believe are also useful for political science. In short, reproducibility refers to re-examining a study's results using the same data, whereas replicability refers to testing the study's hypotheses using new data.
Reproducibility is further divided into three types.
Replicability is divided into two types.
Lack of reproducibility and replicability in political science
Reproducibility and replication efforts contribute in essential ways to the production of scientific knowledge. Within the social sciences, political scientists have pushed the frontier of research transparency on several dimensions, such as raising the issue (King, 1995), developing guidance to rigorously document research designs (Blair, 2019), and being early adopters of data and code availability for replication (see, for instance, Bueno de Mesquita, 2003). Other contributions include developing innovative methodologies to combat p-hacking (Breznau, 2022; Young and Holsteen, 2017), proposing standard operating procedures to address omissions or ambiguities in pre-analysis plans (Lin and Green, 2016) and establishing a trusted repository to archive time-stamped registrations (EGAP).
Despite the importance of reproductions and replications for the production of scientific knowledge, progress has been slow. Existing reviews of published reproduction activities mostly document small or even minuscule replication rates (Mueller-Langer et al., 2019). The present situation is unsurprising in light of the many barriers that prevent researchers from assessing the reliability of existing research. Indeed, access to data, code, and protocols is to date not universal in political science (Dafoe, 2014).
Data availability policy and data editor
We reviewed journal websites with two questions in mind: whether they provide information on data and code availability, and whether they employ dedicated data or replication editors. With regard to the first question, just one of the sampled journals, Comparative Politics, provides no information on its website with regard to data or code. Of the remaining 27 journals that were sampled, six encourage the sharing of data/code and 21 mandate it. With regard to the second question, five of the journals have dedicated data/replication editors, namely the Journal of Politics, Political Analysis, Political Communication, Political Science Research and Methods, and the Quarterly Journal of Political Science. The American Journal of Political Science also has a verification process carried out by the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill; this independent process computationally reproduces the numerical results of accepted articles.
It is worth emphasizing that some journals also recommend or mandate reporting standards. For instance, the
Survey of editors
To examine the demand for replications among journals, we surveyed editors of leading journals in Political Science about their journals' policies on publishing replications and comments (henceforth "replications"; see Figure 1 for the exact wording used in the survey). The editors were approached by email in late May 2023, and a reminder was sent in early June to those who did not initially respond. The journals were selected through a crowdsourcing procedure: the Institute for Replication's (I4R) political science board members were asked to nominate journals and review the list. In total, 19 of the 28 contacted editors responded. Figure 1 summarizes the results; the responses disaggregated by journal can be found in Table A1 in the Appendix. Most editors, 63%, stated that their journal would generally publish reproductions/replications of papers originally published in their own journal. Of those, 47% responded that they would also consider replications of papers published in other journals (although for both questions, some editors added further criteria, such as the replicated paper being highly relevant to the journal's readership; see the detailed responses on I4R's website: https://i4replication.org/publishing.html). In addition, we checked the websites of these 28 journals for whether their Aims & Scope or Guide for Authors state that replications or comments are considered for publication: nine of the 28 journals do so (see Table A1 for journal-specific details).
Figure 1. Survey among editors of leading Political Science journals. Notes: The exact phrasing of the questions was: (1) Do you publish comments in <insert journal name here>? By comment we mean a paper that discusses and potentially challenges the empirical results from another paper, for example, based on a reanalysis or additional robustness checks. (2) If yes, do you only publish comments on original papers that have previously been published in <insert journal name here> or do you also publish comments on original papers that have been published elsewhere?
Moreover, new opportunities to publish replications have recently emerged. Several journals, some new and some established, now prominently invite submissions of replications in their Aims & Scope. For example, Research & Politics invites authors to consider submitting a paper along the lines of one or more of the following replication types:
- Theoretical replication: the submitted article argues that the original theoretical model is missing at least one key element; the missing element(s) are addressed and included in the empirical analysis.
- Technical replication: the submitted article identifies faults in the original research design or analysis, thereby arguing that the original results might not hold.
- Concept replication: the submitted article questions the validity of the original study; an alternative measurement or operationalisation is proposed which yields different substantive results.
Although these steps represent progress, practical challenges hinder the widespread publication of replications. These obstacles include entrenched biases favoring the status quo and the difficulty of securing reviewers willing to assess technically intricate debates that demand substantial effort. Additionally, previous efforts by journals to signal a need for replications have not always translated into an increased supply of replication studies. In economics, the
Generating and promoting reproductions and replications
Several authors of this article founded the Institute for Replication to address the above issues by promoting and generating reproductions and replications on an ongoing basis. I4R’s main goals are to assess and improve the computational reproducibility of research and its replicability.
As of 2023, I4R reproduces and replicates studies published in the
To assist with the recruitment of replicators, I4R set up a board of editors from various research fields and with various institutional ties, thus allowing it to cast a very wide net. An editor’s task is specifically to identify potential replicators. The institute currently has a board of editors for economics, finance, and political science, which actively recruits and selects replicators for studies recently published in top journals in each field. Of note, replicators may be faculty members or graduate students.
I4R also recently developed replication games to generate reproductions and replications in political science. Replication games are meet-ups ("hackathons") open to faculty, post-docs, graduate students, and other researchers. Participants join a small team and are asked first to computationally reproduce a published paper or study in their field of interest, and then to carry out additional reproducibility analyses. In practice, teams work during the event and the following weeks on testing the robustness of a prior study's results using the same data but analytical decisions that differ from those made by the original investigators. All replication reports are then combined into (mega) meta-papers, and all replicators are offered co-authorship.
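As an illustration of what such a robustness exercise can look like, the minimal sketch below re-estimates a focal coefficient under a handful of alternative analytical decisions and compares the results. The data, variable names, and specifications are entirely hypothetical and are not taken from any I4R replication game.

```python
# Minimal sketch of a robustness check: re-estimate a published specification
# under alternative analytical decisions and compare the focal coefficient.
# Data and specifications here are synthetic and purely illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "age": rng.normal(40, 10, n),
    "income": rng.lognormal(10, 0.5, n),
})
df["outcome"] = 0.3 * df["treatment"] + 0.01 * df["age"] + rng.normal(0, 1, n)

# Alternative specifications a replication team might try
specs = {
    "original":       "outcome ~ treatment + age",
    "add control":    "outcome ~ treatment + age + np.log(income)",
    "no controls":    "outcome ~ treatment",
    "trimmed sample": "outcome ~ treatment + age",  # fitted on a subsample below
}

for name, formula in specs.items():
    data = df[df["age"].between(25, 60)] if name == "trimmed sample" else df
    fit = smf.ols(formula, data=data).fit(cov_type="HC1")  # heteroskedasticity-robust SEs
    coef, se = fit.params["treatment"], fit.bse["treatment"]
    print(f"{name:<14} beta={coef:.3f}  se={se:.3f}")
```

Reporting the full set of estimates, rather than a single preferred specification, is what allows the resulting meta-paper to speak to robustness rather than to one analyst's choices.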
Reproducibility and replicability in class
We argue that reproduction and replication of research by graduate students play a pivotal role in upholding the integrity and credibility of scientific inquiry, laying the foundation for the advancement of knowledge. Reproducing and replicating the work of others is a fundamental and essential aspect of graduate education (Janz, 2016). Every year, students, and more generally researchers around the world, carry out reproduction exercises, generating important pieces of new knowledge. Unfortunately, these reproduction and replication exercises are rarely publicly documented or rewarded. One recent platform developed to address this issue is the Social Science Reproduction Platform (SSRP). This resource standardizes and crowdsources computational reproducibility exercises and provides extensive guidance on how to carry out a reproduction. First, students and researchers typically verify the existence of reproduction materials for an article. Second, they assess how reproducible these materials are. Third, they might make some improvements to these materials (from fixing file paths and libraries to translating code into a different programming language). Finally, they often explore different specifications to see which results may or may not robustly hold.
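For readers new to such exercises, the sketch below illustrates the first and third steps under stated assumptions: the file names, paths, and seed are hypothetical, and the SSRP does not prescribe this particular code.

```python
# Minimal sketch of common first steps when assessing a replication package,
# loosely following the workflow described above. File names and paths are
# hypothetical.
from pathlib import Path
import numpy as np

ROOT = Path(__file__).resolve().parent          # root of the replication package

# Step 1: verify that the advertised reproduction materials actually exist.
expected = ["data/survey.csv", "code/clean.py", "code/analysis.py", "README.md"]
missing = [f for f in expected if not (ROOT / f).exists()]
print("Missing materials:", missing or "none")

# Step 3 (improvements): replace a machine-specific absolute path such as
#   "C:/Users/author/Dropbox/project/data/survey.csv"
# with a portable, project-relative path, and fix the seed so reruns are exact.
data_file = ROOT / "data" / "survey.csv"
np.random.seed(12345)
```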
How to make adversarial exchanges more collaborative
Reproductions and replications in academia can sometimes become adversarial. The process can lead to tensions between replicators and original authors if the replication study fails to replicate the original findings (Laitin and Reich, 2017). This can occur for various reasons, such as differences in sample characteristics, variations in experimental conditions, or methodological limitations of the replication study itself. When the replication results contradict the original findings, they may challenge the credibility and impact of the original study, prompting a defensive response from the original authors.
Another factor that can contribute to adversarial relations is that the original authors may perceive the replication as an attempt to undermine their work, and as a result, may respond defensively or dismissively, seeking to protect their intellectual contributions.
To mitigate adversarial dynamics, fostering open communication, transparency, and collaboration between replicators and original authors is crucial. I4R, for instance, manages communication between original authors and replicators (Brodeur, 2023). By acting as an intermediary between authors and replicators, it helps researchers collectively contribute to a more robust and reliable body of knowledge and makes exchanges less adversarial. An additional way to resolve conflicts more efficiently is to embrace the framework of adversarial collaborations proposed by Kahneman and Klein (2009), which is increasingly being used in different areas of the social sciences (e.g., Clark and Tetlock, 2023).
Incentives for replicators
Engaging in replication studies can carry negative consequences for the careers of replicators. These consequences arise from a variety of factors, including limited professional incentives.
First, replicators may find it challenging to gain recognition and visibility for their work, as replication studies may be difficult to publish and are often less recognized by their peers. Additionally, replication studies can elicit negative reactions from original authors, as discussed earlier. If replicators challenge or refute the original findings, they may face criticism, or even personal attacks from the authors or their supporters. These adversarial interactions can create a hostile environment for replicators and potentially damage their professional relationships within the academic community.
Second, dedicating time and resources to replication studies may divert replicators’ attention from pursuing their original research agendas. The time spent replicating studies and addressing potential challenges can slow down their career progression and limit their ability to build a unique research portfolio.
To mitigate some of these negative consequences, we make the case for greater recognition of the importance of replication studies within political science. One solution is to combine replications into large meta-papers. Being granted co-authorship on a meta-paper encourages researchers to replicate studies and changes both the incentives and the way replication is conducted, in part because replicators work in teams and are under less pressure to show that the original findings are not replicable or robust. Moreover, meta-papers allow for estimating a replication rate within a discipline or subfield. Inferring the replication rate from published stand-alone reproductions and replications is not possible, since problematic reproductions and replications are more likely to be conducted in the first place and also more likely to be published. Last, one key editorial policy at I4R is that replicators may remain anonymous and still receive co-authorship on a meta-paper; the identity of the replicator is, however, known to the editorial board that vetted this person and their work.
Conclusion and recommendations
Leading political science journals have recently adopted innovative open science practices, incorporating policies that emphasize the availability of data and code, along with the inclusion of reproducibility analysts. We believe more journals ought to implement similar policies in a way that does not stifle creativity and that minimizes excess burden for researchers, editors and journal staff.
A key question going forward, which we have not addressed, is "Which papers should be replicated?" We believe greater reflection is warranted on this matter. Interesting options include using crowd forecasts to identify papers that are likely to fail to replicate. Crowdsourcing may also help prioritize the papers most in need of replication (either because the papers are very important or because their findings are especially dubious).
We urge researchers, journal editors, and funders to start holding political science to higher open science standards and to support and facilitate the conduct and publication of replication and reproducibility studies. We make three recommendations: (1) We call on the (2) We urge the creation of an outlet dedicated to replications, backed by one of the large disciplinary professional associations (for which impact factor might not be a primary consideration). (3) We also recommend that more political science journals start using data editors to improve computational reproducibility.
