Abstract
Introduction
In a recent article, Clionadh Raleigh, Roudabeh Kishi, and Andrew Linke (RKL for short) claim that datasets such as the UCDP GED make political violence unseen through restrictive definitions, inclusion thresholds, and sourcing practices (Raleigh et al., 2023).
Comparing validity and reliability across datasets is not really possible without access to ground truth in a large sample of cases. What we can do is investigate what drives differences across datasets, and whether those differences are in fact attributable to higher fatality thresholds, more restrictive inclusion criteria, and/or differences in sourcing. Thus, in this article, we make more general points about what drives differences in the data provided by UCDP GED and ACLED under a variety of circumstances. We do this by revisiting all five empirical cases used in the original article (selected by RKL) and one additional case from an earlier ACLED report comparing data (Raleigh and Kishi, 2019). 1
What the evidence does suggest is that, when comparing like for like, differences between ACLED and UCDP data are driven more by how the projects deal with ambiguity and uncertainty in poor information environments than by thresholds or sourcing strategies. This accounts for most of the differences between ACLED and UCDP in the cases selected by RKL. Inclusion criteria play some role (as they should), but there is no evidence for one of RKL’s main claims: that UCDP’s sourcing strategy makes a difference. ACLED’s heavy reliance on partisan sources does make a substantial difference in some cases, indicating that different standards for source evaluation also drive some of the differences between UCDP and ACLED data.
Assessing validity and reliability
In the relatively rare cases where researchers have access to something approaching ground truth, we may assess the validity and reliability of different data collection projects (cf. Baum and Zhukov, 2015; Croicu and Kreutz, 2017; Davenport and Ball, 2002; Dawkins, 2021; Dietrich and Eck, 2020; Price and Ball, 2015; Weidmann, 2015, 2016). Since RKL have no data approximating ground truth, their claims about the validity and reliability of different datasets rest on no empirical grounds, and their conceptual discussion of validity and reliability does not bolster their case (see Öberg, 2025: pp. 3–7).
What RKL do is assess the other datasets relative to ACLED data (as if it were the ground truth), based on the incorrect premise that they all aim to measure the same thing as ACLED: political violence very broadly defined (Raleigh et al., 2023: p. 3; see also Öberg, 2025). However, what UCDP GED aims to measure is state-based armed conflict, non-state armed conflict, and one-sided violence, and thus its validity and reliability should be assessed against how well and how reliably it measures these three things. If RKL had used standard definitions of reliability and validity and applied them in the conventional way to assess empirically how well a dataset measures what it aims to measure, their conclusions would not follow.
The critique of stable definitions
A major theme in RKL’s critique of UCDP GED is that UCDP applies its definitions consistently over time and across cases. Pointing to the UCDP, RKL claim that for datasets that collect data on a “wider, comprehensive remit, prioritizing stasis in definitions of a shifting phenomenon like political violence, or centering sources that are only intermittently stable, creates invalid and systemic biases…” (Raleigh et al., 2023: p. 3). The claim about “centering sources that are only intermittently stable” is not true of UCDP. “Invalid bias” has no known meaning. “Systemic bias” has a meaning, but it is not clear how it applies here except as a pejorative. It is true that UCDP has stable definitions and applies them rigorously, but stasis in definitions is normally considered a cardinal virtue in measurement. Yardsticks of varying lengths are not a hallmark of good measurement, yet RKL seem to imply that because political violence is “a shifting phenomenon” we should have an adjustable yardstick. If a data collection project applies its definitions flexibly, shifting across cases and over time, how would one even assess whether the measure is reliable or valid? Would it even be possible to measure trends in “political conflict” if what is measured at one time and place is not the same as what is measured at another?
Sourcing
The first thing to note when discussing sourcing is that sourcing requirements vary greatly across datasets and are a function of the exact type of data being collected. For example, the sourcing requirements for collecting data on interstate wars, as in the Correlates of War project (Singer and Small, 1972), are far less demanding than those for collecting data on lethal events in armed conflicts, as in the UCDP GED (Sundberg and Melander, 2013), which in turn are far less demanding than those for collecting data on, for example, “…individuals and groups who peacefully demonstrate against a political entity, government institution, policy, group, tradition, businesses or other private institutions” (ACLED, 2021: p. 13). The former is less demanding than the latter because of how information about events typically propagates through various information channels and the news ecosystem (cf. Galtung and Ruge, 1965).
In general, the threshold for reporting on events goes down, and the granularity of the reporting goes up, the closer the reporting actor is to the event.
The second thing to note is that it is difficult to measure how sourcing affects missingness and bias in conflict datasets, because researchers only rarely have access to anything approximating ground truth that can be used as a benchmark. Studies that do compare conflict event data to direct observation data (approaching ground truth) show that non-lethal conflict events suffer from serious underreporting, and that lethal events are more likely to propagate up the news food chain (cf. Croicu and Eck, 2022; Demarest and Langer, 2018). Croicu and Eck (2022) furthermore suggest that among non-lethal conflict events, the less coercive events are even less likely to be reported and hence picked up by event datasets. The difficulty of collecting data on low-scale events like protests may also be reinforced by coder sampling error (Demarest and Langer, 2022: p. 648).
Recent research comparing ground truth data to ACLED data is suggestive of how much more demanding the information requirements are for collecting data on non-violent conflict events. Using data collected by the UNAMID Joint Mission Analysis Center in Darfur, Croicu and Eck do not find a single overlap between troop movement events registered by UNAMID JMAC and troop movement events registered by ACLED (Croicu and Eck, 2022: pp. 464–465).
The differences in the demands on information implied by the different types of events collected by ACLED compared to UCDP GED are not something RKL discuss in their article. They focus instead on sourcing strategies and practices, describing how ACLED uses multiple forms of media and information, including new media, local language sources, local source networks, and newly integrated sources when available (Raleigh et al., 2023: p. 10). All the while, they incorrectly allege that UCDP does not do these things and instead relies on English-language newswires and traditional media, using a fixed set of sources and thus missing out on new sources (Raleigh et al., 2023: pp. 10–11). In fact, UCDP’s sourcing strategy is much the same as ACLED’s; the UCDP merely relies less on sources belonging to warring parties, reflecting differences in standards for source evaluation. Looking at the sources ACLED actually used in recent times, one study finds that 77% of its sources were news media and only 6% were what it classifies as local partners (Croicu, 2025: p. 36, footnote 20). We find that since 2015 over 75% of all events in ACLED used traditional media sources and slightly less than 10% used local partners.
Sourcing strategies or source evaluation standards?
After portraying UCDP GED as being based on what they term ‘traditional media’, RKL illustrate how dramatically different a map of conflict events in Syria in 2017 looks if, in addition to what they call “traditional media,” one adds “new media,” “local partners,” and “other” sources. The contrasting maps both use ACLED data (Raleigh et al., 2023: p. 11, Figure 2). Had they compared to actual UCDP data and sourcing, they would have found that UCDP GED used a similarly broad range of source types.
Syria in 2017: UCDP GED vs ACLED.
One difference in sourcing practices that RKL do not discuss, but which sometimes affects the data substantially, is how the projects deal with partisan sources, including sources controlled by warring parties and the warring parties themselves. Partisan sources may provide valuable information, but they have a clear bias, and fatality figures provided by warring parties, for example, cannot be taken at face value. Differences in source evaluation standards are evident in the case of Yemen, where ACLED claims UCDP severely underreported fatalities in 2015–2018, allegedly because of poor sourcing practices compared to ACLED (Raleigh and Kishi, 2019: pp. 20–22). Figure 2 below displays the number of fatalities per source in Yemen 2015–2018, comparing UCDP GED in the top panel to ACLED in the bottom. It shows that ACLED registers 64 315 more fatalities in Yemen during this period and that in this case (unlike in Syria above) UCDP relies more on wire services than ACLED does. So does that account for the difference in fatalities? No, most of the difference is accounted for by differences in how the two projects apply source evaluation and deal with vague numbers (Figure 4 below). 2
Fatalities per source in Yemen 2015–2018, UCDP GED vs ACLED.
More than one third of the difference in fatality numbers between UCDP GED and ACLED in Yemen 2015–2018 is attributable to sources belonging to one of the warring parties. The Yemen News Agency SABA is Houthi-controlled, and Ansar Allah is the Houthi movement. The Yemen News Agency SABA is by far the most important source for ACLED fatalities. It is also among the sources read by UCDP, but UCDP relies on it far more sparingly, suggesting that a difference in source evaluation standards may explain as much as one third of the difference in fatalities between UCDP GED and ACLED. In sum, these cases were selected by ACLED to showcase how ACLED’s allegedly superior sourcing strategies make a difference in the data. We find instead that the sourcing strategies are not significantly different, but that source evaluation standards are a significant source of differences in the data. Next, we turn to RKL’s claim that inclusion thresholds and criteria produce differences between UCDP and ACLED data.
Inclusion thresholds and criteria or rules for dealing with ambiguity and uncertainty?
Definitions are always important for how events are classified, but whether they matter for what is picked up depends on the information environment. The importance of conflict definitions in general, and fatality thresholds in particular, is greatest in information-rich environments where coverage is good, reporting thresholds are low, and information is unambiguous. This is rarely the situation in conflict-ridden countries. By contrast, in poorer information environments, auxiliary coding rules and source evaluation standards explain most of the differences. In the Syrian example above, more than 90% of the 15 849-fatality difference between ACLED and UCDP is explained by a single auxiliary coding rule for dealing with vague fatality numbers. When a report states, for example, that “a vehicle was blown up killing the people inside,” UCDP counts 2 fatalities while ACLED’s codebook suggests counting 10 fatalities for the same event (ACLED, 2021: p. 32). 3
Figure 3 shows the distribution of events in Syria 2017 for UCDP and ACLED: number of events (Y) by number of fatalities per event (X).
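The arithmetic of this auxiliary rule can be sketched in a few lines. The event list and function below are purely illustrative; only the constants follow the rules described above (UCDP counts a vague plural as 2 fatalities, while ACLED’s codebook suggests 10).

```python
# Illustrative sketch: how different defaults for vague fatality reports
# ("people were killed", no number given) compound across events.
UCDP_VAGUE_DEFAULT = 2
ACLED_VAGUE_DEFAULT = 10

def coded_fatalities(reported, vague_default):
    """Return the coded fatality count for one event.

    `reported` is an exact number, or None when the source only says
    that an unspecified number of people were killed.
    """
    return reported if reported is not None else vague_default

# Hypothetical stream of reports: three exact counts, two vague ones.
reports = [5, None, 12, None, 3]

ucdp_total = sum(coded_fatalities(r, UCDP_VAGUE_DEFAULT) for r in reports)
acled_total = sum(coded_fatalities(r, ACLED_VAGUE_DEFAULT) for r in reports)

print(ucdp_total)   # 5 + 2 + 12 + 2 + 3 = 24
print(acled_total)  # 5 + 10 + 12 + 10 + 3 = 40
```

In an information-poor environment where a large share of reports are vague, the per-event gap of 8 fatalities accumulates quickly, which is consistent with the magnitude of the divergence in the Syrian and Yemeni cases.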
In the case of Yemen 2015–2018 we see a similar pattern (Figure 4). More than one third of the difference in fatalities between UCDP GED and ACLED can be attributed to the auxiliary coding rule for vague numbers, under which ACLED counts 10 fatalities.
Yemen 2015–2018: UCDP GED vs ACLED, number of fatalities (Y) by number of fatalities per event (X).
Another case where RKL suggest UCDP distorts the picture, this time by undercounting civilian fatalities, is Mexico in 2021. Here, differences in categorization are driven by how ACLED and UCDP deal with a different kind of ambiguity. ACLED records 6739 civilian fatalities, 81% of all fatalities, while RKL claim that UCDP GED records only 28 civilian fatalities (Raleigh et al., 2023: p. 2). This is based on a misunderstanding of UCDP data. UCDP reports 110 civilian fatalities, 262 gang member fatalities, and close to 15 000 fatalities of unknown status. In Mexico, sources rarely contain information about the status of those killed, which is why UCDP codes most fatalities as unknown rather than assigning them to a category the sources do not support.
Note also that UCDP records more than twice as many fatalities as ACLED in Mexico 2021, in spite of RKL’s claims about UCDP making violence unseen due to a higher inclusion threshold, consistent application of more restrictive definitions, and an allegedly poorer sourcing strategy. The same is true for the Pokot case in Kenya between 1997 and 2020, which RKL incorrectly claim UCDP missed entirely due to fatality thresholds (Raleigh et al., 2023: p. 8). In fact, in the Pokot case UCDP GED registers more than twice as many fatalities as ACLED over the same period. 4
Another example RKL use in their article is the Philippines in 2020. Again, they misrepresent UCDP GED data in the maps showing events in the Philippines in 2020 (Raleigh et al., 2023: p. 10).
The Philippines in 2020: UCDP GED vs ACLED.
The final case in the RKL article is Madagascar in 2018 and 2020, where ACLED registers a few fatal events. The relevant incidents are recorded on UCDP’s servers: in 2018, a conflict between the Government of Madagascar and the opponents of then-President Hery Rajaonarimampianina, who demanded the president’s resignation. This fits the UCDP definition of a violent political protest event (i.e., in UCDP but not in UCDP GED; see Svensson et al., 2022), albeit failing to reach the 25-fatality threshold. The 2020 incidents are registered as Government of Madagascar – Malagasy Cattle Rustlers, with 25 fatalities but no stated incompatibility. Thus, in the Madagascar case, ACLED and UCDP GED differ due to thresholds and inclusion criteria.
Conclusions
Based on the empirical cases above, RKL make large claims about how using definitions rigorously (as UCDP does) makes violence disappear, deaths vanish, and conflicts become unseen (Raleigh et al., 2023: p. 13). The evidence put forth in RKL’s article provides no grounds for such claims, considering that UCDP registers significantly more fatalities than ACLED in two of the cases they selected to make their points (Mexico, Pokot). In three other cases most of the difference is attributable to auxiliary coding rules for dealing with ambiguity and/or source evaluation standards (Syria, Yemen, the Philippines). Users will have to decide which source evaluation standards, auxiliary coding rules, and practices are more reasonable. Only in one case, Madagascar 2018–2020, are UCDP’s inclusion threshold and criteria the main reason for the difference between UCDP and ACLED.
The claims RKL make about sourcing practices and about the consequences of definitional thresholds and inclusion criteria could and should have been evaluated systematically and empirically. They were not. By revisiting their claims and the empirical cases they selected, we hope to have shown why and how their claims are flawed, while also contributing to an understanding of what actually drives differences between the two human-led event datasets, UCDP and ACLED. The way the two datasets handle ambiguity, uncertainty, and source evaluation explains most of the differences across these cases, and likely in most cases.
