Abstract
Introduction
In a recent article, Clionadh Raleigh, Roudabeh Kishi, and Andrew Linke (RKL for short) claim that datasets such as the UCDP GED make political violence unseen through restrictive definitions, inclusion thresholds, and sourcing practices (Raleigh et al., 2023).
Comparing validity and reliability across datasets is not really possible without access to ground truth in a large sample of cases. What we can do is investigate what drives differences across datasets, and whether those differences are in fact attributable to higher fatality thresholds, more restrictive inclusion criteria, and/or differences in sourcing. Thus, in this article, we make more general points about what drives differences in the data provided by UCDP GED and ACLED under a variety of circumstances. We do this by revisiting all five empirical cases used in the original article (selected by RKL) and one additional case from an earlier ACLED report comparing data (Raleigh and Kishi, 2019). 1
What the evidence does suggest is that, when comparing like for like, differences between ACLED and UCDP data are driven more by how the projects deal with ambiguity and uncertainty in poor information environments than by thresholds or sourcing strategies. This accounts for most of the differences between ACLED and UCDP in the cases selected by RKL. Inclusion criteria play some role (as they should), but there is no evidence for one of RKL’s main claims: that UCDP’s sourcing strategy makes a difference. ACLED’s heavy reliance on partisan sources does make a substantial difference in some cases, indicating that different standards for source evaluation also drive some of the differences between UCDP and ACLED data.
Assessing validity and reliability
In the relatively rare cases where researchers have access to something approaching ground truth, we may assess the validity and reliability of different data collection projects (cf. Baum and Zhukov, 2015; Croicu and Kreutz, 2017; Davenport and Ball, 2002; Dawkins, 2021; Dietrich and Eck, 2020; Price and Ball, 2015; Weidmann, 2015, 2016). Since RKL have no data approximating ground truth, their claims about the validity and reliability of different datasets rest on no empirical grounds, and their conceptual discussion of validity and reliability does not bolster their case (see Öberg, 2025: pp. 3–7).
What RKL do is assess the other datasets relative to ACLED data (as if it were the ground truth), based on the incorrect premise that they all aim to measure the same thing as ACLED: political violence very broadly defined (Raleigh et al., 2023: p. 3; see also Öberg, 2025). However, what UCDP GED aims to measure is state-based armed conflict, non-state armed conflict, and one-sided violence, and thus its validity and reliability should be assessed against how well and how reliably it measures these three things. If RKL had used standard definitions of reliability and validity and applied them in the conventional way to assess empirically how well a dataset measures what it aims to measure, their conclusions would not follow.
The critique of stable definitions
A major theme in RKL’s critique of UCDP GED is that UCDP applies its definitions consistently over time and across cases. Pointing to the UCDP, RKL claim that for datasets that collect data on a “wider, comprehensive remit, prioritizing stasis in definitions of a shifting phenomenon like political violence, or centering sources that are only intermittently stable, creates invalid and systemic biases…” (Raleigh et al., 2023: p. 3). The claim about “centering sources that are only intermittently stable” is not true of UCDP. “Invalid bias” has no known meaning. “Systemic bias” has a meaning, but it is not clear how it applies here except as a pejorative. It is true that UCDP has stable definitions and applies them rigorously, but stasis in definitions is normally considered a cardinal virtue in measurement. Yardsticks of varying lengths are not a hallmark of good measurement, yet RKL seem to imply that because political violence is “a shifting phenomenon” we should have an adjustable yardstick. If a data collection project applies its definitions flexibly, shifting across cases and over time, how would one even assess whether the measure is reliable or valid? Would it even be possible to measure trends in “political conflict” if what is measured at one time and place is not the same as what is measured at another?
Sourcing
The first thing to note when discussing sourcing is that sourcing requirements vary greatly across datasets and are a function of the exact type of data being collected. For example, the sourcing requirements for collecting data on interstate wars, as in the Correlates of War project (Singer and Small, 1972), are far less demanding than those for collecting data on lethal events in armed conflicts, as in the UCDP GED (Sundberg and Melander, 2013), which in turn are far less demanding than those for collecting data on, for example, “…individuals and groups who peacefully demonstrate against a political entity, government institution, policy, group, tradition, businesses or other private institutions” (ACLED, 2021: p. 13). The former is less demanding than the latter because of how information about events typically propagates through various information channels and the news ecosystem (cf. Galtung and Ruge, 1965).
In general, the threshold for reporting on events goes down, and the granularity of the reporting goes up, the closer the reporting actor is to the event.
The second thing to note is that it is difficult to measure how sourcing affects missingness and bias in conflict datasets, because researchers only rarely have access to anything approximating ground truth that can be used as a benchmark. Studies that do compare conflict event data to direct observation data (approaching ground truth) show that non-lethal conflict events suffer from serious underreporting, and that lethal events are more likely to propagate up the news food chain (cf. Croicu and Eck, 2022; Demarest and Langer, 2018). Croicu and Eck (2022) furthermore suggest that among non-lethal conflict events, the less coercive events are even less likely to be reported and hence picked up by event datasets. The difficulty of collecting data on low-scale events like protests may also be reinforced by coder sampling error (Demarest and Langer, 2022: p. 648).
Recent research comparing ground truth data to ACLED data is suggestive of how much more demanding the information requirements are for collecting data on non-violent conflict events. Using data collected by the UNAMID Joint Mission Analysis Center in Darfur, Croicu and Eck do not find a single overlap between troop movement events registered by UNAMID JMAC and troop movement events registered by ACLED (Croicu and Eck, 2022: pp. 464–465).
The differences in the demands on information implied by the different types of events collected by ACLED compared to UCDP GED are not something RKL discuss in their article. They focus instead on sourcing strategies and practices, describing how ACLED uses multiple forms of media and information, including new media, local language sources, local source networks, and newly integrated sources when available (Raleigh et al., 2023: p. 10). All the while, they incorrectly allege that UCDP does not do these things and instead relies on English-language newswires and traditional media, using a fixed set of sources and thus missing out on new sources (Raleigh et al., 2023: pp. 10–11). In fact, UCDP’s sourcing strategy is much the same as ACLED’s; the UCDP merely relies less on sources belonging to warring parties, reflecting differences in standards for source evaluation. Looking at the sources ACLED actually used in recent times, one study finds that 77% of its sources were news media and only 6% were what it classifies as local partners (Croicu, 2025: p. 36, footnote 20). We find that since 2015 over 75% of all events in ACLED used traditional media sources and slightly less than 10% used local partners.
Sourcing strategies or source evaluation standards?
After portraying UCDP GED as being based on what they term ‘traditional media’, RKL illustrate how dramatically different a map of conflict events in Syria in 2017 looks if, in addition to what they call “traditional media,” one adds “new media,” “local partners,” and “other” sources. The contrasting maps both use ACLED data (Raleigh et al., 2023: p. 11, Figure 2). Had they compared to actual UCDP data and sourcing, they would have found that UCDP GED used a similarly broad range of source types.
Syria in 2017: UCDP GED vs ACLED.
One difference in sourcing practices that RKL do not discuss, but which sometimes affects the data substantially, is how the projects deal with partisan sources, including sources controlled by warring parties and the warring parties themselves. Partisan sources may provide valuable information, but they have a clear bias, and fatality figures provided by warring parties, for example, cannot be taken at face value. Differences in source evaluation standards are evident in the case of Yemen, where ACLED claims UCDP severely underreported fatalities in 2015–2018, allegedly because of poor sourcing practices compared to ACLED (Raleigh and Kishi, 2019: pp. 20–22). Figure 2 below displays the number of fatalities per source in Yemen 2015–2018, comparing UCDP GED in the top panel to ACLED in the bottom. It shows that ACLED registers 64 315 more fatalities in Yemen during this period and that in this case (unlike in Syria above) UCDP relies more on wire services than ACLED does. So does that account for the difference in fatalities? No, most of the difference is accounted for by differences in how the two projects apply source evaluation and deal with vague numbers (Figure 4 below). 2
Fatalities per source in Yemen 2015–2018, UCDP GED vs ACLED.
More than one third of the difference in fatality numbers between UCDP GED and ACLED in Yemen 2015–2018 is attributable to sources belonging to one of the warring parties. The Yemen News Agency SABA is Houthi-controlled, and Ansar Allah is the Houthi movement. The Yemen News Agency SABA is by far the most important source for ACLED fatalities. It is also among the sources read by UCDP, but UCDP relies on it far more sparingly, suggesting that a difference in source evaluation standards may explain as much as one third of the difference in fatalities between UCDP GED and ACLED. In sum, these cases were selected by ACLED to showcase how ACLED’s allegedly superior sourcing strategies make a difference in the data. We find instead that the sourcing strategies are not significantly different, but that source evaluation standards are a significant source of differences in the data. Next, we turn to RKL’s claim that inclusion thresholds and criteria produce differences between UCDP and ACLED data.
Inclusion thresholds and criteria or rules for dealing with ambiguity and uncertainty?
Definitions are always important for how events are classified, but whether they matter for what is picked up depends on the information environment. The importance of conflict definitions in general, and fatality thresholds in particular, is greatest in information-rich environments where coverage is good, reporting thresholds are low, and information is unambiguous. This is rarely the situation in conflict-ridden countries. By contrast, in poorer information environments, auxiliary coding rules and source evaluation standards explain most of the differences. In the Syrian example above, more than 90% of the 15 849-fatality difference between ACLED and UCDP is explained by a single auxiliary coding rule for dealing with vague fatality numbers. When a report states, for example, that “a vehicle was blown up killing the people inside,” UCDP counts 2 fatalities while ACLED’s codebook suggests counting 10 fatalities for the same event (ACLED, 2021: p. 32). 3
Figure 3 shows the distribution of events in Syria 2017 for UCDP and ACLED: number of events (Y) by number of fatalities per event (X).
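The arithmetic of this auxiliary rule can be sketched in a few lines. The event list and function below are purely illustrative; only the constants follow the rules described above (UCDP counts a vague plural as 2 fatalities, while ACLED’s codebook suggests 10).

```python
# Illustrative sketch: how different defaults for vague fatality reports
# ("people were killed", no number given) compound across events.
UCDP_VAGUE_DEFAULT = 2
ACLED_VAGUE_DEFAULT = 10

def coded_fatalities(reported, vague_default):
    """Return the coded fatality count for one event.

    `reported` is an exact number, or None when the source only says
    that an unspecified number of people were killed.
    """
    return reported if reported is not None else vague_default

# Hypothetical stream of reports: three exact counts, two vague ones.
reports = [5, None, 12, None, 3]

ucdp_total = sum(coded_fatalities(r, UCDP_VAGUE_DEFAULT) for r in reports)
acled_total = sum(coded_fatalities(r, ACLED_VAGUE_DEFAULT) for r in reports)

print(ucdp_total)   # 5 + 2 + 12 + 2 + 3 = 24
print(acled_total)  # 5 + 10 + 12 + 10 + 3 = 40
```

In an information-poor environment where a large share of reports are vague, the per-event gap of 8 fatalities accumulates quickly, which is consistent with the magnitude of the divergence in the Syrian and Yemeni cases.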
In the case of Yemen 2015–2018 we see a similar pattern (Figure 4). More than one third of the difference in fatalities between UCDP GED and ACLED can be attributed to the auxiliary coding rule for vague numbers, under which ACLED counts 10 fatalities.
Yemen 2015–2018: UCDP GED vs ACLED, number of fatalities (Y) by number of fatalities per event (X).
Another case where RKL suggest UCDP distorts the picture, this time by undercounting civilian fatalities, is Mexico in 2021. Here, differences in categorization are driven by how ACLED and UCDP deal with a different kind of ambiguity. ACLED records 6739 civilian fatalities, 81% of all fatalities, while RKL claim that UCDP GED records only 28 civilian fatalities (Raleigh et al., 2023: p. 2). This is based on a misunderstanding of UCDP data. UCDP reports 110 civilian fatalities, 262 gang member fatalities, and close to 15 000 fatalities of unknown status. In Mexico, sources rarely contain information about the status of those killed, which is why UCDP codes most fatalities as unknown rather than assigning them to a category the sources do not support.
Note also that UCDP records more than twice as many fatalities as ACLED in Mexico 2021, in spite of RKL’s claims about UCDP making violence unseen due to a higher inclusion threshold, consistent application of more restrictive definitions, and an allegedly poorer sourcing strategy. The same is true for the Pokot case in Kenya between 1997 and 2020, which RKL incorrectly claim UCDP missed entirely due to fatality thresholds (Raleigh et al., 2023: p. 8). In fact, in the Pokot case UCDP GED registers more than twice as many fatalities as ACLED over the same period. 4
Another example RKL use in their article is the Philippines in 2020. Again, they misrepresent UCDP GED data in the maps showing events in the Philippines in 2020 (Raleigh et al., 2023: p. 10).
The Philippines in 2020: UCDP GED vs ACLED.
The final case in the RKL article is Madagascar in 2018 and 2020, where ACLED registers a few fatal events. The relevant incidents are recorded on UCDP’s servers: in 2018, a conflict between the Government of Madagascar and the opponents of then-President Hery Rajaonarimampianina, who demanded the president’s resignation. This fits the UCDP definition of a violent political protest event (i.e., in UCDP but not in UCDP GED; see Svensson et al., 2022), albeit failing to reach the 25-fatality threshold. The 2020 incidents are registered as Government of Madagascar – Malagasy Cattle Rustlers, with 25 fatalities but no stated incompatibility. Thus, in the Madagascar case, ACLED and UCDP GED differ due to thresholds and inclusion criteria.
Conclusions
Based on the empirical cases above, RKL make large claims about how using definitions rigorously (as UCDP does) makes violence disappear, deaths vanish, and conflicts become unseen (Raleigh et al., 2023: p. 13). The evidence put forth in RKL’s article provides no grounds for such claims, considering that UCDP registers significantly more fatalities than ACLED in two of the cases they selected to make their points (Mexico, Pokot). In three other cases most of the difference is attributable to auxiliary coding rules for dealing with ambiguity and/or source evaluation standards (Syria, Yemen, the Philippines). Users will have to decide which source evaluation standards, auxiliary coding rules, and practices are more reasonable. Only in one case, Madagascar 2018–2020, are UCDP’s inclusion threshold and criteria the main reason for the difference between UCDP and ACLED.
The claims RKL make about sourcing practices and about the consequences of definitional thresholds and inclusion criteria could and should have been evaluated systematically and empirically. They were not. By revisiting their claims and the empirical cases they selected, we hope to have shown why and how their claims are flawed, while also contributing to an understanding of what actually drives differences between the two human-led event datasets, UCDP and ACLED. The way the two datasets handle ambiguity, uncertainty, and source evaluation explains most of the differences across these cases, and likely in most cases.
