Abstract
Understanding how useful any particular set of event data might be for conflict research requires appropriate methods for assessing validity when ground truth data about the population of interest do not exist. We argue that a total error framework can provide better leverage on these critical questions than previous methods have been able to deliver. We first define a total event data error approach for identifying 19 types of error that can affect the validity of event data. We then address the challenge of applying a total error framework when authoritative ground truth about the actual distribution of relevant events is lacking. We argue that carefully constructed gold standard datasets can effectively benchmark validity problems even in the absence of ground truth data about event populations. To illustrate the limitations of conventional strategies for validating event data, we present a case study of Boko Haram activity in Nigeria over a 3-month offensive in 2015 that compares events generated by six prominent event extraction pipelines—ACLED, SCAD, ICEWS, GDELT, PETRARCH, and the Cline Center’s SPEED project. We conclude that conventional ways of assessing validity in event data using only published datasets offer little insight into potential sources of error or bias. Finally, we illustrate the benefits of validating event data using a total error approach by showing how the gold standard approach used to validate SPEED data offers a clear and robust method for detecting and evaluating the severity of temporal errors in event data.
