Abstract
Risk assessment is an essential task in the criminal justice system. Interventions to manage or reduce risk to reoffend are most effective when they are proportionate and tailored to an individual’s risk to reoffend, following the risk and need principles of the risk/need/responsivity model of effective correctional practices (Bonta & Andrews, 2017). Lower intensity interventions are not sufficient for high risk individuals and higher intensity interventions are inefficient and can have negative effects on low risk individuals (Andrews & Dowden, 2006; Lovins et al., 2009; Lowenkamp & Latessa, 2005). Consequently, while it is important to prioritize high risk individuals, we must ensure low risk people are not overmanaged.
The development, validation, and implementation of risk assessment tools to predict sexual offense recidivism have proliferated in the last 30 years (Hanson & Morton-Bourgon, 2009; Kelley et al., 2020). However, given the recency of widespread internet technologies and the length of time required to conduct recidivism studies with adequate follow-up, there is comparatively less research and fewer risk assessment tools available for individuals convicted of internet sex offenses, particularly related to the possession and distribution of child sexual exploitation materials (CSEM; also sometimes referred to as child pornography or indecent images involving children). This gap in research and knowledge requires attention because CSEM offending in particular has increased rapidly (Allen, 2016; Statistics Canada, 2021).
Risk Assessment for Men With CSEM Offenses
Virtually all empirically based risk assessment tools for sexual recidivism were developed with samples of men who had few to no CSEM offenses in their criminal history because the sampling timeframes predated the widespread adoption of the internet, which is itself linked to the rapid increase in CSEM-related offending (Martin, 2021; Seto, 2013). For example, the original Static-99 and Risk Matrix were developed using individuals released between 1959 and the early 1990s (Hanson & Thornton, 2000; Thornton et al., 2003). However, lack of inclusion in the development research does not necessarily require separate risk tools for this population. The 2014 Standards for Educational and Psychological Testing from the Joint Committee of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (hereafter referred to as the Joint Committee) provide guidance for applying existing assessment tools to new populations. Although not every possible subgroup or unique case requires separate validation research, the Joint Committee (2014) emphasized that test development and validation must consider relevant subgroups (see Chapter 3).
The question then becomes whether individuals with CSEM offenses are a relevant subgroup requiring separate validation research and potentially unique risk tools. Early research found these individuals are often at lower risk to sexually reoffend than those who committed contact sexual offenses (Seto, 2013). There is variation, however, as those with both CSEM and offline sexual offenses have higher rates of reoffense compared with individuals with only CSEM offenses (Babchishin et al., 2015, 2022; Seto et al., 2011). In addition, individuals with CSEM offenses also differ in demographic and psychological characteristics compared with those with contact sexual offenses, for example, being more likely to show evidence of atypical sexual interests such as pedophilia (for reviews, see Babchishin et al., 2015, 2018; Henshaw et al., 2020).
To our knowledge, only one risk assessment tool has been developed and validated to predict reoffense among individuals with CSEM offenses: the Child Pornography Offender Risk Tool (Seto & Eke, 2015). However, generic risk assessment tools have been examined for this offense group as well, including the Risk Matrix 2000 (Thornton et al., 2003, 2023). The Risk Matrix 2000 is routinely used in the United Kingdom and has been examined in several risk assessment studies following individuals who have committed CSEM offenses. Research on both tools will be discussed in turn. Overall, there is insufficient research on the applicability of generic sexual recidivism risk tools for those whose only sexual offenses are CSEM offenses. Moreover, for those with mixed online and offline sexual offenses, it is unknown whether CSEM-specific or generic sexual recidivism risk tools are more accurate.
Child Pornography Offender Risk Tool
The Child Pornography Offender Risk Tool (CPORT) is a risk assessment tool with seven items: age at the time of the index investigation (35 or younger), any prior criminal history, any failure on conditional release, any contact sexual offending, indication (admission or diagnosis) of sexual interest in children, more boy than girl content in child pornography, and more boy than girl content in other child-related materials (e.g., images of nude or partially clothed children). It can be scored primarily using criminal history and police investigative data (Eke et al., 2018). In the Canadian development study (Seto & Eke, 2015), the CPORT had a large effect size in predicting sexual recidivism for the overall sample (area under the curve [AUC] = .74,
Risk Matrix 2000
The Risk Matrix 2000 (RM2000) is an actuarial static risk scale for adult men convicted of sexual offenses (Thornton et al., 2003, 2023). The RM2000 consists of three scales: Sex (designed to predict sexual recidivism), Violent (designed to predict nonsexual violent recidivism), and Combined (designed to predict any violent recidivism, including sexual). This study will consider only the Sex scale (RM2000/S), which has seven items across two steps of coding, resulting in placement in one of four risk levels. Unlike Static-99R, which is not intended to be used for individuals whose sole sexual offense history relates to possessing or distributing CSEM materials (Phenix et al., 2016), RM2000 does not have this exclusion and the coding manual includes guidance for scoring CSEM cases (Thornton et al., 2023).
Three studies have examined predictive accuracy of the RM2000/S for internet CSEM offenses in the United Kingdom (Barnett et al., 2010; Elliott et al., 2019; Wakeling et al., 2011). However, there is considerable overlap among these samples, with Wakeling et al. (2011) subsuming most (if not all) of the other two samples. Similar to the CPORT (Seto & Eke, 2015), although the RM2000/S predicted reasonably well for the full sample with internet offenses, accuracy was notably reduced when divided into subgroups based on noninternet sexual offending history. This demonstrates the effects of reduced heterogeneity on accuracy (Howard, 2017).
In addition, RM2000/S only distinguishes between four risk levels: below average, average, above average, and well above average. The group with only internet sex offenses had virtually no cases at the two highest risk levels, and the group with contact and noncontact sexual offenses had virtually no cases in the low risk level. This supports Babchishin et al.’s (2015) meta-analytic findings that individuals with mixed online/offline sex offenses are meaningfully higher risk than individuals with online sex offenses only.
Purpose of Current Study
The CPORT is the only risk tool specifically developed to predict sexual recidivism among men convicted for CSEM offenses. It is new and purpose-built; however, research on the CPORT is ongoing and stable recidivism norms have not yet been established. In addition, some of the items needed to score the CPORT (e.g., related to CSEM and other child content; see “Measures” section below) may not be routinely available. This has created challenges for missing data in validation studies, although it is possible that over time the increased use of the tool may motivate police and corrections staff to report this information more regularly. The RM2000 is a preexisting tool that can be, and is being, applied to these individuals. It already comes with a history of research and normative data (Helmus et al., 2013; Lehmann et al., 2016). However, it is necessary to separately validate this tool for CSEM offenses (see Joint Committee, 2014).
There are seven predictive accuracy studies of the CPORT (including the development study) and only one (albeit large) validation of the RM2000/S in a CSEM sample. The effect size for the RM2000/S is higher than some studies on the CPORT and lower than others. However, it is hard to directly compare the accuracy of the two risk assessment tools because the estimates come from different studies with potentially important differences between the samples or in the methodology. For example, validation studies from Europe tend to have significantly higher predictive accuracy compared with studies from North America (Hanson & Morton-Bourgon, 2009; Helmus et al., 2022). This may be due to higher quality criminal records in Europe (Helmus et al., 2011). In addition, many CPORT studies had substantial missing information.
This study provides the first direct comparison of the predictive accuracy of the CPORT and the RM2000/S among individuals with convictions for CSEM offenses. We hypothesized that the CPORT would have higher accuracy because it is a specialized tool. We also examined two analytic approaches that have been recently used to compare recidivism prediction scales or models (Bayesian information criterion [BIC] and the Delong test; described in the “Methods” section). We also explored whether the recidivism norms for the RM2000/S (Lehmann et al., 2016), which were not developed on CSEM samples, are applicable to this sample of individuals with CSEM offenses; no hypotheses were made. Finally, we provided a cumulative meta-analysis of the CPORT and RM2000/S validation research to date with CSEM populations.
Method
Sample
This study used 365 cases from the combined CPORT development and validation samples from Seto and Eke (2015) and Eke et al. (2019). Data were from 10 police services in the most populous province in Canada (Ontario) and included regional, municipal, and provincial police service data. The sample consisted of adult (18 years of age or older) males convicted of one or more CSEM offense(s) (i.e., possession, accessing, distributing, or making/production). To be included in the current sample, the case had to have sufficient information available to score it on both the CPORT (following the missing data rules in the coding manual) and the RM2000/S. Conviction dates ranged between 1993 and 2010, with the vast majority occurring from 2000 onward. Most of the sample (99%) had at least one index charge for possession of child pornography, over a third (37%) had distribution charges, and fewer had production charges (21%) or accessing (21%) charges. Although many of the charges for production involved direct victimization of a child (e.g., taking images during contact sexual offenses), production charges can also be laid for transferring material from one electronic storage device to another. The average age at index investigation was 38.1 (
Individuals were classified into one of two groups based on their offense history: The child sexual exploitation materials/no-contact group (CSEM/NC;
Measures
CPORT
The seven CPORT items are scored dichotomously, as yes or no. CPORT total scores can therefore range from 0 to 7, with a maximum of one missing item (Eke et al., 2018). When information on admission/diagnosis of sexual interest in children is unavailable, this item can be replaced by a score of 3 or higher on the Correlates of Admission of Sexual Interest in Children (Seto & Eke, 2017). As reviewed earlier, CPORT predicts sexual recidivism with moderate to high accuracy in most studies. In addition, it has good interrater reliability in both research and field contexts (Eke et al., 2019; Hermann et al., 2019; Savoie et al., 2021; Seto & Eke, 2015).
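The scoring rule described above (seven dichotomous items, a total from 0 to 7, and at most one missing item) can be illustrated with a minimal sketch. The item keys are hypothetical labels chosen for this example, and counting a single missing item as 0 is an assumption made here for illustration; the coding manual (Eke et al., 2018) governs how missing items are actually handled.

```python
# Hypothetical item labels; the official wording is in the CPORT coding manual.
CPORT_ITEMS = (
    "age_35_or_younger",
    "any_prior_criminal_history",
    "any_failure_on_conditional_release",
    "any_contact_sexual_offending",
    "sexual_interest_in_children",
    "more_boy_than_girl_csem",
    "more_boy_than_girl_other_child_content",
)


def cport_total(items):
    """Sum the seven yes/no items; None (or an absent key) marks a missing item.

    Assumption for illustration: one missing item is tolerated and adds 0.
    More than one missing item is treated as unscorable.
    """
    missing = [name for name in CPORT_ITEMS if items.get(name) is None]
    if len(missing) > 1:
        raise ValueError(f"too many missing items to score: {missing}")
    return sum(1 for name in CPORT_ITEMS if items.get(name) is True)


example = {name: False for name in CPORT_ITEMS}
example["age_35_or_younger"] = True
example["any_prior_criminal_history"] = True
example["more_boy_than_girl_other_child_content"] = None  # one missing item
print(cport_total(example))
```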
Risk Matrix 2000/Sex
Scoring the RM2000/S involves two steps; in the first step, three items are scored: age at commencement of risk to reoffend, sexual crime court appearances, and general crime court appearances. Based on these scores, the individual is assigned one of four preliminary risk levels: below average, average, above average, and well above average. Then in Step 2, four aggravating items are scored dichotomously: male victim, stranger victim, never lived with a lover for 2 years, and noncontact sex offense. Stranger victim is not scored on the basis of CSEM materials (Thornton et al., 2023). Noncontact sex offense is only scored for internet offenses if the individual also has an offline sex offense. Finally, male victims are only scored on the basis of CSEM materials if there is evidence that the individual deliberately sought boys in the material. In this study, we operationalized this based on whether they had more boy than girl content in their CSEM collection. After Step 2, the individual’s initial risk level is increased by one category for every two aggravating factors that apply. The scale predicts well among diverse samples of men convicted of primarily offline sex offenses (Helmus et al., 2013).
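The Step 2 adjustment described above (one category higher for every two aggravating factors) can be sketched as follows. The Step 1 point-to-level mapping is not reproduced because it is not given in the text, and capping the result at the highest level is an assumption of this sketch rather than a rule quoted from the coding manual.

```python
LEVELS = ("below average", "average", "above average", "well above average")


def rm2000s_final_level(step1_level, aggravating_count):
    """Raise the Step 1 level by one category per two aggravating factors.

    aggravating_count is the number of the four Step 2 items that apply.
    Assumption: the result cannot exceed the highest of the four levels.
    """
    if not 0 <= aggravating_count <= 4:
        raise ValueError("aggravating_count must be between 0 and 4")
    start = LEVELS.index(step1_level)  # raises ValueError on an unknown label
    final = min(start + aggravating_count // 2, len(LEVELS) - 1)
    return LEVELS[final]


print(rm2000s_final_level("average", 1))  # a single factor leaves the level unchanged
print(rm2000s_final_level("average", 2))  # two factors raise it by one category
```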
Recidivism
Data on recidivism were collected from two main sources: police occurrence reports and a national database of criminal charges and convictions maintained by the Canadian Police Information Centre (CPIC), a service of the Royal Canadian Mounted Police (RCMP). Recidivism was defined as a new charge or conviction for any sexual offense (including CSEM, noncontact sexual offenses such as exhibitionism, and contact sexual offenses) and any CSEM offense. Follow-up began at the date of first release from the index CSEM charge(s) (e.g., release on bail, release at conviction) and ended at the date when criminal records were checked (summer of 2012 for the development sample and summer 2015 for the validation sample), or date of death, whichever was sooner. Time in custody (e.g., time in jail for the index or any subsequent offense) was subtracted, so follow-up time represented the person’s opportunity to offend while residing in the community (
Procedure
The CPORT and RM2000/S were scored from police files that included criminal history records, police occurrence reports, interview notes or transcription, police officer notes, forensic computer analysis reports, details about the size and content of the pornographic and child material, and in most cases, either videos or transcripts of police interviews. The categorization of the CSEM materials was obtained from forensic and police notes. Permission to access case file information was obtained from the participating police services. This research was approved by the institutional review board of the Royal Ottawa Health Care Group.
Transparency and Openness
The current study sample contains protected information maintained by the Ontario Provincial Police and cannot be shared outside the service; however, requests for additional analyses and data verification can be submitted for review for accommodation on-site. The meta-analysis data sets (and syntax template) are available from Open Science Framework (https://osf.io/pn9d6/?view_only=da2fa684366645d9b6db5a14c72464c5). Materials needed to score CPORT are available from ResearchGate and materials to score the Risk Matrix 2000 are available from www.saarna.org.
Overview of Analyses
Discrimination and Calibration
Discrimination and calibration are two types of predictive accuracy that can be examined for risk assessment tools (also sometimes referred to as relative and absolute prediction, respectively; Helmus & Babchishin, 2017). Discrimination examines how well the tool distinguishes recidivists from nonrecidivists (i.e., the extent to which higher risk scores are associated with higher likelihoods of recidivism). There are several statistics commonly used to assess discrimination; we reported AUCs as well as Harrell’s C and hazard ratios from Cox regression analyses. The AUC from receiver operating characteristic curve (ROC) analyses can range between 0 and 1, with values between .50 and 1 indicating positive predictive accuracy (higher scoring individuals are more likely to recidivate than lower scoring individuals), values of .50 indicating no predictive accuracy, and values below .50 indicating negative predictive accuracy. AUCs of .56, .64, and .71 were considered small, moderate, and large effect sizes, respectively, as they roughly correspond to Cohen’s
We also used Cox regression (Singer & Willett, 2003), which accounts for varying follow-up periods. Cox regression provides hazard ratios, quantifying increases in recidivism with each one-point increase on the risk scale, averaged across time. Hazard ratios, however, cannot easily be compared across scales unless they have the same possible range of scores, because scales with more points are expected to have smaller differences in the outcome between adjacent values. We, therefore, also reported Harrell’s C values (Harrell et al., 1996), which were derived from the Cox regression model. Harrell’s C is an analogue of AUCs for survival data, and can be interpreted in the same way (e.g., .56, .64, and .71 reflecting small, moderate, and large values).
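To make the two discrimination statistics concrete, the sketch below computes an AUC as the proportion of recidivist/nonrecidivist pairs in which the recidivist has the higher score, and Harrell’s C as the analogous proportion over pairs that are comparable in survival data. It is a plain pairwise illustration with ties in score counted as one half and ties in time ignored, not the routine used in the analyses; the function and argument names are hypothetical.

```python
def auc(scores, recidivated):
    """Probability that a random recidivist outscores a random nonrecidivist."""
    pos = [s for s, r in zip(scores, recidivated) if r]
    neg = [s for s, r in zip(scores, recidivated) if not r]
    if not pos or not neg:
        raise ValueError("need at least one recidivist and one nonrecidivist")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half
    return wins / (len(pos) * len(neg))


def harrell_c(scores, times, events):
    """Concordance for survival data with censoring.

    A pair is usable when person i recidivated and person j was still being
    followed (event-free) at that time, i.e., times[j] > times[i].
    """
    concordant = 0.0
    usable = 0
    for i in range(len(scores)):
        if not events[i]:
            continue  # a censored case cannot anchor a comparison
        for j in range(len(scores)):
            if times[j] > times[i]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable
```

Both functions return .50 when scores carry no information about the outcome, which is why the same benchmarks can be applied to each.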
Calibration refers to the correspondence between observed and predicted rates of recidivism. We could not examine calibration of the CPORT because we used the same sample used to report preliminary CPORT recidivism estimates. Analyses of the calibration of the RM2000/S were conducted with the E/O index (Gail & Pfeiffer, 2005; Rockhill et al., 2003), comparing with the estimates obtained from Lehmann et al. (2016). The E/O index is the ratio of the predicted or expected number of recidivists (E) to the observed number of recidivists (O; Method M0 from Viallon et al., 2009). The E/O index is both a significance test and a measure of effect size. If the predicted number of recidivists perfectly matches the observed number, the E/O index will be 1. Values below 1 mean that the RM2000/S underestimated recidivism, and values above 1 reflect overestimation. Ninety-five percent confidence intervals that do not include 1 indicate significant differences between observed and predicted rates. For further explanation and calculation examples of the E/O Index, see Hanson (2017). Analyses were run in either SPSS version 20.0 or R.
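A minimal sketch of the E/O calculation follows. The confidence interval uses the common Poisson-based approximation, E/O × exp(±1.96 × √(1/O)); that this matches the exact variant applied in the analyses is an assumption. The illustrative counts (23 expected, 40 observed) are the full-sample figures reported in the Results.

```python
import math


def eo_index(expected, observed, z=1.96):
    """Return (E/O, lower, upper) using a Poisson approximation for the CI.

    Assumption: variance of log(E/O) is approximated by 1 / observed.
    """
    if observed <= 0:
        raise ValueError("observed count must be positive")
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


ratio, lower, upper = eo_index(expected=23, observed=40)
print(round(ratio, 3), round(lower, 2), round(upper, 2))
# An interval lying entirely below 1 indicates significant underestimation.
```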
Comparing the Discrimination of CPORT and RM2000/S
A key purpose of this article was to compare the predictive accuracy of the CPORT and RM2000/S, but the optimal analysis for this is unclear. In the last 10 years, many researchers have compared AUCs for risk scales using the Delong test, which accounts for the correlation between the two tools (Delong et al., 1988; for further examples on the use of this test, see Babchishin et al., 2012; Eher et al., 2016; Helmus et al., 2019; Wakeling et al., 2011). More recently, the BIC has been used to compare regression models, for example, to examine changes in risk over time (Babchishin & Hanson, 2020; Hanson et al., 2021; Helmus et al., 2021; Lloyd et al., 2020). For Cox regression models, the BIC = −2 LL + [k × ln(
A key advantage of the Delong test is that it accounts for the correlation between the tools being compared, which maximizes statistical power. A drawback is that it is primarily a null hypothesis significance test (i.e., there are no benchmarks for interpreting the differences in AUCs). The BIC includes benchmarks for interpretation of the magnitude (i.e., effect size) of differences in model fit. However, like many benchmarks, they may have some arbitrariness to them and are not meant to be applied blindly. Both are strongly influenced by the number of recidivists (which are small in many of our analyses, which could lead to substantial fluctuations). Consequently, we explored comparisons using both techniques.
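The BIC comparison can be sketched as below. Two elements are assumptions rather than details confirmed by the text: that the sample-size term is the number of recidivists (a common choice for Cox models), and that the interpretive benchmarks are the widely cited Raftery-style cutoffs (differences of 0 to 2 weak, 2 to 6 positive, 6 to 10 strong, above 10 very strong).

```python
import math


def cox_bic(log_likelihood, k, n_events):
    """BIC = -2 LL + k * ln(n); n is assumed here to be the number of events."""
    return -2.0 * log_likelihood + k * math.log(n_events)


def interpret_bic_difference(bic_a, bic_b):
    """Label the absolute BIC difference using assumed Raftery-style cutoffs."""
    diff = abs(bic_a - bic_b)
    if diff < 2:
        return "weak"
    if diff < 6:
        return "positive"
    if diff < 10:
        return "strong"
    return "very strong"


# Hypothetical log-likelihoods for two single-predictor Cox models
# fitted to the same sample with 40 recidivists.
bic_tool_a = cox_bic(-180.0, k=1, n_events=40)
bic_tool_b = cox_bic(-184.0, k=1, n_events=40)
print(interpret_bic_difference(bic_tool_a, bic_tool_b))
```

When both models contain a single predictor and are fitted to the same cases, the penalty terms cancel, so the BIC difference equals the difference in −2 LL; the lower BIC indicates the better-fitting model.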
Meta-Analysis
Validation studies of CPORT and RM2000/S with CSEM samples were obtained by searching both scale names in Google Scholar and PsycINFO. In addition, the developers of both scales were asked to provide a list of all validations they were aware of, and an additional scholar who specializes in CSEM research was also contacted to see whether they were aware of anything we missed. This field of research is sufficiently small that these scholars are unlikely to be unaware of any published research. Some unpublished research may nonetheless have been missed, although most of the CPORT validation studies are unpublished dissertations that have been shared with or discovered by the CPORT authors.
Meta-analyses followed the formulae of Borenstein et al. (2021). Although random-effects analyses are often conceptually preferable, they are unstable when the number of studies is below 30 (Schulze, 2007); consequently, we reported both, but primarily relied on fixed-effect analyses. The primary drawback of fixed-effect meta-analysis is that it assumes studies are measuring the same common effect size and variability across studies is not incorporated into the error term, often resulting in unrealistically narrow confidence intervals. Variability in findings across studies was reported using Cochran’s
Overlap With Previous Research
Fixed 5-year follow-up AUCs for the CSEM/NC and CSEM + Contact subgroups were reported in Eke et al. (2019), with minor fluctuations due to ongoing data cleaning. CPORT analyses for the full group are similar to what is reported in Eke et al. (2019) but follow the new rules for handling missing data (Eke et al., 2018). Cox regression, Harrell’s C, all analyses of the RM2000/S, and comparisons between the CPORT and the Risk Matrix have not been reported elsewhere. The meta-analysis has also not been reported elsewhere.
Results
Table 1 presents the mean CPORT scores and distributions among RM2000/S levels for the full sample, as well as the CSEM/NC and CSEM + Contact group. CPORT scores were significantly higher for the CSEM + Contact group compared with the CSEM/NC group (3.5 vs. 1.5), with a large effect size (Cohen’s
CPORT and Risk Matrix 2000/S Total Scores and Subgroup Comparisons
For comparisons of the proportions in risk levels between the CSEM/NC and CSEM + Contact samples, Cohen’s
In the overall sample, more than 80% of the men scored Below Average or Average risk on the RM2000/S. Comparing the CSEM/NC and CSEM + Contact groups, there were significantly more CSEM/NC men in Below Average and Average groups, and significantly fewer in the Above Average and Well Above Average groups. In the overall sample (
Comparing the CPORT and the Risk Matrix 2000/Sex
CPORT and RM2000/S scores were strongly and positively correlated (
Comparing AUCs From the CPORT and Risk Matrix 2000/Sex Using the Delong Test, and Fixed 5-Year Follow-Up
For the CSEM + Contact group, which was the group the RM2000/S was designed for, the RM2000/S had a slightly higher AUC than the CPORT for sexual recidivism (AUC = .74 vs. .72, respectively), but the difference was not significant. In all other analyses in Table 2, the AUC for the CPORT was higher. It was a small difference for the CSEM + Contact group (AUC differences of .03), but in all other comparisons, the differences in AUCs were more pronounced, ranging between .07 and .12. In addition to sexual recidivism for the full group, the only other AUC difference that was statistically significant was predicting sexual recidivism among CSEM/NC individuals.
Table 3 presents the Cox regression results, which include the Harrell’s C effect size based on survival data. These results were similar to the AUCs based on fixed follow-ups in Table 2. Here, however, the focus is on the BIC as an indicator of model fit, and differences between BICs are a comparison of the fit of the CPORT versus the RM2000/S. For the full sample, CPORT was a better fit than the RM2000/S; the difference in model fit was strong for any sexual recidivism, and very strong for CSEM recidivism. For the CSEM/NC sample, the differences between model fit again favored the CPORT. For the CSEM + Contact sample, the difference between the CPORT and RM2000/S was almost nonexistent for any sexual recidivism, but strong for the prediction of CSEM recidivism, again favoring the CPORT.
Comparing CPORT and Risk Matrix 2000/Sex Based on BIC From Cox Regression Models
Calibration of the RM2000/S
Table 4 presents the 5-year sexual recidivism rates per RM2000/S risk level for the full sample and subgroups, alongside the recidivism norms for the tool from Lehmann et al. (2016). Recidivism rates observed in the current sample were generally higher than expected rates. Overall, RM2000/S significantly underestimated recidivism for the full sample and the CSEM/NC and CSEM + Contact subsamples, all by roughly the same amount, with E/O indices between .57 and .58; this indicates that expected recidivism rates were 57% to 58% of what was observed, or conversely, observed recidivism rates were nearly twice as high as expected. For example, for the full sample, the scale predicted 23 sexual recidivists but there were 40. The underestimation was roughly consistent for all risk levels except the above average risk level, where there was slight (nonsignificant) overestimation of recidivism.
RM2000/S Calibration Analyses With Fixed 5-Year Follow-Up Data
Cumulative Meta-Analysis of CPORT and RM2000/S Validation Studies
Table 5 summarizes the replication studies of CPORT and RM2000/S with men with CSEM offenses, including the current sample (which overlaps with Eke et al., 2019 and Seto & Eke, 2015). Where multiple effect sizes were reported, we coded effects based on the CPORT recommendations for how much missing information was allowed (Eke et al., 2018), with the exception of Soldino et al. (2021) and Eke et al. (2019), where we used all cases because restricting the sample to those with complete information severely reduced the sample size. Especially given the small number of replication studies, there could be a concern that including the development study (Seto & Eke, 2015) might inflate the findings. However, the effect size from the development study was the median value; consequently, there was no need to remove it as it would not meaningfully impact the analysis. Instead, effect sizes from the current study were used to replace Seto and Eke (2015) and Eke et al. (2019) as this study contains both samples, with the most updated and cleaned data set.
Studies of CPORT and Risk Matrix 2000 Included in Cumulative Meta-Analysis
(a) This effect size was not in the dissertation but was obtained by personal communication (H. Gunnarsdóttir, personal communication, December 16, 2021). (b) Two of the 14 recidivism incidents were for technical breaches of their sexual offense supervision order, and not necessarily for committing a new sexual offense. (c) Note that the current study subsumes the samples from Eke et al. (2019) and Seto and Eke (2015). The meta-analysis reported in text replaces those studies. (d) Log odds ratios and their 95% confidence interval limits were transformed to Cohen’s
For the prediction of any sexual recidivism, there were five CPORT studies with a large weighted average AUC of .75 (95% confidence interval [CI] = [.71, .79],
Meta-Analysis Results for CPORT and RM2000
Table 6 also presents the meta-analytic results for the individual items of the CPORT. Item data were available from the current study as well as Black (2018), Pilon (2016), Savoie et al. (2021), and for CSEM recidivism, also Soldino et al. (2021). As noted in Table 5, however, some samples were missing information on some items, so sample sizes fluctuate dramatically for these analyses. For any sexual recidivism, Items 1 through 4 had data from four samples and demonstrated significant predictive accuracy. For Items 5, 6, and 7, sample sizes dropped considerably. Item 5 (pedo/hebephilic interests) was a significant predictor in the fixed-effect model but not the random-effects model. Items 6 and 7 (related to preferences for boys in CSEM and other child content) only had data from two samples (the current study and Savoie et al., 2021). These items significantly predicted in the current sample but had negative predictive accuracy in Savoie et al.’s (2021) sample. These items were not significantly predictive in the aggregated analysis. For CSEM recidivism, Items 1 through 3 significantly predicted recidivism in both fixed-effect and random-effects analyses. Item 4 (any contact sex offense) did not quite reach statistical significance and had lower effect sizes (AUC = .54) compared with any sexual recidivism (AUC = .58). Items 5 through 7 did not significantly predict CSEM recidivism, although sample sizes were much reduced. In addition, as per the total score for CSEM recidivism, four of the seven items had significant variability in predictive accuracy across samples.
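The difference between the fixed-effect and random-effects results reported in this section can be illustrated with a short inverse-variance sketch in the style of Borenstein et al. (2021). Pooling AUCs directly with their sampling variances and using the DerSimonian-Laird estimate of between-study variance are assumptions of this sketch, and the inputs in the usage example are hypothetical values, not the study data.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def cochran_q(effects, variances):
    """Weighted sum of squared deviations from the fixed-effect mean."""
    pooled, _ = fixed_effect(effects, variances)
    return sum((e - pooled) ** 2 / v for e, v in zip(effects, variances))


def random_effects(effects, variances):
    """Random-effects mean and SE using the DerSimonian-Laird tau^2 estimate."""
    weights = [1.0 / v for v in variances]
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return fixed_effect(effects, [v + tau2 for v in variances])


# Hypothetical AUCs and sampling variances from three studies.
aucs = [0.60, 0.75, 0.90]
variances = [0.002, 0.001, 0.002]
print(fixed_effect(aucs, variances))
print(cochran_q(aucs, variances))
print(random_effects(aucs, variances))
```

When studies disagree, the between-study variance widens the random-effects standard error, which is why an item can be significant in the fixed-effect model but not in the random-effects model.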
There were only two nonoverlapping studies of the RM2000/S, but combining them yielded an average AUC of .66 (95% CI = [.59, .74],
Discussion
This study was the first to directly compare the CPORT and the RM2000/S in the same sample and also used two different techniques to compare the predictive accuracy of the scales, the Delong test and BIC differences. Both the CPORT and the RM2000/S demonstrated large effect sizes in the fixed 5-year follow-up analyses for the overall sample; effect sizes dropped a little for the Harrell’s C analyses based on survival data. All but one effect size favored the CPORT (survival analyses of any sexual recidivism with the CSEM + Contact group slightly favored the RM2000/S), although differences in AUCs ranged considerably, from .03 to .12.
Delong tests showed that the CPORT significantly outperformed the RM2000/S in predicting any sexual recidivism for the overall sample and the CSEM/NC sample. Comparing BICs, all comparisons suggested the CPORT model was a better model than the RM2000/S, except for sexual recidivism among those with contact offenses. The differences in model fit were considered strong or very strong for predicting any sexual and CSEM recidivism among the full sample, and CSEM recidivism among the CSEM + Contact sample. The stronger effect sizes for the CPORT could be because it was specially developed for CSEM perpetrators, or possibly because it has a larger range of scores within which to distinguish risk, whereas the RM2000/S has only four levels (and the range within those levels tends to be restricted among CSEM samples).
For evaluators using these risk tools for CSEM cases, the group of most interest is those who do not have any offline sexual offenses because other widely used risk tools such as Static-99R are not applicable in these cases. Effect sizes were meaningfully lower for this subgroup, but still significant, and in the fixed follow-up analyses for CPORT, moderate in magnitude and comparable with Static-99R (see Helmus et al., 2022). It is difficult to interpret these subgroup differences. While the CSEM/NC group is often of particular interest, separating them out from men with CSEM and contact offenses restricts the range in risk and will necessarily reduce effect sizes. In this sample, the CSEM + Contact group scored two points higher than the CSEM/NC group on the CPORT, only one point of which could be explained by the item for contact sex offenses. So for a broad and heterogeneous sample of people with CSEM convictions, CPORT does well at discrimination in predicting sexual recidivism. When looking at a narrow subgroup with considerably less heterogeneity, such as those with no other criminal history, it becomes harder to distinguish risk, but in the context of the full group, their lower risk scores and reduced variability are informative in and of themselves.
Methods for Comparing Predictive Accuracy
Both the Delong and BIC tests revealed meaningful or significant differences between the CPORT and the Risk Matrix, but not necessarily in the same comparisons. This may partly be because the Delong tests used fixed 5-year follow-ups, which tended to yield cleaner and stronger effects than the Cox regression survival analyses used for the BIC comparisons. Nonetheless, the heuristics for interpreting BIC differences appeared to identify more differences as meaningful. Given that the BIC is used to examine the magnitude of model fit differences, it may be sensitive to large and meaningful differences that do not have sufficient statistical power to reach significance in the Delong test. The Delong tests, however, take into account the correlation between the two tools. Until more research is conducted comparing the two approaches, there may be some benefit in reporting both for direct scale comparisons. At a minimum, however, it is important to recognize that the analytic approach taken will impact the conclusions.
Calibration Issues in CSEM Risk Assessment
Given the low recidivism rates of men with CSEM offenses (Babchishin et al., 2015, 2018; Seto et al., 2011), a concern about using generic sexual offense risk assessment tools is that they may overestimate the risk of men with CSEM offenses, leading to violations of the risk principle of effective correctional practice (Bonta & Andrews, 2017). The current study found the opposite: observed recidivism rates in this sample were almost twice the predicted recidivism rates from the RM2000/S norms. It is not clear why this was the case. It cannot be attributed to the number of people in the higher risk group with mixed CSEM and contact offenses, as this evidence of poor calibration was found in the CSEM/NC group as well. The current study defined recidivism as new charges (with a large majority known to have ended in convictions) for sexual offenses, whereas the recidivism norms for the RM2000/S are for new convictions only. This methodological difference is unlikely to account for a large difference in calibration, based on previous research demonstrating that differences in recidivism rates across studies were not consistently and meaningfully explained by the use of charges or convictions as the recidivism outcome (Helmus, 2009).
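The comparison of observed and predicted rates can be expressed as an observed/expected (O/E) ratio. The sketch below assumes hypothetical risk bands with invented sample sizes, normative rates, and recidivist counts (none are RM2000/S norms or study data), and uses a Poisson-based approximation for the confidence interval, which is an assumption here rather than necessarily the method used in the study.

```python
import math

def observed_expected_ratio(bands):
    """Calibration summary across risk bands.

    Each band is (n_people, predicted_rate_from_norms, observed_recidivists).
    Returns (expected, observed, ratio, (ci_lower, ci_upper)).
    """
    expected = sum(n * rate for n, rate, _ in bands)
    observed = sum(count for _, _, count in bands)
    ratio = observed / expected
    # Approximate 95% CI treating the observed count as Poisson:
    # the standard error of log(O/E) is about 1 / sqrt(O).
    half_width = 1.96 * math.sqrt(1.0 / observed)
    ci = (ratio * math.exp(-half_width), ratio * math.exp(half_width))
    return expected, observed, ratio, ci

# Hypothetical bands: (n, predicted 5-year rate, observed recidivists).
bands = [
    (100, 0.02, 4),
    (80, 0.05, 8),
    (20, 0.10, 4),
]

expected, observed, ratio, ci = observed_expected_ratio(bands)
print(f"expected = {expected:.1f}, observed = {observed}, O/E = {ratio:.2f}")
print(f"approximate 95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

An O/E ratio above 1 indicates that the norms underpredict recidivism in the sample, and a ratio below 1 indicates overprediction; a ratio near 2 corresponds to the pattern described above.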
The current sample may have a higher sexual recidivism rate in large part because of our access to multiple sources of good-quality recidivism information. For example, we had access to outcome data through police occurrence reports from a large group of police services. The benefit of police occurrence reports is that they include charges at the time they are laid. We also had information from a national database of convictions. However, there may be lag time between a conviction being registered in court and it being recorded on the national system. As well, some convictions may be removed from the national database if the individual received a pardon for their offense(s), although reference to these offenses may still appear in the police databases. Furthermore, not all convictions are included in the national database. This study had an 11.8% sexual recidivism rate after 5 years, which is slightly higher than the base rate of recidivism among men charged or convicted for non-CSEM sexual offenses in previous research (9.1%; Hanson et al., 2018). Savoie et al. (2021) had a similarly high 5-year sexual recidivism rate (10.0%) in their CPORT validation study, but Pilon (2016) had a recidivism rate of 2.9%, based on a more restrictive definition of new convictions solely within the province of Ontario. Consequently, it is unclear whether the RM2000/S truly underpredicts sexual recidivism for those with CSEM offenses, or whether this sample has an unexpectedly high recidivism rate. Minimally, however, the results should mitigate concerns that the scale’s recidivism probabilities will overpredict recidivism. Although those with CSEM offenses as a group may (in most studies) have lower sexual recidivism rates than those with offline sex offenses, their risk level will likely take this into account, especially given that the RM2000/S has clear coding rules for handling these cases.
Summary of Meta-Analytic Findings
Finally, we conducted a meta-analysis of existing CPORT and RM2000/S validation studies among men with CSEM offenses. Comparisons between the two tools are less useful in these meta-analyses because the samples and methods differ across studies. Nonetheless, the results support the use of both tools. CPORT had large effect sizes in predicting any sexual recidivism and moderate effect sizes in predicting CSEM recidivism. The RM2000/S had moderate effect sizes in predicting any sexual recidivism, with insufficient data to test CSEM recidivism. The CPORT has more validation studies available, although some have considerable missing information (Black, 2018; Pilon, 2016) and the scoring of the items was modified in Pilon (2016). The RM2000/S has only two nonoverlapping studies, but one is quite large, with nearly 1,000 men with CSEM offenses, and both yielded very similar effects, with AUCs between .66 and .67. As an important point for comparison, although not large, the meta-analytic average effects for the CPORT and the RM2000/S for this population are similar to or greater than those reported in a recent meta-analysis of the predictive accuracy of Static-99R, the most widely used sexual offense risk tool (fixed-effect AUC = .68,
There was considerable overlap in the studies available to test CPORT’s accuracy in predicting CSEM and any sexual recidivism. Two interesting patterns emerged across these findings. Predictive accuracy was consistent across samples for any sexual recidivism but not for CSEM recidivism. Effect sizes were also meaningfully larger for predicting any sexual recidivism (AUC = .75) compared with CSEM recidivism (AUC = .66). Any sexual recidivism is a broader outcome and includes CSEM offenses, and it was the outcome being predicted by the development sample, so in some ways it is understandable that the scale may be maximized for this outcome. This is also convenient as sexual recidivism risk assessments of individuals with CSEM convictions are often concerned with any sexual recidivism.
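The notions of a meta-analytic average and of consistency across samples can be sketched with a fixed-effect, inverse-variance pooling routine that also reports Cochran's Q and I². The study values below are invented placeholders, not the effect sizes from the included studies, and pooling raw AUCs with their standard errors is an assumption made for simplicity; the published analysis may have used a different effect size metric or transformation.

```python
def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling with heterogeneity statistics.

    Returns (pooled_estimate, pooled_se, cochran_q, i_squared_percent).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of variability beyond what sampling error alone predicts.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared

# Hypothetical AUCs and standard errors from three validation studies.
study_aucs = [0.72, 0.78, 0.74]
study_ses = [0.04, 0.06, 0.05]

pooled, pooled_se, q, i_squared = fixed_effect_pool(study_aucs, study_ses)
print(f"pooled AUC = {pooled:.3f} (SE = {pooled_se:.3f})")
print(f"Q = {q:.2f}, I^2 = {i_squared:.1f}%")
```

A Q value close to its degrees of freedom (and an I² near zero) corresponds to findings that are consistent across samples, whereas larger values indicate more variability than sampling error alone would produce.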
The lower accuracy and greater variability in findings for CSEM recidivism may also reflect challenges in measuring this outcome reliably. Although all sexual offenses are underreported to police, CSEM is unique in that its detection is heavily reliant on the investigative techniques and resources of police agencies, political priorities, and jurisdictional variation in prosecution practices. This could simply make it a harder outcome to predict reliably and accurately. In contrast, however, once it is identified/prosecuted, the digital evidence inherent in modern CSEM offending makes a conviction highly likely.
Finally, another possible explanation for the lower accuracy for predicting CSEM recidivism compared with any sexual recidivism may be a statistical artifact. Lower base rate outcomes are harder to predict, and many effect size measures are influenced by the base rate of the dichotomous outcome, such as recidivism. The effect sizes we used (AUCs and Harrell’s Cs) are less impacted by base rates than correlations, but are not unaffected (Babchishin & Helmus, 2016).
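The base rate sensitivity of different effect size measures can be illustrated under an idealized model, assumed here purely for illustration, in which recidivists and non-recidivists have normally distributed scores with equal variances separated by a standardized difference d. Under that assumption the AUC depends only on d, whereas the point-biserial correlation shrinks as the base rate moves away from 50%. The value of d and the base rates below are hypothetical.

```python
import math

def auc_from_d(d):
    """AUC under equal-variance normal distributions: Phi(d / sqrt(2))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))

def r_from_d(d, base_rate):
    """Point-biserial correlation implied by d at a given outcome base rate."""
    return d / math.sqrt(d ** 2 + 1.0 / (base_rate * (1.0 - base_rate)))

d = 0.8  # hypothetical standardized difference between the two groups
for base_rate in (0.50, 0.10, 0.03):
    print(f"base rate {base_rate:.2f}: "
          f"AUC = {auc_from_d(d):.3f}, r = {r_from_d(d, base_rate):.3f}")
```

In this idealized setting the AUC is constant by construction; with real data, where low base rates mean few recidivists, ties, and unequal score distributions, AUC estimates are not entirely unaffected, consistent with the caveat above.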
Strengths and Limitations
This study is the first to directly compare two risk assessment tools in the same sample of individuals involved with CSEM. In addition, the data in Study 1 included limited missing information, high-quality recidivism information, and a reasonably good length of follow-up (5+ years). There are also limitations to the study and constraints on generality. The current sample contained adult men, predominantly White, residing in Ontario, Canada. The meta-analysis supported the consistency of Risk Matrix 2000 and CPORT in predicting sexual recidivism in some other similar countries (industrialized, high-income, relatively educated). Generalizability outside these countries is unknown. Cross-cultural validity analyses should be conducted where sample sizes are sufficient, and more research is needed across countries. Women who sexually offend are meaningfully different from men who sexually offend (Cortoni, 2018), and these tools are not recommended for women without additional research. We also operationalized the male victim item of the RM2000/S based on a greater proportion of male versus female content, as opposed to the guidance in the coding manual, which focuses on evidence of deliberately searching for male content. This may have underestimated the prevalence of this risk factor.
Similar to most studies on individuals convicted of CSEM offenses, the sample size is not particularly large. Aggregating the results through meta-analysis improves our understanding of predictive accuracy, but introduces additional limitations. A meta-analysis is only as good as the quality of the studies included, which typically vary in terms of sample size and quality of data available for risk tool scoring and recidivism information. There were only two studies available on the RM2000/S, and some studies on the CPORT had considerable missing information. However, the missing information suggests that if anything, the current meta-analysis would offer a conservative estimate of its predictive accuracy. In addition, there were too few studies on the CPORT overall for any meaningful moderator analyses.
Conclusion
If sufficient information is available, the current study suggests the CPORT may be preferable to use. The RM2000/S is empirically defensible to use with men with CSEM offenses, but may underestimate recidivism; further research is needed to replicate this finding. Future research could also explore the applicability of other risk tools (e.g., Static-99R, STABLE-2007) for those with CSEM offenses.
