Traditionally, road safety has been assessed by analyzing police-reported crash data. This direct, crash-based assessment presents well-known challenges: it is slow and reactive because it relies on crash data, the information available for assessing crash risk is restricted, and it lacks real-time capabilities. That is why surrogate safety analysis (SSA) has been proposed as a complementary approach for safety assessments. This method relies on safety-critical events known as traffic conflicts instead of traffic crashes. By using observed traffic conflicts, crashes can be estimated, so there is no need to wait several years for crash data to be collected before evaluating safety. Conflict-based safety analysis has been particularly valuable for analyzing rare crash types, such as those involving pedestrians.
Traffic conflicts represent safety-critical events, and the assumption in this approach is that these events are most likely associated with the crash occurrence and, therefore, are strong indicators of safety. This assumption led to their use in safety assessments of road entities. Moreover, traffic conflicts can be useful for developing crash-based safety performance functions (SPFs) as an alternative or complementary approach to conventional SPFs used for a variety of Highway Safety Manual (HSM) applications. There are challenges with developing and calibrating robust, conventional SPFs for these HSM applications, as was evident in the most recent research aimed at meeting these challenges (
The application of conflicts to estimate crash-based SPFs for pedestrians is the focus of this paper. In the process, specific interactions between turning vehicles and pedestrians in these SPFs are considered. The appeal of using conflict measures as explanatory variables in SPFs is in their ability to logically capture the effects of multiple variables that relate to crashes, which, as noted, can be challenging to accomplish in conventional SPFs. Therefore, SPFs based on traffic conflicts can be used to proactively estimate crashes without requiring additional explanatory variables. This potential is especially important for quickly evaluating pedestrian safety improvements, in particular, new and innovative ones that are often the targets of Vision Zero programs.
Until relatively recently, conflict-based assessment has predominantly focused on one conflict indicator, that is, frequency based on either time to collision (TTC) or post encroachment time (PET). (TTC is the time required for two road users to collide if they continue at constant speeds and on the same path, while PET is the time difference between the moment a first road user leaves an area of potential collision and the moment of arrival of a second road user to this area.) Some early and notable efforts at incorporating another measure, that is, severity, include Ozbay et al. (
The research for this paper aimed to further the earlier and more recent efforts in estimating pedestrian SPFs while complementing existing crash prediction methods. In so doing, machine learning (ML) was used to develop a data-driven safety index that integrates conflict frequency and severity indicators as a replacement for conventional integration methods. This index was then used to classify pedestrian conflicts derived from video observations at 44 intersections in five Canadian cities into groups. The frequencies of conflicts in these groups, and crashes at these intersections, were then utilized to estimate SPFs.
The rest of the paper is structured as follows. The next section presents some background on ML-based safety indexes, followed by a description of the data. The methodology is then outlined, and the results of integrating conflict frequency and severity indicators, along with the estimation of the SPF, are presented. The final section presents a summary and conclusions.
Background on Machine Learning-Based Safety Indexes
Research on integrating conflict frequency and severity indicators is somewhat limited in the field of road safety; however, such research is quite prevalent in the domain of air traffic safety and has advanced ML techniques for this purpose. These techniques are so powerful that they can efficiently analyze and integrate high-dimensional safety-related indicators within a matter of seconds. The product of this integration is a unique index, known as an anomaly score (AS). This safety index quantifies the degree of separation of each sample within the population from the remainder of the data set (
Das et al. (
More advanced ML methods were introduced in the other studies to increase the accuracy of the results. For example, Puranik and Mavris (
In the road safety field, as noted, there have been precious few attempts at developing an ML-based safety index. One such research effort is only marginally relevant to the current study in that it pertains to the safety of automated vehicles (
Data
Miovision, a video analytics company, provided a data set from five Canadian cities (Toronto, Hamilton, York Region, Winnipeg, and Calgary) that contains information on conflict indicators, conflict speed (CS), and movements for all road users, including pedestrians, cyclists, and vehicles. Video data were recorded for over 24 h between 2020 and 2022 at 44 urban signalized intersections that share similar characteristics, for example, geometric layout (four-legged intersections) and operations (signal-controlled). This is a large database, according to the standards of previous research involving pedestrian crashes and video-derived conflicts (
The data were collected using four high-quality cameras mounted on 7 m high poles at each intersection. Figure 1 illustrates the layout, camera placements, and coverage (the shaded areas) of one of the studied intersections. At this intersection, one camera pole was positioned on the pedestrian refuge island at the (b) northeast (NE) corner, while additional cameras were installed at the (a) northwest (NW), (c) southeast (SE), and (d) southwest (SW) corners. These four cameras were necessary to capture an unobstructed view of all approaches and the pedestrian crosswalks at the intersection. These intersections are located within the same jurisdiction. Miovision indicated that the data were subject to a 17-step quality assurance and quality control (QAQC) process, which included a manual review of every detected conflict by a team of software operators using specialized tools, as well as numerous spot checks and validation checks of results.

Examples of camera positions and views from each overhead camera.
Conflict records were collected for potentially serious vehicle-to-pedestrian interactions. One such interaction in the database is shown in Figure 2. As rationalized later in the

Vehicle-to-pedestrian conflict captured by elevated high-quality cameras in Toronto.
Crash records for the 5 years from 2017 to 2021 were assembled for the 44 intersections. This relatively rich data set included injury severity levels (i.e., property damage only, non-fatal injury, and fatal injury), collision impact type, pavement condition, and vehicle movement directions. The pedestrian crashes were separately extracted for each approach and then added together to produce the total number of such crashes at the intersection.
Table 1 presents the descriptive statistics of the data for conflicts and crashes between pedestrians and vehicles for the studied intersections. Not surprisingly, the mean value of 2.53 s for
Summary of Conflict and Crash Data
Methodology
In this section, the selection of the surrogate measures of safety and the ML-based safety index for integrating conflict frequency and severity indicators are discussed before presenting the study framework.
Rationale for the Selection of Safety Indicators
Generally speaking, traffic patterns are formed by aggregating the interactions among individual road users, including vehicles, pedestrians, and cyclists. An interaction can be an adaptation process or a response to the behavior of other road users. Traffic safety depends on these individual interactions. To be more precise, consider the frequency distribution function in Figure 3: moving from left to right along the conflict severity scale, the risk of a collision increases while the frequency of interactions decreases. These considerations are fundamental to the selection of the frequency and severity indicators.

Conflict frequency distribution of traffic interactions as a function of severity.
Conflict Frequency Indicator
TTC is a commonly used frequency indicator that was first considered for this research. It was introduced back in 1972 (
TTC, however, has some limitations as a conflict frequency indicator for vulnerable road users (VRUs). It handles poorly those interactions in which VRUs narrowly avoid impact or in which the closeness of the interaction causes vehicles to stop for a few seconds before advancing. Another issue arises when road users are not on a precise collision course but are on a course to narrowly avoid a collision, a situation that is unsafe because of the low margin for error.
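The limitation can be illustrated with a minimal sketch of the two indicators as defined in the introduction. The function names, the one-dimensional same-path geometry, and the numerical values are assumptions made for illustration only; they are not the computation used to produce the study data.

```python
from typing import Optional


def time_to_collision(gap_m: float, follower_speed_mps: float,
                      leader_speed_mps: float) -> Optional[float]:
    """TTC for two road users on the same path at constant speeds.

    Returns None when the users are not closing in on each other,
    that is, when no collision would occur under constant speeds.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return None
    return gap_m / closing_speed


def post_encroachment_time(first_exit_s: float, second_arrival_s: float) -> float:
    """PET: time between the first user leaving the potential collision
    area and the second user arriving at it."""
    return second_arrival_s - first_exit_s


# Hypothetical values: a vehicle at 10 m/s approaching a stationary
# pedestrian 20 m ahead on the same path.
ttc = time_to_collision(gap_m=20.0, follower_speed_mps=10.0, leader_speed_mps=0.0)

# Hypothetical near miss: the pedestrian clears the area at t = 12.0 s and
# the vehicle arrives at t = 13.5 s. No collision course exists, so TTC is
# undefined, but PET still quantifies how close the interaction was.
pet = post_encroachment_time(first_exit_s=12.0, second_arrival_s=13.5)
```

In this sketch the near miss produces a PET of 1.5 s even though a constant-speed TTC cannot be computed for it, which is the kind of interaction the paragraph above describes.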
To overcome these issues with TTC,
This paper employs

Illustration of the
Conflict Severity Indicator
Delta-V or
Integration Method
This section presents the integration method, namely the autoencoder neural network (ANN). The aviation safety and similar literature suggest a reliance on autoencoders to integrate high-dimensional data by creating a single-dimension index called an AS (
An autoencoder attempts to encode data by compressing them into lower dimensions, represented by a “bottleneck” layer or code, and subsequently decoding the data to reconstruct the original input. The bottleneck layer, which is the hidden layer where the encoding is produced, retains the compressed representation of the input data or, in other words, the AS. The mean reconstruction loss is minimized during the training process based on the frequent data points. Because of the relatively low frequency of anomalies in the observations, the autoencoder does not prioritize their reconstruction loss while training. As a result, the trained autoencoder reconstructs the safe interaction with the lowest reconstruction error, while anomalies have the highest AS (
To express the mean reconstruction loss concept in mathematical terms, consider a training data set
Methodological Framework
Figure 5 illustrates the study procedure and modeling framework. Using the ANN and the safety index developed by integrating

Methodological framework.
Integration Results
The autoencoder model was selected to integrate conflict frequency and severity indicators, as described earlier. The model was run in Python using the pyod.models.auto_encoder, sklearn.preprocessing, and sklearn.model_selection modules. Two encoder layers and two decoder layers were used to encode and decode the input data, and a single latent layer captured the compressed representation of the input. The random state was set to “None,” meaning that no fixed random seed was used, so results are not exactly reproducible from run to run. Shuffle was set to “True,” meaning that the training data were shuffled before each epoch. (Changing the order of the samples in this way prevents the model from learning patterns that depend on sample order and can lead to more robust and generalizable learning.) Early stopping, a technique commonly used to prevent overfitting, was not applied during training. Lastly, learning_rate_init was set to 0.0001; this is the initial learning rate of the optimization algorithm used to update the model parameters during training.
After training the autoencoder on the data set with the described tuning parameters, the reconstruction error for each observation was calculated using Equation 3. Figure 6 plots the distribution of the autoencoder reconstruction errors for the combined data set of all intersections. This histogram helps identify the anomaly decision threshold: a sharp decrease in the distribution ordinate suggests that the threshold for separating extreme conflicts from others should be close to the value at which the decrease occurs. The conflicts that exceed the threshold have higher conflict severity levels. In Figure 6, the blue line represents the AS, and the red rectangle marks the potential threshold range.
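The scoring and histogram steps can be sketched as follows. This is an illustrative sketch only: the mean squared difference is assumed as the reconstruction error, the reconstructed values are hypothetical rather than outputs of the trained autoencoder, and the function names are not from the original study.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared difference between an observation and its reconstruction
    (assumed error form; the study's exact expression may differ)."""
    squared = [(x - y) ** 2 for x, y in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for value in values:
        bin_index = int(value // bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return counts


# Hypothetical two-indicator observations and their reconstructions.
observations = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
reconstructions = [[1.0, 4.0], [0.5, 0.0], [2.0, 1.0]]

errors = [reconstruction_error(o, r) for o, r in zip(observations, reconstructions)]
error_bins = histogram(errors, bin_width=0.5)
```

A sharp drop in the counts between adjacent bins is what would suggest a candidate threshold in this kind of histogram.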

Histogram of anomaly scores obtained from the autoencoder algorithm for all intersections.
Figure 7 shows a scatter plot of the estimated ASs for each traffic conflict with

Classifying traffic conflicts based on their autoencoder neural network-based safety index.
Estimation of Pedestrian Safety Performance Functions
After selecting the initial threshold range based on the AS histogram in Figure 6, AS threshold values from 1 to 2, in increments of 0.5, were tested to determine which threshold yielded the conflict frequencies that gave the best-fit SPFs for vehicle-to-pedestrian crashes. This preliminary modeling revealed that an AS threshold of 1, which classified conflicts into two groups (conflicts with AS < 1 and conflicts with AS ≥ 1), was best for estimating SPFs.
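The threshold sweep described above can be sketched as a simple counting exercise. The anomaly scores below are hypothetical, and the group labels "regular" and "extreme" follow the terminology used later in the paper; the sketch does not include the SPF fitting that was used to choose among the thresholds.

```python
def classify_conflicts(scores, threshold):
    """Split conflicts into two groups: AS < threshold and AS >= threshold."""
    regular = sum(1 for score in scores if score < threshold)
    extreme = len(scores) - regular
    return {"regular": regular, "extreme": extreme}


# Candidate thresholds from 1 to 2 in increments of 0.5.
candidate_thresholds = [1.0, 1.5, 2.0]

# Hypothetical anomaly scores for the conflicts at one intersection.
scores = [0.2, 0.4, 0.9, 1.0, 1.3, 1.6, 2.4]

counts_by_threshold = {
    threshold: classify_conflicts(scores, threshold)
    for threshold in candidate_thresholds
}
```

Each threshold yields a different pair of conflict frequencies per intersection, and those frequencies are the inputs whose fit to the crash data would then be compared.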
Table 2 shows the number of total pedestrian crashes per year and the candidate independent variables for the SPFs—the classified vehicle-to-pedestrian conflicts per day in the two groups, the proportion of daily conflicts that occur at night, and the proportion of daily conflicts that involve a left-turning vehicle. As seen in Table 2, in Toronto, an average of 5.6 pedestrian crashes per year is associated with 121 severe conflicts daily. Of these conflicts, approximately 12% occur at night, and nearly half involve left-turning vehicles and pedestrian crossings. After classifying conflicts into different severity levels based on their ASs, the next step was establishing a relationship between categorized conflicts according to ASs and the reported crashes.
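As a rough sketch of how the candidate independent variables could be derived from classified conflict records, consider the following. The record fields ("score", "night", "movement"), the function name, and the sample data are hypothetical and are not taken from the study database.

```python
def candidate_variables(conflicts, threshold=1.0, days_observed=1.0):
    """Compute candidate SPF inputs for one intersection from conflict records."""
    total = len(conflicts)
    if total == 0:
        return None  # proportions are undefined without any conflicts
    extreme = sum(1 for c in conflicts if c["score"] >= threshold)
    night = sum(1 for c in conflicts if c["night"])
    left_turn = sum(1 for c in conflicts if c["movement"] == "left")
    return {
        "extreme_per_day": extreme / days_observed,
        "regular_per_day": (total - extreme) / days_observed,
        "proportion_night": night / total,
        "proportion_left_turn": left_turn / total,
    }


# Hypothetical conflict records from one day of video at one intersection.
sample_conflicts = [
    {"score": 0.3, "night": False, "movement": "left"},
    {"score": 1.2, "night": True, "movement": "left"},
    {"score": 1.8, "night": False, "movement": "right"},
    {"score": 0.7, "night": False, "movement": "through"},
]

variables = candidate_variables(sample_conflicts, threshold=1.0, days_observed=1.0)
```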
Number of Pedestrian Crashes and Conflicts for Each City
To extrapolate conflicts to crashes and develop a link between them, conventional (or fixed-effect) generalized linear and simple linear regression models were first estimated. Although Peesapati et al. (
Fixed-effect models assume that the intercept and/or parameter coefficients should remain constant across all observations in various cities. In reality, pedestrian conflicts might vary among cities because of variations in pedestrian and motorist behavior and intersection characteristics (
Fixed- and mixed-effect generalized linear models (FE-GLM and ME-GLM) with a log link function can be written as follows:
Fixed- and mixed-effect linear regression (FE-LR and ME-LR) models were then developed as follows:
where
In practical terms, when the standard deviation of a parameter is notably greater than zero, the mixed-effect model will be chosen. Conversely, the fixed-effect model may be appropriate if the standard deviation is small or close to zero. It should be noted that a fixed-effect model is equivalent to a mixed-effect model, as presented in Equations 4 and 5, with the
One of the challenges encountered in this study was identifying a suitable goodness-of-fit (GOF) measure for the negative binomial (NB) family of models with the following properties: (a) it is bounded on [0,1]; (b) it increases proportionally, so that adding explanatory variables to the model one at a time results in the same increase regardless of the order in which they are selected; and (c) it is invariant with respect to the mean. Miaou (
where
For the linear regression models, model performance was assessed by centered and uncentered
For the GLMs and the linear regression models, the Akaike information criterion (AIC), Bayesian information criterion (BIC), and
Linear Regression Safety Performance Functions
The FE-LR models estimated for total pedestrian crashes per year are summarized in Table 3 with and without intercept. In developing the models, different model forms were tried, including multiple linear regression that incorporated, as additional independent variables, combinations of regular vehicle-to-pedestrian conflicts, left-turn extreme conflicts, right-turn and through movement extreme conflicts, the proportion of conflicts at night, and the proportion of left-turn conflicts. Left-turn, right-turn, and through movements extreme conflicts indicate the number of extreme conflicts that occur between pedestrians and vehicles turning left, right, and going straight through an intersection, respectively. The proportion of conflicts at night indicates the number of vehicle-to-pedestrian conflicts that occurred at night divided by the total vehicle-to-pedestrian conflicts. Left-turn extreme conflicts indicates the proportion of extreme conflicts that occur between left-turning vehicles and crossing pedestrians. Right turns and through extreme conflicts indicates the proportion of extreme conflicts that occurred between right turns or through movements and pedestrians at intersections.
Fixed-effect linear regression parameter estimates (and standard errors)
Note: Est. = estimate; SE = standard error; LT = left-turn; RT & Thru = right-turn and through movements; na = not applicable.
After comparing the centered
To capture heterogeneity between the cities, the ME-LR model was estimated, with the results shown in Table 4. Six models were developed by considering the different combinations of fixed and mixed parameters. Even though several explanatory variables were initially considered, comparing full log-likelihoods, centered
Mixed-effect linear regression parameter estimates (and standard errors)
Note: Est. = estimate; SE = standard error; LT = left-turn; RT & Thru = right-turn and through movements; na = not applicable.
Comparing the best FE-LR and ME-LR models (Models 6 and 4, respectively), the former encompassed three independent variables, while the latter included one independent variable. In addition, centered
GLM Safety Performance Functions
As can be seen in Table 5, SPFs were estimated using an FE-GLM form with a log link. The models were fitted with independent variables, including classified extreme conflicts, the proportion of conflicts at night, and the proportion of left-turn conflicts. It is worth noting that the GLMs use only 39 of the 44 intersections because the GLM framework includes the natural logarithm (Ln) of certain independent variables. For five intersections, the values of these independent variables (extreme conflicts) were zero, and the Ln of zero is undefined. These intersections were therefore excluded from the analysis to ensure the validity and accuracy of the model. It is seen that Model 1 has the lowest
Fixed-effect generalized linear model estimates (and standard errors)
As was done for the linear regression models, an ME-GLM was developed to account for variances across different cities. The results are presented in Table 6. Although several explanatory variables were initially considered, the generally high
Mixed-effect generalized linear model estimates (and standard errors)
Note: DIC = deviance information criterion; AS = anomaly score; na = not applicable.
Cumulative Residual Plots
As mentioned earlier, CURE plots are used to determine whether the selected functional form fits an explanatory variable along the entire range of its values represented in the data by offering a visually informative assessment of the GOF for the models. These plots were generated by first sorting intersections in ascending order based on the model-estimated pedestrian crashes. Then, the residuals (the difference between observed and predicted crashes) were calculated for each intersection, and cumulative values were plotted. The standard deviation boundaries,
where N is the total number of data points (residuals) and
In the interpretation of the plots, when the CUREs consistently drift upwards, crashes are underestimated by the model and, conversely, when the CUREs consistently drift down, the model overestimates crashes.
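The sorting and accumulation steps described above can be sketched as follows. The observed and predicted values are hypothetical, the function name is illustrative, and the standard deviation boundaries are deliberately left out of the sketch.

```python
from itertools import accumulate


def cumulative_residuals(observed, predicted):
    """Sort sites by model-estimated crashes, then accumulate the residuals
    (observed minus predicted) in that order."""
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    residuals = [obs - pred for pred, obs in pairs]
    return list(accumulate(residuals))


# Hypothetical observed and model-estimated pedestrian crashes at four intersections.
observed = [2, 0, 5, 1]
predicted = [1.5, 0.5, 4.0, 2.0]

cure = cumulative_residuals(observed, predicted)
```

With these values the cumulative residuals oscillate around zero and end at zero, which is the pattern expected of a well-fitting model; a steady upward run would instead signal underestimation, as noted above.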
Figure 8 depicts the CURE plots for the four final models: the FE-LR model, ME-LR model, FE-GLM, and ME-GLM, shown as (a), (b), (c), and (d), respectively. As seen, the plotted lines oscillate around the abscissa and fall within the two standard deviation lines, indicating that the model fits the data well for a range of model-estimated pedestrian crashes. The ME-LR model seems best with respect to the smoothness of the oscillations along the

Cumulative residual (CURE) plots for the crash model estimates based on the fixed-effect linear regression (FE-LR) model, mixed-effect linear regression (ME-LR) model, fixed-effect generalized linear model (FE-GLM), and mixed-effect generalized linear model (ME-GLM), shown as (a), (b), (c), and (d), respectively.
Development of Statistical Equations for Anomaly Score Boundaries
Classifying conflicts based on ASs enables a more precise estimation of crash frequency; however, this process necessitates training an ANN for each specific data set, which may discourage jurisdictions from estimating long-term crash frequency using short-term video observations. To foster the practical application of this method, straightforward equations that define the boundaries between consecutive AS levels can offer an alternative for classifying severe conflicts. This research adopted a simple quadratic form for the boundary equations, as a visual inspection of the data plotted in Figure 7 indicates that the threshold boundary is curved and could be effectively modeled by such a function.
The proposed function form that aligns with the data is as follows:
where
To estimate the parameters
Equation 11 shows the parameters of the estimated boundary equations for extreme conflicts defined by an AS greater than 1, which can be used for estimating total pedestrian crashes. This means that when the results of the expressions on the left-hand side exceed 25, the corresponding extreme conflict is associated with an approximate AS of 1:
This equation is plotted in Figure 9.

Plot of the statistical equation for the anomaly score boundary.
Summary and Conclusions
A primary objective of the research was to use traffic conflicts for estimating crash-based pedestrian SPFs as a complementary approach to conventional SPFs used for HSM applications. The idea was to overcome the challenges of developing and calibrating robust, conventional SPFs for HSM applications, which were evident in recent research aimed at estimating those SPFs.
Using conflict measures as explanatory variables in SPFs can logically capture the effects of multiple variables, including specific interactions that relate to crashes that, as was evident in the recent HSM research (
Another key objective of this study was to advance the methodology for defining conflicts that are most closely related to crashes. To this end, the study examined an ANN model as a replacement for conventional approaches for integrating conflict frequency and severity indicators. For this exploration, a relatively rich pedestrian conflict database of 44 intersections in five Canadian cities covering 5 years of crash data and 24-h video-derived traffic conflicts with more than 1000 vehicle-to-pedestrian interactions was used. Then, conflicts were labeled and classified based on the unique data-driven safety index (i.e., AS), which was determined by integrating conflict frequency and severity indicators.
Once extreme conflicts were classified, SPFs were estimated using the ME-LR model, FE-LR model, ME-GLM, and FE-GLM to relate crashes to extreme conflicts and to combinations with other variables, including left-turn extreme conflicts, right-turn and through-movement extreme conflicts, the proportion of conflicts at night, and the proportion of left-turn conflicts. In general, the linear regression models outperformed the GLMs, a result that is consistent with previous research (
This study is unique in the sense that it attempts to demonstrate the potential of using ML methods for categorizing conflicts based on a data-driven safety index to reliably predict annual pedestrian crash frequency at an intersection in a short-term study. The estimated models revealed that the investigated approach is viable in that the best models were those that related crashes only to those conflicts with ASs larger than a threshold value. This emphasizes the need for proper classification of conflicts based on the AS threshold. The approach is especially important given the focus on pedestrian crashes in Vision Zero plans and the reality that observed frequencies of these crashes are typically too low for assessing the safety of intersections and for estimating models to capture the effects of multiple variables on crashes.
The practical application of these findings for other regions could be considered, albeit with due caution, as the safety index is derived from specific data and may vary with changes in the data set. To facilitate this practical application, simple statistical equations were developed based on Figure 7 that approximate the boundaries between consecutive AS levels. This approach would be particularly useful for jurisdictions that lack the resources to train an autoencoder for classifying conflicts within their own data sets. To be more precise, the conflict data set can be classified first using the developed equations, not the autoencoder model, and then the classified conflicts can be utilized to estimate total and/or severe crashes using the developed models.
With respect to data collection and safety indicators, it is important to note that analyzing vehicle-to-pedestrian conflicts poses more challenges than vehicle-to-vehicle conflicts, primarily because of the difficulty in detecting pedestrians—who are smaller in size—using image processing techniques. In addition, common safety indicators such as TTC may not fully capture the complex interactions that influence road user safety, particularly the immediate reactions between VRUs and drivers. Thus, future research could focus on developing safety indicators that better quantify these immediate reactions for vehicle-to-pedestrian conflicts.
Further work could also evaluate a larger sample and perhaps a wider variety of intersections to make the results more generalizable and facilitate the further development of the statistical models for the crash–conflict relationship and for establishing the AS boundaries. Such research can be enhanced by investigating other ML integration methods.
