Sage Journals: Discover world-class research

Abstract

Existing structures can become seismically vulnerable over time due to various factors including deterioration and outdated or seismic design provisions. Rapid visual screening (RVS) methods are commonly used to quickly filter large building inventories for at-risk structures, typically based on simple visual inspections, such as sidewalk surveys. In a previous study, the authors developed a machine learning (ML)-based RVS method for low-rise reinforced concrete (RC) buildings capable of identifying buildings that are likely to be severely damaged in an earthquake with an accuracy of 71%. However, uncertainty in the model’s predictions remains a concern. This study refines the previously proposed RVS methodology by addressing model uncertainty and minimizing misclassifications. Two primary approaches are proposed: the first analyzes class probabilities from the ML-based screening model to assess the prediction uncertainty rather than relying on the final predicted damage class. With this approach, buildings for which the ML model shows high uncertainty can be prioritized for more detailed evaluation. The second approach aims to optimize the decision threshold used by the ML model to more accurately identify buildings at risk of severe damage. This is done by evaluating the relative cost of misclassifications, low risk buildings identified as high risk (false positives) and high-risk buildings identified as low risk (false negatives). Building on the findings, this paper proposes a comprehensive three-level machine learning-based methodology for enhanced rapid seismic vulnerability assessments.

Keywords

rapid visual screening (RVS)machine learning (ML)model uncertainty threshold optimization misclassification cost

Introduction

Seismic vulnerability assessment of buildings is crucial for protecting lives, minimizing economic losses, and ensuring community resilience in earthquake-prone areas. Such evaluations provide a clear understanding of a building’s likely performance during future seismic events, enabling targeted retrofitting efforts that can prevent catastrophic failures. These assessments also play a vital role in disaster preparedness, supporting informed decision-making for emergency response planning and resource allocation. By investing in seismic vulnerability assessments, building owners contribute to the long-term resilience of their properties, protection of their investments, and help ensure the safety and sustainability of the surrounding community.

However, evaluating each building in a large inventory using traditional analytical methods is both time-consuming and expensive. To tackle this issue, rapid and preliminary vulnerability assessment methods have been developed which aim to quickly identify at-risk buildings through simple visual observations, such as sidewalk surveys (Kassem et al., 2020). These methods are referred to as rapid visual screening (RVS) methods. The usefulness of RVS methods have been demonstrated in recent studies. For instance, Ahmed et al. (2022) employed a method based on a series of survey-type questions addressing the structural and nonstructural qualities of buildings to assess 2900 reinforced concrete (RC) buildings in Northern Algeria and conducted a parametric study which showed consistent results across different building stock typologies. In another study, Purushothama et al. (2023) further validated the effectiveness of the RVS method used by Ahmed et al. (2022) by comparing its results with detailed nonlinear analyses of masonry-infilled RC buildings. In addition to the RVS method used in these studies, numerous others have been proposed in the past few decades including FEMA P-154 (FEMA, 2017), as well as methodologies developed by the Japan Building Disaster Prevention Association (JBDPA) (Japan Building Disaster Prevention Association (JBDPA), 2017), the New Zealand Society for Earthquake Engineering (NZSEE) (NZSEE, 2006), and the National Research Council Canada (NRC) (Fathi-Fazl et al., 2021). These methods calculate a score representing the seismic vulnerability of a given building based on simple features and available information, which is then modified with additional factors. Buildings with scores below a certain threshold score are prioritized for more detailed evaluations. Although these RVS methods based on vulnerability scores are simple and useful, there are challenges in defining a reliable threshold score to identify high-risk buildings. Additionally, the region or country-specific nature of these methods limits their applicability.

In addition to these scoring methods, there are various empirically derived approaches for seismic vulnerability screening. For example, Tesfamariam and Saatcioglu (2008) proposed a risk-based evaluation technique for RC buildings using a knowledge-based fuzzy rule modeling approach. This method includes parameters such as the structural system, irregularities, construction quality, and year of construction to assess seismic vulnerability. Tesfamariam and Saatcioglu (2008) validated the method using seismic damage data gathered from the RC buildings surveyed following the 1994 Northridge Earthquake. Coskun et al. (2020) used ordinary least squares regression and multivariate linear regression techniques to develop an RVS method for RC buildings, conducting a detailed statistical analysis of 545 RC buildings in Turkey (Coskun et al., 2020). Although their method is highly accurate, it requires 14 different input variables for predictions. In contrast, the Priority Index (PI) proposed by Hassan and Sozen (1997) is one of the simplest empirical screening methods (Hassan and Sozen, 1997). Developed for low-rise RC buildings using data from the 1992 Erzincan earthquake, the PI relies on two metrics: wall index and column index, calculated from seven input parameters. These include the number of stories, floor area, and the cross-sectional areas of columns and walls. Although these features are easy to obtain, human judgment is needed to relate PI to damage classes.

Despite the development of numerous RVS methodologies, there remains a need for a robust approach that maximizes the value of earthquake damage data collected from post-earthquake building inspections. Machine learning (ML), with its capacity to advance data-driven methodologies, provides a promising pathway for improving RVS techniques. Arslan et al. (2012) introduced an ANN-based method to evaluate the earthquake performance of RC buildings in Turkey, using 19 parameters such as the number of stories, steel strength, and spectrum intensity. Their study, based on 66 RC buildings (4–10 stories), achieved high accuracy in classifying performance levels according to the Turkish Earthquake Code-2007 (TEC-2007) (Arslan et al., 2012). Zhang et al. (2018) proposed an ML framework to link structural damage patterns to residual collapse capacity, enabling the assessment of structural safety. This framework was validated on a four-story RC frame with high predictive accuracy (Zhang et al., 2018). Mangalathu et al. (2020) applied ML models to predict building damage levels using input features like spectral acceleration, fault distance, and building age, achieving 66% accuracy with data from the 2014 South Napa earthquake (Mangalathu et al., 2020). More recently, Coskun and Aldemir (2023) developed an RVS method for masonry buildings using ensemble learning. Their model incorporated factors such as the number of stories, floor system type, wall material, vertical irregularity, and earthquake zone, and was validated using a dataset of 543 masonry buildings in Turkey (Coskun and Aldemir, 2023). While these methodologies have made significant contributions, there is still a pressing need for approaches that are not constrained by regional or national boundaries.

In a previous study by the authors (Elyasi et al., 2024), an ML-based RVS method was introduced, achieving a damage classification accuracy of 71%. While this performance is promising, further improvement is needed due to the serious consequences of prediction errors. If a vulnerable building is classified as safe, it could lead to major financial losses and even loss of life during an earthquake. On the other hand, incorrectly labeling a safe building as vulnerable may result in unnecessary and costly upgrades. Improving the model accuracy and reducing misclassifications is therefore critical. To encourage building owners and decision-makers to act on the model’s recommendations, it is important to clearly show that the cost of assessment and retrofitting is often much lower than the potential damage and rebuilding costs after an earthquake. A reliable and easy-to-understand framework can help increase confidence in its use. A key drawback of ML models is that they are often black boxes. Therefore, it is crucial to explore how they make predictions and identify cases with uncertainty. Understanding and addressing this uncertainty can make the model more trustworthy and useful in practice. The following sections outline the original RVS approach developed by the authors, explain the motivation for enhancing it, and detail the steps for improving its accuracy, interpretability, and practicality for seismic risk screening.

ML-based RVS

In a previous study by the authors (Elyasi et al., 2024), post-earthquake reconnaissance data from six earthquakes were used to develop a ML-based RVS method for low-rise RC buildings. A total of 658 building samples were obtained from Duzce (1999) (Sim et al., 2016a), Bingol (2003) (Sim et al., 2016a), Nepal (2015) (Shah et al., 2015), Taiwan (2016) (NCREE, 2016), Ecuador (2016) (Sim et al., 2016b; Villalobos et al., 2018), and Pohang (2017) (Sim et al., 2018). The Duzce (1999) dataset includes predominantly one- to four-story RC buildings with a small floor area, while the Bingol (2003) dataset has more three- to five-story buildings with a mix of masonry and RC walls. The Nepal (2015) dataset features mostly three- to six-story buildings, with nearly all having masonry infill walls. The Taiwan (2016) dataset contains two- to four-story buildings, with many having both masonry and RC walls. In the Ecuador (2016) dataset, most buildings are two- or three-story residential buildings with masonry infill walls, while the Pohang (2017) dataset includes three- or four-story RC buildings, mostly with RC walls. The primary goal of the study was to propose an ML model that is not region specific and can predict the level of earthquake damage based on a limited number of easily attainable building features with minimal need for engineering judgment. In this work, a random forest (RF) model was trained to predict the level of damage a building is likely to sustain in an earthquake based on nine input features consisting of seven building geometry parameters (number of stories, floor area, column cross-sectional areas, and the areas of concrete and masonry infill walls in both east-west and north-south directions), and two earthquake intensity measures (modified Mercalli intensity (MMI) and peak ground acceleration (PGA)). It is worth noting that some buildings in the dataset were classified as showing moderate damage in the post-earthquake survey. However, due to the limited number of such buildings and the ambiguity of the descriptions for moderate damage, this damage class was conservatively merged with the severe damage class, simplifying the problem to a binary classification between non-severe, and severe. Removing the moderate damage cases completely would have reduced the size of the dataset. Given that training models based on a small dataset increases the risk of overfitting, the moderate damage cases were retained and grouped with the severe class to be conservative. Descriptions of the two damage classes are provided in Table 1.

Table 1.

Description of structural damage classes (Elyasi et al., 2024; Johnson and Fick, 2018; Sim et al., 2016b).

Damage class	Description
Non-severe	Hairline cracks (<0.25 mm) and/or flexural cracks

Severe	Inclined cracks, spalling, loss of concrete cover or core, reinforcement buckling or fracture, permanent drift in structure visually apparent, shear failure, or failure of any structural element

The RF model operates by constructing a large number of decision trees, each trained on a random subset of the data. A random selection of input features is then considered for splitting nodes in each tree. This process can be repeated many times to build a diverse forest of trees. When making predictions, the output from each tree is aggregated, and the final prediction is determined by majority voting, where the most frequent prediction across all trees is chosen. To evaluate the model, the weighted average F1-score was chosen. This metric accounts for the F1-score of each class, weighted by the number of samples in that class, and averages the scores across all classes. The F1-score is calculated as: $F_{1} = 2 \times \frac{P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l}$ (1)where precision and recall are defined as: $P r e c i s i o n = \frac{T P}{T P + F P}$ (2) $R e c a l l = \frac{T P}{T P + F N}$ (3)

TP is the number severe samples correctly classified as severe (true positives), FP is the number of non-severe samples incorrectly classified as severe (false positives), and FN is the number of severe samples incorrectly classified as non-severe (false negatives) (Derczynski, 2016; Jardine and Van Rijsbergen, 1971). These accuracy metrics for the RF model are summarized in Table 2. Given the basic nature of the input features and the introduction of earthquake intensity parameters, this generalized model is not constrained by region, and unlike other existing screening methodologies, can be used by individuals without an engineering background. More detailed discussion of the model development and results are available in (Elyasi et al., 2024).

Table 2.

Accuracy of the proposed RVS model (Elyasi et al., 2024).

Mixed dataset
Damage class	Precision	Recall	F1
Non-severe	0.71	0.62	0.66
Severe	0.69	0.77	0.73
Weighted average accuracy: 0.71

The model’s weighted average accuracy of 71%, achieved using only basic building features, is promising but further refinement to minimize misclassifications can enhance the model’s reliability. Enhancing prediction accuracy, particularly for the severe damage class, is vital given the potential consequences. Misclassifying a vulnerable building as safe could result in significant financial losses and pose serious risk to lives during an earthquake, while inaccurately labeling a structurally sound building as at risk would incur additional evaluation and retrofit costs. To effectively persuade building owners to invest in necessary evaluations and retrofitting, it is crucial to develop a robust framework that clearly demonstrates how the costs of assessment and retrofitting can be significantly lower than the potential damages and rebuilding expenses that could arise from an earthquake. This urgency is particularly pronounced in high-risk seismic regions like California, where the probability of a major earthquake in the near future is high. Identifying vulnerable buildings and implementing necessary actions, such as retrofitting, are critical to reducing these risks. However, this can only be achieved by convincing building owners through a reliable, cost-effective, and applicable procedure. Improving the model’s accuracy and reliability will be essential in persuading stakeholders to adopt its recommendations, ultimately leading to safer buildings and more resilient communities.

Study objectives

This study aims to refine the previously proposed RVS methodology by addressing model uncertainty and reducing misclassifications in the severe damage class. Two approaches are proposed to achieve these objectives. The first approach focuses on model uncertainty. Model uncertainty refers to the uncertainty in predictions due to the limitations, assumptions, and approximations inherent in the model’s structure. By analyzing the distribution of probabilities assigned by individual decision trees in the RF classifier to a building being in the severe damage class rather than relying solely on final predicted label, a more nuanced understanding of the model uncertainty can be obtained. This involves reducing the maximum depth of decision trees in the RF classifier by limiting the longest path from the root node to a leaf node in the tree. This reduction helps avoid prediction purity, a condition where a leaf node contains samples that all belong to the same class, leading to more confident classifications that may indicate overfitting. By restricting the depth, class probabilities can be calculated from each decision tree instead of predicting a single class label. The overall probability for each damage class is then averaged across all trees in the forest. This strategy helps identify low, moderate, and high uncertainty levels in predictions. Buildings with high prediction uncertainty should be prioritized for further inspection and analysis, while those with low uncertainty can be classified with greater confidence. By focusing on probability distributions, the methodology enhances the reliability of seismic vulnerability assessments and helps mitigate potential risks associated with misclassifications. The second approach aims to optimize the decision threshold of the classifier to enhance model reliability by reducing misclassifications. This is performed by considering the relative cost of misclassifying low risk buildings as high risk and those at high risk as low risk. A wide range of cost ratios are used to estimate the total misclassification cost and the effect of changing the decision threshold on prediction accuracy and the cost of misclassification is examined. Finally, the two refinement approaches are incorporated into the original ML model to propose a comprehensive three-level ML-based RVS methodology.

Model uncertainty detection

Understanding and evaluating uncertainty is crucial for assessing the reliability of an ML model’s predictions. There are two main types of uncertainty: data uncertainty and model uncertainty. Data uncertainty comes from the natural variation or noise in the data itself. This type of uncertainty is caused by factors that the model cannot account for such as measurement errors, incomplete data, or randomness in data collection. In this study, differences in the quality or reporting standards of the six earthquake datasets used for training and testing the model represent examples of data uncertainty. Differences in how accurately the geometric features were measured (e.g., floor area, column and wall areas), and how the inspectors judged the level of damage for buildings in these datasets are also notable sources of data uncertainty. To help reduce the data uncertainty in this study, damage labels were cross-checked against available building photos when a sufficient number of clear images existed. However, this was not possible for a small number of buildings, as the dataset was compiled from post-earthquake surveys conducted by different teams in various regions and photographs were sometimes limited in number and/or scope. Since the dataset was already limited in size, removing the samples without visual evidence could further reduce the data and increase the risk of overfitting. Therefore, it was assumed that the labels assigned by inspectors were accurate in such cases. Model uncertainty, on the other hand, comes from the limitations within the model itself. It can be caused by factors like the model’s structure, assumptions made during its development, or the limited amount of training data. In the case of the RVS methodology developed by the authors, one source of model uncertainty is the fact that the model makes predictions based on a small set of coarse geometric features that do not directly represent the structural design or condition. Additionally, the model may not have clear decision boundaries for classification. For example, if the model encounters a building with a unique combination of features, it may be unsure whether to classify it as severe or non-severe, especially if there are no similar examples in the training data. Given that addressing data uncertainty is often difficult due to limited control over data quality and availability, this study focuses primarily on addressing model uncertainty.

In the ML-based RVS model developed by the authors (Elyasi et al., 2024), misclassifying a building that is likely to sustain non-severe damage as severe may simply lead to higher costs associated with a more detailed inspection and evaluation, but incorrectly predicting a building at risk to be severely damaged in an earthquake as non-severe poses significant safety risks and could result in substantial losses in future earthquakes. Therefore, addressing the uncertainty in the classifier’s predictions is essential for mitigating these risks. To address this uncertainty, the probabilities assigned to both damage classes (non-severe and severe) by the RF classifier during prediction are examined rather than relying solely on the final predicted label. Figure 1 shows how these probabilities are calculated, providing an overview of the RF classifier’s structure. 100 trees were adopted for the RF classifier. This is a commonly used number that offers a good balance between performance and computational efficiency.

Figure 1.

The process of calculating damage class probabilities in the RF classifier.

Typically, the RF classifier makes predictions based on the class with the highest probability, using a decision threshold of 0.5 for binary classification. If the probability of a building being severely damaged exceeds this decision threshold, it is classified as severe; otherwise, it is categorized as non-severe. By focusing on probabilities rather than final labels, this approach moves beyond simple categorization of potential earthquake damage as severe or non-severe. It provides a more detailed understanding of the likelihood of each class and improves the overall assessment process by enabling more informed and careful decision-making regarding the building’s vulnerability. To further illustrate this point, three examples are presented.

Table 3 outlines the characteristics of three dataset samples where the model incorrectly classified the damage class. The assigned probabilities for each class (non-severe and severe), the predicted damage class based on a 0.5 decision threshold, and the actual observed damage are provided in Table 4. This approach allows for a deeper examination of the model’s uncertainty.

Table 3.

Characteristics of the three buildings listed in Table 4.

Sample	Earthquake	No. of stories	Floor area (m²)	Column area (m²)	Concrete wall area (East-West) (m²)	Concrete wall area (North-South) (m²)	Masonry wall area (East-West) (m²)	Masonry wall area (North-South) (m²)	MMI	PGA
Building 1	Ecuador (2016)	4	148	2.6	0	0	3	4.8	7.3	0.32 g
Building 2	Duzce (1999)	1	77	0.72	0	0	0	0	8.4	0.46 g
Building 3	Pohang (2017)	4	246.25	3.24	0.416	1.05	0	0	6.2	0.26 g

Table 4.

Predicted class probabilities and labels for three test set samples.

Sample	Probability of being in the non-severe damage class (1-P)	Probability of being in the severe damage class (P)	Predicted damage class	Actual damage class
Building 1	0.09	0.91	Severe	Non-severe
Building 2	0.87	0.13	Non-severe	Severe
Building 3	0.51	0.49	Non-severe	Severe

In the case of Building 1, the model predicts the probability of severe damage as 0.91, giving it a final label of severe, however, the actual damage was non-severe. This is a conservative but incorrect prediction by the model. Conversely, for Building 2, the model yielded a probability of severe damage of 0.13, classifying it as non-severe while the actual damage in the earthquake was severe. Lastly, Building 3 was given a probability of severe damage of 0.49, marginally below the 0.5 threshold and resulting in a final label of non-severe earthquake damage when the actual damage was severe. In a real application of the ML-based RVS model, the mislabeling of Building 1 would simply lead to detailed analysis of the building and possibly seismic retrofits. In contrast, the false predictions for Buildings 2 and 3 could mean significant losses in a future earthquake and potentially even put lives at risk. Cases like Building 2, where the predicted probability of severe damage is much lower than that of non-severe damage, reflect confident errors by the model and can be considered a part of its limitations. These errors can be attributed to the inherent simplicity of RVS, which is intended as a preliminary assessment and relies on a limited number of basic building parameters and is therefore reasonably expected to yield a certain level of prediction errors. Incorporating additional features such as structural characteristics during the screening process can help reduce these incorrect predictions, but it may also increase the complexity of the assessment. On the other hand, in cases like Building 3 where the probability of severe damage is high and close to the threshold of 0.5, it may be prudent to exercise judgment and classify such buildings as at risk of severe damage. These examples clearly illustrate that relying on predicted probabilities rather than solely on the final predicted label can lead to better decisions.

To address uncertainty in predictions for cases like Building 3, the maximum depth of the decision trees is reduced to avoid reaching final label predictions. In a RF model, decision trees typically continue to split until they reach leaf nodes that assign a definitive class label. By limiting the maximum depth, the trees are prevented from growing too complex and forcing a final classification at the leaf nodes. This adjustment allows the calculation of probabilities for each damage class from every tree in the forest, rather than just predicting a single class label. The probability distribution of a building being in the severe damage class can be visualized using histograms, where the height of each bar reflects the frequency (count) of probability estimates provided by the trees in the forest for a specific range (bin). A curve can be fitted to the histogram to represent a continuous probability density function, offering a visual representation of the overall shape of the probability distribution. In thi study, a margin of 10% around 0.5 is selected in order to focus on cases like Building 3, so the test set samples with predicted probabilities of being in the severe class between 0.4 and 0.6, obtained from the original classifier, are examined. This margin can be adjusted according to user needs. Figure 2 illustrates three types of probability distributions for a building being in the severe damage class using the decision tree probabilities. The red dashed vertical line indicates the mean probability, which is the average of the probabilities predicted by all trees in the forest for a given building being classified as severely damaged.

Figure 2.

Three types of probability distributions for a sample from the severe damage class: (a) low uncertainty; (b) moderate uncertainty; (c) high uncertainty.

Figure 2(a) shows a typical probability distribution with low uncertainty. The probability mass is concentrated around the mean probability, leading to a clear final class prediction. In contrast, in the probability distribution shown in Figure 2(b) the probability mass is focused around two distinct points rather than around the mean probability. This bimodal distribution suggests moderate uncertainty, as the prediction is less clear. Lastly, the probability distribution in Figure 2(c) is more uniform, indicating that the probability mass is spread out rather than concentrated around the mean probability. This implies high uncertainty in the prediction, making the final class significantly less certain.

Threshold optimization

In the context of ML classification, a decision threshold is a specific probability value that determines how predicted probabilities from a model are converted into class labels. However, a default threshold of 0.5 may not always be optimal for every classification problem. Adjusting the threshold can enhance model performance and reduce misclassifications, especially when the consequences of misclassifying one class are more critical than another. This can make the model more sensitive to important classes, thereby reducing the impact of costly errors. In this study, optimization of the decision threshold is explored as a means to address model uncertainty and minimize misclassifications. To identify this optimal threshold, the decision threshold is changed between 0 and 1 at increments of 0.05 and the cost of misclassifications is estimated for the predictions made based on each threshold. The total misclassification cost is calculated as follows: $T o t a l m i s c l a s s i f i c a t i o n c o s t = N_{F P} \times C_{F P} + N_{F N} \times C_{F N}$ (4)Where $N_{FP}$ and $C_{FP}$ , respectively, are the number of cases and the cost associated with low risk buildings incorrectly classified as high risk of severe earthquake damage (false positives) while $N_{FN}$ and $C_{FN}$ are the number of cases and cost associated with high risk buildings misclassified as low risk (false negatives). The value of $C_{FP}$ includes expenses such as inspections, analyses, temporary housing, and business interruption while $C_{FN}$ would encompass costs related to severe earthquake damage including loss of life, building repair or reconstruction, and broader social impacts such as disruptions to the economy, education, healthcare, and the displacement of communities. It is important to note that the values of $C_{FP}$ and $C_{FN}$ are not universal; they vary based on regional factors and specific building characteristics. Therefore, ranges of relative costs are considered instead of precise values in this study, but values can be specified by users according to their particular goals and the unique characteristics of their region. This cost-based adjustment of the decision threshold ensures optimizing the seismic screening through trade-offs between the costs of false positives and false negatives, ultimately leading to more robust and cost-effective predictions in seismic vulnerability assessments.

Due to the focus of this study on accurately predicting the severe damage class, an accuracy metric is required to assess the performance of the severe class as the threshold varies. To maintain consistency with the original RF classifier evaluation, the F1-score is used to measure the accuracy of the severe class with the following modification to the definition of precision to incorporate $C_{FP}$ and $C_{FN}$ : $P r e c i s i o n = \frac{T P}{T P + \frac{C_{F P}}{C_{F N}} . F P}$ (5)

The objective of this approach is to identify the optimal decision threshold that minimizes misclassification cost while maximizing or achieving high accuracy for the severe damage class.

Results and discussion

Results on model uncertainty

In analyzing model uncertainty using the dataset from the previous study (Elyasi et al., 2024), a 5-fold cross-validation technique is employed. All 658 buildings in the original dataset are used in the cross-validation. For each fold, the dataset is divided into five subsets, with four subsets used for training and one subset used as the test set. The test set in each fold consisted of roughly 132 buildings. Given the impracticality of showing probability distributions for every building, four representative example buildings are presented for each level of uncertainty. These examples were carefully selected from the test set to ensure that all possible scenarios within their respective uncertainty levels are represented. The characteristics of these buildings are detailed in Table 5. Figures 3 –5 illustrate representative building examples of different levels of uncertainty.

Table 5.

Characteristics of example buildings described in Figures 3 –5.

Uncertainty level	Example building number	No. of stories	Floor area (m²)	Column area (m²)	Concrete wall area (East-West) (m²)	Concrete wall area (North-South) (m²)	Masonry wall area (East-West) (m²)	Masonry wall area (North-South) (m²)	MMI	PGA (g)
Low	1	4	192	2.84	0	0	0	3.4	8.1	0.31
	2	3	235	1.8	0	0	0	2.7	7.3	0.32
	3	2	175	1.1	0	0	2.7	3.1	7.6	0.46
	4	4	134	1.87	0.54	0.79	0	0	7.7	0.48
Moderate	5	1	114	1	0	0	0.9	1.6	7.3	0.32
	6	2	300	3.7	0	0	3	3.2	7.4	0.34
	7	3	124	3.75	0.2	0.11	0	0	7.7	0.48
	8	4	193	3	0	0	0	5	6.5	0.16
High	9	2	317	2.3	0	0	3.1	3.3	7.8	0.52
	10	5	830	8	0	9.6	0	0	6.5	0.17
	11	2	366	3.1	0	0	2	5.9	7.4	0.34
	12	3	650	3.9	1.8	8.6	1.8	2.4	6.2	0.25

Figure 3.

Example buildings for low uncertainty. Example Building (1) Actual damage class: Severe. Example Building (2) Actual damage class: Severe. Example Building (3) Actual damage class: Non-Severe. Example Building (4) Actual damage class: Non-Severe.

Figure 4.

Example buildings for moderate uncertainty. Example Building (5) Actual damage class: Severe. Example Building (6) Actual damage class: Severe. Example Building (7) Actual damage class: Non-Severe. Example Building (8) Actual damage class: Non-Severe.

Figure 5.

Example buildings for high uncertainty. Example Building (9) Actual class: Severe. Example Building (10) Actual class: Severe. Example Building (11) Actual class: Non-Severe. Example Building (12) Actual class: Severe.

Low Uncertainty (Figure 3)

In scenarios of low uncertainty, the probability distribution is sharply concentrated around the mean probability or a value very close to it. This concentration allows for a more confident classification based on the mean probability. However, despite this high confidence, some predictions may still be incorrect due to the model’s inherent error. This error arises because there will always be a margin of misclassification due to factors such as data uncertainty and inherent variability in the dataset. In the RF model, errors can also result from its simplicity and the fact that seismic vulnerability of a building is influenced by many other parameters not typically accounted for in RVS methodologies. For instance, Example Buildings 1 and 2 have mean probabilities of approximately 0.6 and 0.68, respectively, leading to their classification as severe since they exceed the threshold of 0.5. In contrast, Example Building 3, with a mean probability around 0.35, falls below this threshold and is therefore classified as non-severe. Example Building 4 presents a more complex case with its mean probability is about 0.58. This results in a final classification of severe, even though its actual damage class is non-severe. This highlights the inevitable misclassification errors that can occur in the model, even in situations of low uncertainty.

Moderate Uncertainty (Figure 4)

Example Buildings 5-8 exhibit probability distributions concentrated around two distinct points rather than a single mean probability. This bimodal distribution pattern introduces complications in classification and reduces the model’s confidence in its predictions. In each example building, one concentration point lies below the threshold of 0.5, while the other exceeds it, and both are positioned at a distance from the mean probability. These two-point concentrations indicate that the classifier’s decision-making is not straightforward, making it difficult to rely solely on mean probabilities for accurate predictions.

High Uncertainty (Figure 5)

As indicated by the probability distributions the damage state predictions for Example Buildings 9-12 show high uncertainty. The probability distributions are nearly uniform, meaning that the model provides nearly equal likelihoods for both non-severe and severe classifications. This signifies that the classifier is uncertain and struggles to differentiate between the classes. In such cases, it becomes nearly impossible to make a definitive classification, and detailed evaluation is required to resolve the ambiguity. To effectively manage cases of moderate and high uncertainty, it is essential to categorize these samples separately for further action.

Determining the uncertainty level of damage class predictions for buildings based on probability distributions is an effective method to prioritize high uncertainty cases and enhance the RVS method. Among the approximately 132 buildings in each test set per fold, nearly 25% were decisions with limited confidence where the probability of falling in the severe damage class ranged between 0.4 and 0.6. The mean probabilities of these cases were examined but no clear patterns or trends was found across the folds for misclassified samples. This highlights the importance of evaluating the quality of the complete probability distribution, which provides a more clear picture of the uncertainty than relying on mean probability values alone. In the end, these buildings can be individually examined more closely to assess the uncertainty in their predicted classifications which can considerably improve the model performance. However, manually analyzing and categorizing all these distributions based on judgment can be a tedious process. Therefore, the alternative approach of threshold optimization is discussed in the next section.

Results on threshold optimization

To address model uncertainty in classification, an alternative approach involves adjusting the decision threshold. Instead of using the traditional value of 0.5 for classification, an optimal threshold is determined to better suit the proposed classifier and its purpose. This adjustment is particularly important for reducing misclassifications in the severe damage class, which has more significant adverse consequences compared to misclassifications in the non-severe damage class. The goal is to select a threshold that minimizes the cost of false predictions while maintaining high accuracy for the model, especially for the severe damage class. To find this optimal threshold, a cost sensitivity analysis is conducted, considering the total cost of misclassifications by the model. For each fold of the 5-fold cross-validation, the model is trained on the corresponding training set from the dataset used in the previous study (Elyasi et al., 2024). During training, cost coefficients, $C_{FP}$ and $C_{FN}$ , are applied as weights to samples in the non-severe and severe damage classes, respectively. These weights adjust the model’s focus, guiding the decision trees to prioritize samples from the class with a higher associated cost. This ensures the model learns to account for the varying costs of false positives and false negatives during training. In the context of seismic vulnerability assessments, $C_{FN}$ typically carries greater weight than $C_{FP}$ . In regions at high risk of significant earthquakes where such assessments are conducted, the potential consequences of building collapse or severe damage are substantial. Therefore, it is reasonable to consider $C_{FN}$ as equal to or greater than $C_{FP}$ in these scenarios. Ratios higher than 1:5 are not considered because when $C_{FN}$ is significantly greater than $C_{FP}$ , the optimal threshold tends to be very close to zero which indicates that the cost of incorrectly predicting a severely damaged building as non-severe (false negative) is so high that there is no value in RVS methods and a detailed seismic evaluation and retrofit is warranted for every single building in the population.

In this study, a range of $C_{FP}$ to $C_{FN}$ ratios (1:1, 1:1.5, 1:2, 1:2.5 up to 1:5) were explored to estimate the total misclassification cost instead of using exact cost figures since the data encompasses different regions and various building characteristics. For this purpose, a fixed value of $C_{FP}$ = 1 was assumed and $C_{FN}$ was varied accordingly (i.e., 1, 1.5, 2, …, 5). These values were then substituted into equation (4) to compute the total misclassification cost. This method yields normalized or relative cost values which are sufficient and appropriate for comparing the impact of different thresholds and misclassification trade-offs. For each $C_{FP}$ to $C_{FN}$ ratio, the trained model was evaluated on a test set using decision thresholds varying from 0 to 1 in increments of 0.05. The total misclassification cost and the F1-score for prediction accuracy of the severe damage class were computed for each case. These results are plotted in Figure 6. The optimal threshold is defined as the point where the misclassification cost is minimized and the F1-score for the severe damage class is maximized or is sufficiently high. The plots in Figure 6 provide a visual representation of how the decision threshold impacts the misclassification cost and the F1-score for the severe damage class, aiding in the identification of the optimal threshold. Each plot displays two curves: one representing the misclassification cost and the other representing the F1-score for the severe damage class. Vertical dashed lines highlight the threshold values where the misclassification cost is minimized and where the F1-score for the severe damage class is maximized. In most cases, the threshold that maximizes the prediction accuracy also minimizes the misclassification cost. In general, as the $C_{FP}$ to $C_{FN}$ ratio decreases, the optimum threshold decreases which means the ML model will predict more buildings to be severely damaged to reduce costs associated with false negatives. This also means that the model will yield more false positives (low risk buildings identified as high risk) but it will increase the F1-score for the severe damage class.

Figure 6.

Threshold optimization based on misclassification cost and F1-score for the severe damage class, evaluated at cost ratios of: (a) 1:1; (b) 1:1.5; (c) 1:2; (d) 1:2.5; (e) 1:3; (f) 1:3.5; (g) 1:4; (h) 1:4.5; (i) 1:5.

For a 1:1 cost ratio, the threshold value with minimum misclassification cost is 0.5. However, the F1-score for the severe damage class is maximized at the threshold of 0.45. As the cost ratio increases, the threshold that minimizes misclassification cost, and maximizes F1-score for the severe damage class, decreases, approaching zero. This adjustment improves the F1-score because when the cost of false negatives is higher, the model lowers the threshold to classify more buildings as severely damaged. This reduces the number of severe damage cases that are incorrectly classified as non-severe (false negatives), thereby minimizing the overall misclassification cost. The optimal thresholds for cost ratios of 1:1.5, 1:2, 1:2.5, and up to 1:5 are presented in Table 6. For the 1:5 cost ratio, a threshold of 0.1 minimizes the misclassification cost. The F1-score is close but not at its maximum with this threshold. This occurrence, observed for cost ratios of 1:1 and 1:5, is because the metrics for total misclassification cost and F1-score for the severe damage class prioritize different objectives, even though cost coefficients are incorporated into their calculations. Misclassification cost minimization focuses on reducing the overall weighted penalty for false positives and false negatives which is directly influenced by the cost ratio. The F1-score, however, balances cost-weighted precision and recall, which may result in a slightly different threshold. For cost ratios exceeding 1:5 (i.e., cost of false negatives five times greater than that of false positives), the optimal threshold approaches zero which implies that almost all buildings should be subject to detailed inspection and seismic evaluation because the cost of false negatives is so significant.

Table 6.

Comparison of false positives and false negatives at optimal thresholds versus the conventional threshold of 0.5 for different cost ratios.

Cost ratio	Conventional threshold of 0.5		Optimal threshold			Increase in number of false positives	Decrease in number of false negatives
Cost ratio	Number of false positives	Number of false negatives	Selected threshold	Number of false positives	Number of false negatives	Increase in number of false positives	Decrease in number of false negatives
1:1.5	22	20	0.40	30	11	8	9
1:2			0.35	33	9	11	11
1:2.5			0.3	36	7	14	13
1:3			0.25	40	4	18	16
1:3.5			0.2	45	3	23	17
1:4 & 1:4.5			0.15	48	2	26	18
1:5			0.1	52	1	30	19

To assess how the optimal decision threshold compares to the conventional threshold of 0.5, the modified ML-based RVS model was again tested using 5-fold cross validation. The dataset, consisting of 658 buildings, was divided into five equal subsets. In each fold, four subsets were used for training and one for testing, resulting in approximately 132 buildings in the test set per fold. This process was repeated five times, ensuring that each building was tested once. The results were then averaged across all five folds to provide a comprehensive evaluation. The primary goal was to determine whether adjusting the threshold improves the performance of the ML model, particularly in accurately detecting buildings likely to be severely damaged. Table 6 presents the number of non-severe samples incorrectly classified as severe (false positives) and the number of severe samples incorrectly classified as non-severe (false negatives) for both the optimal thresholds and the conventional threshold of 0.5. When the optimal threshold is used to screen the buildings instead of 0.5, a notable reduction in the number of false negatives is observed, thereby improving the accuracy for the severe class. However, this improvement is accompanied by an increase in the number of false positives. For cost ratios of 1:1.5, 1:2, and 1:2.5, using the optimal threshold results in a reduction in false negatives that is approximately balanced by an increase in false positives. Conversely, for higher cost ratios with lower optimal thresholds, since the cost of misclassifying a severe case is equal to or greater than three times that of a non-severe case, adopting the optimal threshold over the conventional 0.5 is still considered an improvement. For example, with a cost ratio of 1:4, using the optimal threshold of 0.15 instead of 0.5 removes 18 out of 20 false negatives, correctly classifying these buildings as severe. However, this change also results in 26 more false positives. Given that the misclassification cost of each false negative is four times that of each false positive, this adjustment still enhances the classifier’s performance. Notably, at a cost ratio of 1:5 with an optimal threshold of 0.1, nearly all false negatives are eliminated, significantly improving the classifier’s accuracy in identifying severe cases.

By adopting this threshold optimization and cost sensitivity analysis approach, the model ensures that the decision-making process in seismic vulnerability assessments is both cost-effective and robust, particularly in minimizing the severe damage class misclassifications. This approach effectively manages the trade-offs between the costs of false positives and false negatives.

A three-level ML-based RVS framework

The RVS approach previously proposed by the authors employed an RF model based on basic building characteristics including the number of stories, floor area, column cross-sectional area, the areas of concrete and masonry infill walls in both east-west and north-south directions. MMI and PGA were introduced as earthquake intensity features to improve the prediction accuracy. Full details of the model are available in Elyasi et al. (2024). A total of 658 building samples collected from Duzce (1999) (Sim et al., 2016a), Bingol (2003) (Sim et al., 2016a), Nepal (2015) (Shah et al., 2015), Taiwan (2016) (NCREE, 2016), Ecuador (2016) (Sim et al., 2016b; Villalobos et al., 2018), and Pohang (2017) (Sim et al., 2018) were used for training and evaluation of the model which showed promising results. The current study introduced further refinements aimed at minimizing the model’s misclassifications. This section presents a three-level ML-based RVS framework that offers users the flexibility to choose between the original method and two enhanced alternatives, based on their specific needs and available resources for conducting an RVS. It is worth noting that while the proposed enhancements improve the model’s reliability, the original ML model without them is still a robust method with 71% accuracy.

Level 1: Initial rapid assessment

This level involves the application of the RVS previously proposed by the authors (Elyasi et al., 2024). Utilizing an RF classifier, users can rapidly identify low-rise RC buildings at high risk of severe damage. This step is recommended when resources are limited, aiming for a quick assessment due to budget and time constraints. This initial assessment provides a reasonable accuracy of approximately 71%, allowing for a rapid evaluation.

Level 2: Enhanced assessment with model uncertainty detection

For users with sufficient budget and time, and who seek a more accurate assessment, this level includes performing the model uncertainty detection proposed in this study. By analyzing the probability distribution of predictions, it becomes possible to identify the level of uncertainty in the prediction. Buildings with moderate or high uncertainty should be categorized for further actions, such as detailed inspections or additional analyses, with a priority given to those with high uncertainty. The decision to investigate both groups or only those with high uncertainty should be based on the available resources, time, and regional conditions.

Level 3: Threshold optimization for misclassification cost reduction

This level incorporates decision threshold optimization of the classifier to account for the cost of misclassifications, making it the most conservative of the three levels. Instead of using the conventional threshold of 0.5, an optimal threshold is determined to minimize the cost of false predictions by the model while maintaining high accuracy, especially for the severe damage class. Real costs or the relative cost of false positives ( $C_{FP})$ and false negatives ( $C_{FN})$ can be used to assign values to misclassifications. These coefficients are applied as weights to the samples in the non-severe and severe damage classes, guiding the model to prioritize samples from the class with a higher associated cost. The primary goal is to minimize the misclassification of the severe class which necessitates sufficient resources for a larger number of buildings to be detected as probable to be severely damaged. This comprehensive approach ensures a more cautious and thorough evaluation, aimed at significantly reducing the risk of overlooking severely damaged buildings. The enhanced screening approach is illustrated in Figure 7, highlighting the step-by-step process for each level.

Figure 7.

Flowchart of three-level ML-based RVS framework.

The enhanced RVS framework not only improves prediction accuracy but also provides a scalable and practical solution for large-scale seismic screening, ultimately supporting better structural safety outcomes. For example, at Level 2 of the framework, by applying a 10% margin around the 0.5 decision threshold, buildings with decisions based on limited confidence can be further assessed for prediction uncertainty. This level of screening only requires basic understanding of probability and statistics. Although there is some subjectivity in distinguishing the level of uncertainty (low, moderate, severe) from probability distributions, the point of Level 2 screening is to qualitatively examine the level of uncertainty in the prediction not to exactly classify or quantify it. Finally, Level 3 aims to further improve the prediction accuracy by optimizing the decision threshold based on the cost of misclassifications. This would require expertise in estimating costs associated with seismic upgrades and losses. Determining appropriate cost ratios for false positives and false negatives is a challenge as these ratios are inherently context-dependent, varying significantly based on regional economic conditions, building characteristics, seismic hazard levels, and stakeholder priorities. Therefore, Level 3 screening with an optimized threshold requires input from trained professionals and decision makers. Future research can focus on the development of an adaptable framework for cost ratio determination that considers the diverse and region-specific factors influencing seismic vulnerability assessments. To further enhance adaptability, methodologies that incorporate dynamic cost ratio adjustments based on real-time data should be explored.

Conclusions

This study presented a comprehensive approach to RVS through a three-level ML-based methodology, emphasizing the importance of model uncertainty detection and threshold optimization. The initial ML-based seismic screening model developed by the authors can identify low-rise reinforced concrete buildings likely to be severely damaged in earthquakes with 71% accuracy, but two enhancements were investigated in this work to provide a more nuanced understanding of prediction confidence and improve its performance. The first enhancement involved calculating probabilities for each damage class from every tree in the forest, rather than predicting a single class label. This allowed for visualizing the probability distribution for a building belonging to each damage class, offering clearer insights into the model’s predictions. Buildings were then categorized into low, moderate, or high uncertainty groups based on these distributions. This categorization helps prioritize further detailed investigations for buildings with moderate to high uncertainty. This approach is essential in ensuring that the potential model uncertainty is addressed, particularly for structures at greater risk, thereby enhancing overall safety. Finally, the threshold optimization approach refined the decision-making process by considering the relative costs associated with misclassifications. This strategy was vital for reducing the misclassification of buildings at high risk of severe damage as low risk, as such errors could result in substantial financial and safety implications. By evaluating misclassification costs across various decision thresholds for the classifier, this methodology improved the overall reliability of seismic vulnerability assessments. The enhanced three-level ML-based RVS methodology offers a structured and effective framework for accurately identifying buildings at risk of severe damage. By systematically addressing model uncertainty and optimizing decision thresholds, this framework helps improve prediction outcomes while being mindful of the associated costs, ultimately contributing to safer built environments and informed decision-making for stakeholders involved in seismic risk management.

Footnotes

ORCID iDs

Niloofar Elyasi

Eugene Kim

Author contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Niloofar Elyasi and Eugene Kim. The first draft of the manuscript was written by Niloofar Elyasi and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN-2023-03729).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated during and/or analysed during the current study are available on Datacenterhub at https://datacenterhub.org/. The code and machine learning models developed by the authors are also available on GitHub at .

References

Ahmed

Abarca

Perrone

, et al. (2022) Large-scale seismic assessment of RC buildings through rapid visual screening. International Journal of Disaster Risk Reduction 80: 103219.

Arslan

Ceylan

Koyuncu

(2012) An ANN approaches on estimating earthquake performances of existing RC buildings. Neural Network World 22(5): 443–458.

Coskun

Aldemir

(2023) Machine learning network suitable for accurate rapid seismic risk estimation of masonry building stocks. Natural Hazards 115(1): 261–287.

Coskun

Aldemir

Sahmaran

(2020) Rapid screening method for the determination of seismic vulnerability assessment of RC building stocks. Bulletin of Earthquake Engineering 18: 1401–1416.

Derczynski

(2016) Complementarity, F-score, and NLP evaluation. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, pp. 261–266. European Language Resources Association (ELRA).

Elyasi

Kim

Yeum

(2024) A machine-learning-based seismic vulnerability assessment approach for low-rise RC buildings. Journal of Earthquake Engineering 28(3): 760–776.

Fathi-Fazl

Cai

Jacques

, et al. (2021) Methodology for seismic risk screening of existing buildings in Canada: structural scoring system. Canadian Journal of Civil Engineering 48(3): 250–262.

FEMA (2017) Rapid Visual Screening of Buildings for Potential Seismic Hazards: A Handbook. Government Printing Office.

Hassan

Sozen

(1997) Seismic vulnerability assessment of low-rise buildings in regions with infrequent earthquakes. ACI Structural Journal 94(1): 31–39.

10.

Japan Building Disaster Prevention Association (JBDPA) (2017) Standard for evaluation of seismic capacity of existing reinforced concrete buildings.

11.

Jardine

van Rijsbergen

(1971) The use of hierarchic clustering in information retrieval. Information Storage and Retrieval 7(5): 217–240.

12.

Johnson

Fick

(2018) Generalized trends of severe damage observed from building surveys of seven different earthquakes. In: Eleventh US National Conference on Earthquake Engineering, Los Angeles, CA, 25–29 June 2018.

13.

Kassem

Mohamed Nazri

Noroozinejad Farsangi

(2020) The seismic vulnerability assessment methodologies: a state-of-the-art review. Ain Shams Engineering Journal 11(4): 849–864.

14.

Mangalathu

Sun

Nweke

, et al. (2020) Classifying earthquake damage to buildings using machine learning. Earthquake Spectra 36(1): 183–208.

15.

NCREE (2016) 2016 Taiwan (Meinong) earthquake. Available at. https://datacenterhub.org

16.

NZSEE (2006) Assessment and improvement of the structural performance of buildings in earthquake. In: Recommendations of a NZSEE Study Group on Earthquake Risk Buildings.

17.

Purushothama

Mucedero

Perrone

, et al. (2023) Evaluation of rapid visual screening assessment of existing buildings using nonlinear numerical analysis. Journal of Building Engineering 76: 107110.

18.

Shah

Pujol

Puranam

, et al. (2015) 2015 Nepal earthquake building performance database. Available at. https://datacenterhub.org

19.

Sim

Song

Skok

, et al. (2016a) Database of low-rise reinforced concrete buildings with earthquake damage. Available at. https://datacenterhub.org

20.

Sim

Villalobos

Smith

, et al. (2016b) Performance of low-rise reinforced concrete buildings in the 2016 Ecuador earthquake. Purdue University Research Repository. (Accessed 28 October 2017).

21.

Sim

Laughery

Chiou

, et al. (2018) 2017 Pohang Earthquake—Reinforced Concrete Building Damage Survey; DEEDS. Lafayette, IN, USA: Purdue University Research Repository. Available at. https://datacenterhub.org

22.

Tesfamariam

Saatcioglu

(2008) Risk-based seismic evaluation of reinforced concrete buildings. Earthquake Spectra 24(3): 795–821.

23.