Introduction
Seismic vulnerability assessment of buildings is crucial for protecting lives, minimizing economic losses, and ensuring community resilience in earthquake-prone areas. Such evaluations provide a clear understanding of a building’s likely performance during future seismic events, enabling targeted retrofitting efforts that can prevent catastrophic failures. These assessments also play a vital role in disaster preparedness, supporting informed decision-making for emergency response planning and resource allocation. By investing in seismic vulnerability assessments, building owners contribute to the long-term resilience of their properties, protect their investments, and help ensure the safety and sustainability of the surrounding community.
However, evaluating each building in a large inventory using traditional analytical methods is both time-consuming and expensive. To tackle this issue, rapid and preliminary vulnerability assessment methods have been developed which aim to quickly identify at-risk buildings through simple visual observations, such as sidewalk surveys (Kassem et al., 2020). These methods are referred to as rapid visual screening (RVS) methods. The usefulness of RVS methods has been demonstrated in recent studies. For instance, Ahmed et al. (2022) employed a method based on a series of survey-type questions addressing the structural and nonstructural qualities of buildings to assess 2900 reinforced concrete (RC) buildings in Northern Algeria and conducted a parametric study which showed consistent results across different building stock typologies. In another study, Purushothama et al. (2023) further validated the effectiveness of the RVS method used by Ahmed et al. (2022) by comparing its results with detailed nonlinear analyses of masonry-infilled RC buildings. In addition to the RVS method used in these studies, numerous others have been proposed in the past few decades, including FEMA P-154 (FEMA, 2017), as well as methodologies developed by the Japan Building Disaster Prevention Association (JBDPA, 2017), the New Zealand Society for Earthquake Engineering (NZSEE, 2006), and the National Research Council Canada (NRC) (Fathi-Fazl et al., 2021). These methods calculate a score representing the seismic vulnerability of a given building based on simple features and available information, which is then modified with additional factors. Buildings with scores below a certain threshold are prioritized for more detailed evaluations. Although these RVS methods based on vulnerability scores are simple and useful, there are challenges in defining a reliable threshold score to identify high-risk buildings.
Additionally, the region or country-specific nature of these methods limits their applicability.
In addition to these scoring methods, there are various empirically derived approaches for seismic vulnerability screening. For example, Tesfamariam and Saatcioglu (2008) proposed a risk-based evaluation technique for RC buildings using a knowledge-based fuzzy rule modeling approach. This method includes parameters such as the structural system, irregularities, construction quality, and year of construction to assess seismic vulnerability. Tesfamariam and Saatcioglu (2008) validated the method using seismic damage data gathered from the RC buildings surveyed following the 1994 Northridge Earthquake. Coskun et al. (2020) used ordinary least squares regression and multivariate linear regression techniques to develop an RVS method for RC buildings, conducting a detailed statistical analysis of 545 RC buildings in Turkey. Although their method is highly accurate, it requires 14 different input variables for predictions. In contrast, the Priority Index (PI) proposed by Hassan and Sozen (1997) is one of the simplest empirical screening methods. Developed for low-rise RC buildings using data from the 1992 Erzincan earthquake, the PI relies on two metrics, the wall index and the column index, calculated from seven input parameters. These include the number of stories, floor area, and the cross-sectional areas of columns and walls. Although these features are easy to obtain, human judgment is needed to relate the PI to damage classes.
Despite the development of numerous RVS methodologies, there remains a need for a robust approach that maximizes the value of earthquake damage data collected from post-earthquake building inspections. Machine learning (ML), with its capacity to advance data-driven methodologies, provides a promising pathway for improving RVS techniques. Arslan et al. (2012) introduced an artificial neural network (ANN)-based method to evaluate the earthquake performance of RC buildings in Turkey, using 19 parameters such as the number of stories, steel strength, and spectrum intensity. Their study, based on 66 RC buildings (4–10 stories), achieved high accuracy in classifying performance levels according to the Turkish Earthquake Code-2007 (TEC-2007). Zhang et al. (2018) proposed an ML framework to link structural damage patterns to residual collapse capacity, enabling the assessment of structural safety. This framework was validated on a four-story RC frame with high predictive accuracy. Mangalathu et al. (2020) applied ML models to predict building damage levels using input features like spectral acceleration, fault distance, and building age, achieving 66% accuracy with data from the 2014 South Napa earthquake. More recently, Coskun and Aldemir (2023) developed an RVS method for masonry buildings using ensemble learning. Their model incorporated factors such as the number of stories, floor system type, wall material, vertical irregularity, and earthquake zone, and was validated using a dataset of 543 masonry buildings in Turkey. While these methodologies have made significant contributions, there is still a pressing need for approaches that are not constrained by regional or national boundaries.
In a previous study by the authors (Elyasi et al., 2024), an ML-based RVS method was introduced, achieving a damage classification accuracy of 71%. While this performance is promising, further improvement is needed due to the serious consequences of prediction errors. If a vulnerable building is classified as safe, it could lead to major financial losses and even loss of life during an earthquake. On the other hand, incorrectly labeling a safe building as vulnerable may result in unnecessary and costly upgrades. Improving the model’s accuracy and reducing misclassifications are therefore critical. To encourage building owners and decision-makers to act on the model’s recommendations, it is important to clearly show that the cost of assessment and retrofitting is often much lower than the potential damage and rebuilding costs after an earthquake. A reliable and easy-to-understand framework can help increase confidence in its use. A key drawback of ML models is that they are often black boxes. Therefore, it is crucial to explore how they make predictions and identify cases with uncertainty. Understanding and addressing this uncertainty can make the model more trustworthy and useful in practice. The following sections outline the original RVS approach developed by the authors, explain the motivation for enhancing it, and detail the steps for improving its accuracy, interpretability, and practicality for seismic risk screening.
ML-based RVS
Description of structural damage classes (Elyasi et al., 2024; Johnson and Fick, 2018; Sim et al., 2016b).
The RF model operates by constructing a large number of decision trees, each trained on a random subset of the data. A random selection of input features is then considered for splitting nodes in each tree. This process can be repeated many times to build a diverse forest of trees. When making predictions, the output from each tree is aggregated, and the final prediction is determined by majority voting, where the most frequent prediction across all trees is chosen. To evaluate the model, the weighted average F1-score was chosen. This metric accounts for the F1-score of each class, weighted by the number of samples in that class, and averages the scores across all classes. The F1-score is calculated as:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

where Precision = TP / (TP + FP), Recall = TP / (TP + FN), and TP, FP, and FN denote true positives, false positives, and false negatives, respectively.
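As a concrete illustration, the weighted-average F1-score can be computed with scikit-learn's `f1_score`; the labels below are hypothetical and only demonstrate the metric, not the study's results.

```python
from sklearn.metrics import f1_score

# Hypothetical true and predicted damage labels (1 = severe, 0 = non-severe)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# F1 is computed per class, weighted by each class's sample count, then averaged
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
```

With `average="weighted"`, classes with more samples contribute proportionally more to the final score, which matches the metric described above.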
Accuracy of the proposed RVS model (Elyasi et al., 2024).
The model’s weighted average accuracy of 71%, achieved using only basic building features, is promising, but further refinement to minimize misclassifications can enhance the model’s reliability. Enhancing prediction accuracy, particularly for the severe damage class, is vital given the potential consequences. Misclassifying a vulnerable building as safe could result in significant financial losses and pose serious risk to lives during an earthquake, while inaccurately labeling a structurally sound building as at risk would incur additional evaluation and retrofit costs. To effectively persuade building owners to invest in necessary evaluations and retrofitting, it is crucial to develop a robust framework that clearly demonstrates how the costs of assessment and retrofitting can be significantly lower than the potential damages and rebuilding expenses that could arise from an earthquake. This urgency is particularly pronounced in high-risk seismic regions like California, where the probability of a major earthquake in the near future is high. Identifying vulnerable buildings and implementing necessary actions, such as retrofitting, are critical to reducing these risks. However, this can only be achieved by convincing building owners through a reliable, cost-effective, and applicable procedure. Improving the model’s accuracy and reliability will be essential in persuading stakeholders to adopt its recommendations, ultimately leading to safer buildings and more resilient communities.
Study objectives
This study aims to refine the previously proposed RVS methodology by addressing model uncertainty and reducing misclassifications in the severe damage class. Two approaches are proposed to achieve these objectives. The first approach focuses on model uncertainty. Model uncertainty refers to the uncertainty in predictions due to the limitations, assumptions, and approximations inherent in the model’s structure. By analyzing the distribution of probabilities assigned by individual decision trees in the RF classifier to a building being in the severe damage class, rather than relying solely on the final predicted label, a more nuanced understanding of the model uncertainty can be obtained. This involves reducing the maximum depth of decision trees in the RF classifier by limiting the longest path from the root node to a leaf node in the tree. This reduction helps avoid prediction purity, a condition where a leaf node contains samples that all belong to the same class, which leads to overly confident classifications that may indicate overfitting. By restricting the depth, class probabilities can be calculated from each decision tree instead of predicting a single class label. The overall probability for each damage class is then averaged across all trees in the forest. This strategy helps identify low, moderate, and high uncertainty levels in predictions. Buildings with high prediction uncertainty should be prioritized for further inspection and analysis, while those with low uncertainty can be classified with greater confidence. By focusing on probability distributions, the methodology enhances the reliability of seismic vulnerability assessments and helps mitigate potential risks associated with misclassifications. The second approach aims to optimize the decision threshold of the classifier to enhance model reliability by reducing misclassifications.
This is performed by considering the relative cost of misclassifying low-risk buildings as high-risk and high-risk buildings as low-risk. A wide range of cost ratios is used to estimate the total misclassification cost, and the effect of changing the decision threshold on prediction accuracy and the cost of misclassification is examined. Finally, the two refinement approaches are incorporated into the original ML model to propose a comprehensive three-level ML-based RVS methodology.
Model uncertainty detection
Understanding and evaluating uncertainty is crucial for assessing the reliability of an ML model’s predictions. There are two main types of uncertainty: data uncertainty and model uncertainty. Data uncertainty comes from the natural variation or noise in the data itself. This type of uncertainty is caused by factors that the model cannot account for, such as measurement errors, incomplete data, or randomness in data collection. In this study, differences in the quality or reporting standards of the six earthquake datasets used for training and testing the model represent examples of data uncertainty. Differences in how accurately the geometric features were measured (e.g., floor area, column and wall areas), and how the inspectors judged the level of damage for buildings in these datasets, are also notable sources of data uncertainty. To help reduce the data uncertainty in this study, damage labels were cross-checked against available building photos when a sufficient number of clear images existed. However, this was not possible for a small number of buildings, as the dataset was compiled from post-earthquake surveys conducted by different teams in various regions and photographs were sometimes limited in number and/or scope. Since the dataset was already limited in size, removing the samples without visual evidence could further reduce the data and increase the risk of overfitting. Therefore, it was assumed that the labels assigned by inspectors were accurate in such cases.

Model uncertainty, on the other hand, comes from the limitations within the model itself. It can be caused by factors like the model’s structure, assumptions made during its development, or the limited amount of training data. In the case of the RVS methodology developed by the authors, one source of model uncertainty is the fact that the model makes predictions based on a small set of coarse geometric features that do not directly represent the structural design or condition.
Additionally, the model may not have clear decision boundaries for classification. For example, if the model encounters a building with a unique combination of features, it may be unsure whether to classify it as severe or non-severe, especially if there are no similar examples in the training data. Given that addressing data uncertainty is often difficult due to limited control over data quality and availability, this study focuses primarily on addressing model uncertainty.
In the ML-based RVS model developed by the authors (Elyasi et al., 2024), misclassifying a building that is likely to sustain non-severe damage as severe may simply lead to higher costs associated with a more detailed inspection and evaluation, but incorrectly predicting a building at risk of severe earthquake damage as non-severe poses significant safety risks and could result in substantial losses in future earthquakes. Therefore, addressing the uncertainty in the classifier’s predictions is essential for mitigating these risks. To address this uncertainty, the probabilities assigned to both damage classes (non-severe and severe) by the RF classifier during prediction are examined rather than relying solely on the final predicted label. Figure 1 shows how these probabilities are calculated, providing an overview of the RF classifier’s structure. The RF classifier uses 100 trees, a commonly adopted number that offers a good balance between performance and computational efficiency.

Figure 1. The process of calculating damage class probabilities in the RF classifier.
Typically, the RF classifier makes predictions based on the class with the highest probability, using a decision threshold of 0.5 for binary classification. If the probability of a building being severely damaged exceeds this decision threshold, it is classified as severe; otherwise, it is categorized as non-severe. By focusing on probabilities rather than final labels, this approach moves beyond simple categorization of potential earthquake damage as severe or non-severe. It provides a more detailed understanding of the likelihood of each class and improves the overall assessment process by enabling more informed and careful decision-making regarding the building’s vulnerability. To further illustrate this point, three examples are presented.
Characteristics of the three buildings listed in Table 4.
Predicted class probabilities and labels for three test set samples.
In the case of Building 1, the model predicts the probability of severe damage as 0.91, giving it a final label of severe; however, the actual damage was non-severe. This is a conservative but incorrect prediction by the model. Conversely, for Building 2, the model yielded a probability of severe damage of 0.13, classifying it as non-severe, while the actual damage in the earthquake was severe. Lastly, Building 3 was given a probability of severe damage of 0.49, marginally below the 0.5 threshold, resulting in a final label of non-severe earthquake damage when the actual damage was severe. In a real application of the ML-based RVS model, the mislabeling of Building 1 would simply lead to detailed analysis of the building and possibly seismic retrofits. In contrast, the false predictions for Buildings 2 and 3 could mean significant losses in a future earthquake and potentially even put lives at risk. Cases like Building 2, where the predicted probability of severe damage is much lower than that of non-severe damage, reflect confident errors by the model and can be considered part of its limitations. These errors can be attributed to the inherent simplicity of RVS: intended as a preliminary assessment that relies on a limited number of basic building parameters, it is reasonably expected to yield a certain level of prediction error. Incorporating additional features such as structural characteristics during the screening process can help reduce these incorrect predictions, but it may also increase the complexity of the assessment. On the other hand, in cases like Building 3, where the probability of severe damage is high and close to the threshold of 0.5, it may be prudent to exercise judgment and classify such buildings as at risk of severe damage. These examples clearly illustrate that relying on predicted probabilities rather than solely on the final predicted label can lead to better decisions.
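The thresholding step behind these three predictions can be sketched as a one-line rule; the probabilities are taken from the examples above, and a scikit-learn-style probability output is assumed.

```python
import numpy as np

# Predicted probabilities of severe damage for Buildings 1-3 (from the text)
p_severe = np.array([0.91, 0.13, 0.49])

threshold = 0.5  # conventional decision threshold for binary classification
labels = np.where(p_severe > threshold, "severe", "non-severe")
# Building 3 (0.49) falls marginally below 0.5 and is labeled non-severe,
# even though its probability of severe damage is nearly as high as 50%.
```

The hard cut at 0.5 discards exactly the information that distinguishes Building 2 (0.13, a confident error) from Building 3 (0.49, a borderline case), which is why the probabilities themselves are worth examining.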
To address uncertainty in predictions for cases like Building 3, the maximum depth of the decision trees is reduced to avoid forcing final label predictions. In an RF model, decision trees typically continue to split until they reach leaf nodes that assign a definitive class label. By limiting the maximum depth, the trees are prevented from growing too complex and forcing a final classification at the leaf nodes. This adjustment allows the calculation of probabilities for each damage class from every tree in the forest, rather than just predicting a single class label. The probability distribution of a building being in the severe damage class can be visualized using histograms, where the height of each bar reflects the frequency (count) of probability estimates provided by the trees in the forest for a specific range (bin). A curve can be fitted to the histogram to represent a continuous probability density function, offering a visual representation of the overall shape of the probability distribution. In this study, a margin of 10% around 0.5 is selected in order to focus on cases like Building 3, so the test set samples with predicted probabilities of being in the severe class between 0.4 and 0.6, obtained from the original classifier, are examined. This margin can be adjusted according to user needs. Figure 2 illustrates three types of probability distributions for a building being in the severe damage class using the decision tree probabilities. The red dashed vertical line indicates the mean probability, which is the average of the probabilities predicted by all trees in the forest for a given building being classified as severely damaged.

Figure 2. Three types of probability distributions for a sample from the severe damage class: (a) low uncertainty; (b) moderate uncertainty; (c) high uncertainty.
Figure 2(a) shows a typical probability distribution with low uncertainty. The probability mass is concentrated around the mean probability, leading to a clear final class prediction. In contrast, in the probability distribution shown in Figure 2(b) the probability mass is focused around two distinct points rather than around the mean probability. This bimodal distribution suggests moderate uncertainty, as the prediction is less clear. Lastly, the probability distribution in Figure 2(c) is more uniform, indicating that the probability mass is spread out rather than concentrated around the mean probability. This implies high uncertainty in the prediction, making the final class significantly less certain.
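A minimal sketch of extracting per-tree probabilities with scikit-learn is shown below. Synthetic data stands in for the building dataset, and `max_depth=4` is an illustrative choice; the point is that depth-limited leaves retain class mixtures, so each tree contributes a probability rather than a pure 0/1 vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the seven building features and binary damage labels
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# Depth-limited forest: leaves are generally impure, so each tree yields
# class fractions instead of a definitive label
rf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0).fit(X, y)

x_new = X[:1]  # feature vector of one building
per_tree_p = np.array([t.predict_proba(x_new)[0, 1] for t in rf.estimators_])

mean_p = per_tree_p.mean()  # the mean probability (dashed line in Figure 2)
hist, edges = np.histogram(per_tree_p, bins=10, range=(0.0, 1.0))
```

Note that scikit-learn's forest-level `predict_proba` is exactly the average of these per-tree probabilities; the histogram of `per_tree_p` is what reveals whether that average comes from a concentrated, bimodal, or near-uniform distribution.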
Threshold optimization
In the context of ML classification, a decision threshold is a specific probability value that determines how predicted probabilities from a model are converted into class labels. However, a default threshold of 0.5 may not always be optimal for every classification problem. Adjusting the threshold can enhance model performance and reduce misclassifications, especially when the consequences of misclassifying one class are more critical than another. This can make the model more sensitive to important classes, thereby reducing the impact of costly errors. In this study, optimization of the decision threshold is explored as a means to address model uncertainty and minimize misclassifications. To identify this optimal threshold, the decision threshold is varied between 0 and 1 at increments of 0.05 and the cost of misclassifications is estimated for the predictions made at each threshold. The total misclassification cost is calculated as follows:

Total cost = (C_FP × N_FP) + (C_FN × N_FN)

where C_FP and C_FN are the unit costs of a false positive and a false negative, and N_FP and N_FN are the numbers of false positives (non-severe buildings classified as severe) and false negatives (severe buildings classified as non-severe), respectively.
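The threshold sweep can be sketched as below, assuming a linear cost model with a unit false-positive cost and a relative false-negative cost (a 1:4 ratio is used here for illustration); both the labels and the probabilities are hypothetical.

```python
import numpy as np

# Hypothetical true labels (1 = severe) and predicted severe-class probabilities
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])
p_severe = np.array([0.7, 0.4, 0.45, 0.2, 0.9, 0.55, 0.1, 0.3])

c_fp, c_fn = 1.0, 4.0  # e.g. a 1:4 cost ratio
thresholds = np.arange(0.0, 1.0001, 0.05)  # 0 to 1 in increments of 0.05

costs = []
for t in thresholds:
    y_pred = (p_severe > t).astype(int)
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # non-severe flagged severe
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # severe missed
    costs.append(c_fp * fp + c_fn * fn)

best_threshold = thresholds[int(np.argmin(costs))]
```

Because false negatives are weighted more heavily, the minimizing threshold tends to sit below 0.5, consistent with the lower optimal thresholds reported for the higher cost ratios.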
Due to the focus of this study on accurately predicting the severe damage class, an accuracy metric is required to assess the performance of the severe class as the threshold varies. To maintain consistency with the original RF classifier evaluation, the F1-score is used to measure the accuracy of the severe class with the following modification to the definition of precision to incorporate
The objective of this approach is to identify the optimal decision threshold that minimizes the misclassification cost while maintaining high accuracy for the severe damage class.
Results and discussion
Results on model uncertainty

Figure 3. Example buildings for low uncertainty: Example Building 1 (actual damage class: Severe); Example Building 2 (Severe); Example Building 3 (Non-Severe); Example Building 4 (Non-Severe).

Figure 4. Example buildings for moderate uncertainty: Example Building 5 (actual damage class: Severe); Example Building 6 (Severe); Example Building 7 (Non-Severe); Example Building 8 (Non-Severe).

Figure 5. Example buildings for high uncertainty: Example Building 9 (actual damage class: Severe); Example Building 10 (Severe); Example Building 11 (Non-Severe); Example Building 12 (Severe).
Low Uncertainty (Figure 3)
In scenarios of low uncertainty, the probability distribution is sharply concentrated around the mean probability or a value very close to it. This concentration allows for a more confident classification based on the mean probability. However, despite this high confidence, some predictions may still be incorrect due to the model’s inherent error. This error arises because there will always be a margin of misclassification due to factors such as data uncertainty and inherent variability in the dataset. In the RF model, errors can also result from its simplicity and the fact that the seismic vulnerability of a building is influenced by many other parameters not typically accounted for in RVS methodologies. For instance, Example Buildings 1 and 2 have mean probabilities of approximately 0.6 and 0.68, respectively, leading to their classification as severe since they exceed the threshold of 0.5. In contrast, Example Building 3, with a mean probability around 0.35, falls below this threshold and is therefore classified as non-severe. Example Building 4 presents a more complex case, with a mean probability of about 0.58. This results in a final classification of severe, even though its actual damage class is non-severe. This highlights the inevitable misclassification errors that can occur in the model, even in situations of low uncertainty.
Moderate Uncertainty (Figure 4)
Example Buildings 5-8 exhibit probability distributions concentrated around two distinct points rather than a single mean probability. This bimodal distribution pattern introduces complications in classification and reduces the model’s confidence in its predictions. In each example building, one concentration point lies below the threshold of 0.5, while the other exceeds it, and both are positioned at a distance from the mean probability. These two-point concentrations indicate that the classifier’s decision-making is not straightforward, making it difficult to rely solely on mean probabilities for accurate predictions.
High Uncertainty (Figure 5)
As indicated by the probability distributions, the damage state predictions for Example Buildings 9-12 show high uncertainty. The probability distributions are nearly uniform, meaning that the model provides nearly equal likelihoods for both non-severe and severe classifications. This signifies that the classifier is uncertain and struggles to differentiate between the classes. In such cases, it becomes nearly impossible to make a definitive classification, and detailed evaluation is required to resolve the ambiguity. To effectively manage cases of moderate and high uncertainty, it is essential to categorize these samples separately for further action.
Determining the uncertainty level of damage class predictions for buildings based on probability distributions is an effective method to prioritize high uncertainty cases and enhance the RVS method. Among the approximately 132 buildings in each test set per fold, nearly 25% received predictions with limited confidence, with the probability of falling in the severe damage class ranging between 0.4 and 0.6. The mean probabilities of these cases were examined, but no clear patterns or trends were found across the folds for misclassified samples. This highlights the importance of evaluating the quality of the complete probability distribution, which provides a clearer picture of the uncertainty than relying on mean probability values alone. In the end, these buildings can be individually examined more closely to assess the uncertainty in their predicted classifications, which can considerably improve the model performance. However, manually analyzing and categorizing all these distributions based on judgment can be a tedious process. Therefore, the alternative approach of threshold optimization is discussed in the next section.
Results on threshold optimization
To address model uncertainty in classification, an alternative approach involves adjusting the decision threshold. Instead of using the traditional value of 0.5 for classification, an optimal threshold is determined to better suit the proposed classifier and its purpose. This adjustment is particularly important for reducing misclassifications in the severe damage class, which has more significant adverse consequences compared to misclassifications in the non-severe damage class. The goal is to select a threshold that minimizes the cost of false predictions while maintaining high accuracy for the model, especially for the severe damage class. To find this optimal threshold, a cost sensitivity analysis is conducted, considering the total cost of misclassifications by the model. For each fold of the 5-fold cross-validation, the model is trained on the corresponding training set from the dataset used in the previous study (Elyasi et al., 2024). During training, cost coefficients representing the relative costs of false positives and false negatives are applied.
In this study, a range of cost ratios between false positives and false negatives, from 1:1 to 1:5, is considered.

Threshold optimization based on misclassification cost and F1-score for the severe damage class, evaluated at cost ratios of: (a) 1:1; (b) 1:1.5; (c) 1:2; (d) 1:2.5; (e) 1:3; (f) 1:3.5; (g) 1:4; (h) 1:4.5; (i) 1:5.
Comparison of false positives and false negatives at optimal thresholds versus the conventional threshold of 0.5 for different cost ratios.
To assess how the optimal decision threshold compares to the conventional threshold of 0.5, the modified ML-based RVS model was again tested using 5-fold cross-validation. The dataset, consisting of 658 buildings, was divided into five equal subsets. In each fold, four subsets were used for training and one for testing, resulting in approximately 132 buildings in the test set per fold. This process was repeated five times, ensuring that each building was tested once. The results were then averaged across all five folds to provide a comprehensive evaluation. The primary goal was to determine whether adjusting the threshold improves the performance of the ML model, particularly in accurately detecting buildings likely to be severely damaged. Table 6 presents the number of non-severe samples incorrectly classified as severe (false positives) and the number of severe samples incorrectly classified as non-severe (false negatives) for both the optimal thresholds and the conventional threshold of 0.5. When the optimal threshold is used to screen the buildings instead of 0.5, a notable reduction in the number of false negatives is observed, thereby improving the accuracy for the severe class. However, this improvement is accompanied by an increase in the number of false positives. For cost ratios of 1:1.5, 1:2, and 1:2.5, using the optimal threshold results in a reduction in false negatives that is approximately balanced by an increase in false positives. For higher cost ratios with lower optimal thresholds, where the cost of misclassifying a severe case is at least three times that of a non-severe case, the avoided false negatives outweigh the additional false positives, so adopting the optimal threshold over the conventional 0.5 is still an improvement. For example, with a cost ratio of 1:4, using the optimal threshold of 0.15 instead of 0.5 removes 18 out of 20 false negatives, correctly classifying these buildings as severe. However, this change also results in 26 more false positives.
Given that the misclassification cost of each false negative is four times that of each false positive, this adjustment still enhances the classifier’s performance. Notably, at a cost ratio of 1:5 with an optimal threshold of 0.1, nearly all false negatives are eliminated, significantly improving the classifier’s accuracy in identifying severe cases.
By adopting this threshold optimization and cost sensitivity analysis approach, the model ensures that the decision-making process in seismic vulnerability assessments is both cost-effective and robust, particularly in minimizing the severe damage class misclassifications. This approach effectively manages the trade-offs between the costs of false positives and false negatives.
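The fold-wise comparison can be sketched as below. Synthetic data stands in for the 658-building dataset, and 0.15 is used as the illustrative lowered threshold (the optimum found for the 1:4 cost ratio); since every probability at or below 0.15 is also at or below 0.5, lowering the threshold can only reduce false negatives, at the price of more false positives.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the 658-building dataset (8 illustrative features)
X, y = make_classification(n_samples=658, n_features=8, weights=[0.6, 0.4],
                           random_state=0)

fn_default = fn_optimal = 0
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X[train_idx], y[train_idx])
    p = rf.predict_proba(X[test_idx])[:, 1]  # severe-class probability
    # False negatives: severe buildings labeled non-severe at each threshold
    fn_default += int(np.sum((p <= 0.5) & (y[test_idx] == 1)))
    fn_optimal += int(np.sum((p <= 0.15) & (y[test_idx] == 1)))
```

Summing false negatives across the five folds mirrors the averaging procedure described above, with each building appearing in exactly one test fold.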
A three-level ML-based RVS framework
The RVS approach previously proposed by the authors employed an RF model based on basic building characteristics including the number of stories, floor area, column cross-sectional area, and the areas of concrete and masonry infill walls in both east-west and north-south directions. Modified Mercalli intensity (MMI) and peak ground acceleration (PGA) were introduced as earthquake intensity features to improve the prediction accuracy. Full details of the model are available in Elyasi et al. (2024). A total of 658 building samples collected from Duzce (1999) (Sim et al., 2016a), Bingol (2003) (Sim et al., 2016a), Nepal (2015) (Shah et al., 2015), Taiwan (2016) (NCREE, 2016), Ecuador (2016) (Sim et al., 2016b; Villalobos et al., 2018), and Pohang (2017) (Sim et al., 2018) were used for training and evaluating the model, which showed promising results. The current study introduced further refinements aimed at minimizing the model’s misclassifications. This section presents a three-level ML-based RVS framework that offers users the flexibility to choose between the original method and two enhanced alternatives, based on their specific needs and available resources for conducting an RVS. It is worth noting that while the proposed enhancements improve the model’s reliability, the original ML model without them is still a robust method with 71% accuracy.
Level 1: Initial rapid assessment
This level involves the application of the RVS previously proposed by the authors (Elyasi et al., 2024). Utilizing an RF classifier, users can rapidly identify low-rise RC buildings at high risk of severe damage. This step is recommended when resources are limited, aiming for a quick assessment due to budget and time constraints. This initial assessment provides a reasonable accuracy of approximately 71%, allowing for a rapid evaluation.
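The Level 1 workflow can be sketched as follows. This is a minimal illustrative example, not the authors’ published model: the feature names mirror the building characteristics listed above, but the training data, labels, and hyperparameters here are hypothetical placeholders.

```python
# Minimal sketch of Level 1 screening with a random forest classifier.
# Feature set follows the building characteristics named in the paper;
# all data below is synthetic placeholder data, not the published dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = [
    "n_stories", "floor_area", "column_area",
    "concrete_wall_area_ew", "concrete_wall_area_ns",
    "infill_wall_area_ew", "infill_wall_area_ns",
    "mmi", "pga",  # earthquake intensity features
]

rng = np.random.default_rng(0)
X_train = rng.random((200, len(FEATURES)))   # placeholder survey data
y_train = rng.integers(0, 2, 200)            # 1 = severe damage, 0 = not severe

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

new_building = rng.random((1, len(FEATURES)))  # one surveyed building
print("predicted class:", clf.predict(new_building)[0])
```

In practice the model would be trained once on the surveyed building inventory and then applied building-by-building during sidewalk screening.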
Level 2: Enhanced assessment with model uncertainty detection
For users with sufficient budget and time, and who seek a more accurate assessment, this level includes performing the model uncertainty detection proposed in this study. By analyzing the probability distribution of predictions, it becomes possible to identify the level of uncertainty in the prediction. Buildings with moderate or high uncertainty should be categorized for further actions, such as detailed inspections or additional analyses, with a priority given to those with high uncertainty. The decision to investigate both groups or only those with high uncertainty should be based on the available resources, time, and regional conditions.
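The uncertainty check described above can be sketched by querying each tree in the forest individually. The model and data here are illustrative stand-ins, and the 10% margin around the 0.5 threshold is taken from the discussion later in this section; the exact cutoffs separating low, moderate, and high uncertainty are an assumption for demonstration.

```python
# Sketch of Level 2 uncertainty detection: collect per-tree probabilities
# for the "severe damage" class and flag predictions whose mean probability
# falls within an assumed 10% margin around the 0.5 decision threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 9))               # placeholder feature matrix
y_train = rng.integers(0, 2, 200)            # 1 = severe damage
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

building = rng.random((1, 9))                # one building to screen

# Probability of the severe class (column 1) from each individual tree,
# giving the distribution behind the forest's averaged prediction.
tree_probs = np.array(
    [tree.predict_proba(building)[0, 1] for tree in clf.estimators_]
)

mean_p = tree_probs.mean()
if abs(mean_p - 0.5) <= 0.10:                # inside the 10% margin
    verdict = "moderate/high uncertainty -> flag for detailed inspection"
else:
    verdict = "low uncertainty"
print(f"mean severe-damage probability: {mean_p:.2f} ({verdict})")
```

Visualizing the histogram of `tree_probs` (e.g. a wide, flat spread versus a sharp peak) supports the qualitative judgment of uncertainty level that Level 2 calls for.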
Level 3: Threshold optimization for misclassification cost reduction
This level incorporates decision threshold optimization of the classifier to account for the cost of misclassifications, making it the most conservative of the three levels. Instead of using the conventional threshold of 0.5, an optimal threshold is determined to minimize the cost of false predictions by the model while maintaining high accuracy, especially for the severe damage class. Real costs or the relative costs of false positives and false negatives serve as inputs to this optimization.
Figure: Flowchart of the three-level ML-based RVS framework.
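The threshold optimization can be sketched as a simple sweep over candidate thresholds, picking the one that minimizes total misclassification cost. The predicted probabilities and labels below are synthetic, and the 1:5 false-positive to false-negative cost ratio echoes the case discussed earlier; real applications would substitute context-specific costs.

```python
# Sketch of Level 3 threshold optimization: sweep decision thresholds and
# select the one minimizing total misclassification cost under an assumed
# FP:FN cost ratio. Probabilities and labels here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)                          # 1 = severe damage
p_severe = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, 500), 0.0, 1.0)

COST_FP, COST_FN = 1.0, 5.0                               # assumed 1:5 cost ratio

def total_cost(threshold):
    """Total misclassification cost at a given decision threshold."""
    y_pred = (p_severe >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))            # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))            # missed severe cases
    return COST_FP * fp + COST_FN * fn

thresholds = np.arange(0.05, 0.95, 0.05)
best = min(thresholds, key=total_cost)
print(f"optimal threshold: {best:.2f}, cost: {total_cost(best):.0f}")
```

Because false negatives are weighted more heavily, the optimal threshold lands below 0.5, trading some extra false alarms for far fewer missed severe-damage buildings, consistent with the conservative intent of Level 3.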
The enhanced RVS framework not only improves prediction accuracy but also provides a scalable and practical solution for large-scale seismic screening, ultimately supporting better structural safety outcomes. For example, at Level 2 of the framework, applying a 10% margin around the 0.5 decision threshold allows buildings whose classifications rest on limited confidence to be further assessed for prediction uncertainty. This level of screening only requires a basic understanding of probability and statistics. Although there is some subjectivity in distinguishing the level of uncertainty (low, moderate, or high) from the probability distributions, the point of Level 2 screening is to qualitatively examine the level of uncertainty in the prediction, not to classify or quantify it exactly. Finally, Level 3 aims to further improve the prediction accuracy by optimizing the decision threshold based on the cost of misclassifications. This requires expertise in estimating the costs associated with seismic upgrades and losses. Determining appropriate cost ratios for false positives and false negatives is a challenge, as these ratios are inherently context-dependent, varying significantly with regional economic conditions, building characteristics, seismic hazard levels, and stakeholder priorities. Therefore, Level 3 screening with an optimized threshold requires input from trained professionals and decision makers. Future research can focus on the development of an adaptable framework for cost ratio determination that considers the diverse and region-specific factors influencing seismic vulnerability assessments. To further enhance adaptability, methodologies that incorporate dynamic cost ratio adjustments based on real-time data should be explored.
Conclusions
This study presented a comprehensive approach to RVS through a three-level ML-based methodology, emphasizing the importance of model uncertainty detection and threshold optimization. The initial ML-based seismic screening model developed by the authors can identify low-rise reinforced concrete buildings likely to be severely damaged in earthquakes with 71% accuracy, but two enhancements were investigated in this work to provide a more nuanced understanding of prediction confidence and improve its performance. The first enhancement involved calculating probabilities for each damage class from every tree in the forest, rather than predicting a single class label. This allowed for visualizing the probability distribution for a building belonging to each damage class, offering clearer insights into the model’s predictions. Buildings were then categorized into low, moderate, or high uncertainty groups based on these distributions. This categorization helps prioritize further detailed investigations for buildings with moderate to high uncertainty. This approach is essential in ensuring that the potential model uncertainty is addressed, particularly for structures at greater risk, thereby enhancing overall safety. Finally, the threshold optimization approach refined the decision-making process by considering the relative costs associated with misclassifications. This strategy was vital for reducing the misclassification of buildings at high risk of severe damage as low risk, as such errors could result in substantial financial and safety implications. By evaluating misclassification costs across various decision thresholds for the classifier, this methodology improved the overall reliability of seismic vulnerability assessments. The enhanced three-level ML-based RVS methodology offers a structured and effective framework for accurately identifying buildings at risk of severe damage. 
By systematically addressing model uncertainty and optimizing decision thresholds, this framework helps improve prediction outcomes while being mindful of the associated costs, ultimately contributing to safer built environments and informed decision-making for stakeholders involved in seismic risk management.
Footnotes
Author contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Niloofar Elyasi and Eugene Kim. The first draft of the manuscript was written by Niloofar Elyasi and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN-2023-03729).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and/or analysed during the current study are available on Datacenterhub at https://datacenterhub.org/. The code and machine learning models developed by the authors are also available on GitHub.
