Introduction
Civil engineering structures, such as bridges, buildings, and pipelines, rely on structural health monitoring (SHM) systems to ensure their safety and longevity. These systems utilize an array of sensors and non-destructive evaluation techniques to continuously or periodically assess the condition of these structures, providing accurate data on their current health. 1 Such data are crucial for the early detection of potential structural issues and for the precise evaluation of any detected damage. By incorporating advanced technologies and data analysis methods, SHM systems facilitate timely and proactive maintenance, enabling civil engineers to prioritize repairs based on the urgency and importance of the identified problems.2,3 The early detection and accurate severity assessment offered by SHM not only ensure that resources are allocated efficiently but also help minimize risks and prolong the service life of critical infrastructure.4,5
The concept of early damage detection in SHM became a critical focus during the field’s formative years, driven by the need to identify structural issues before they escalated. Doebling et al. 6 provided one of the first comprehensive reviews of vibration-based damage identification methods, detailing approaches such as changes in modal frequencies, mode shapes, and advanced techniques using modal strain energy to detect incipient damage. Building on this foundation, Sohn et al. 7 expanded the scope by reviewing statistical pattern recognition paradigms in SHM, stressing the importance of data normalization and feature extraction for identifying subtle changes indicative of early damage. Farrar et al. 8 introduced statistical process control methods for distinguishing between environmental variations and actual structural damage, significantly enhancing the sensitivity of early detection methods.
Recent advancements in SHM have been significantly influenced by the integration of supervised learning techniques, which have greatly enhanced early damage detection. Supervised learning models, utilizing labeled datasets, have shown remarkable precision in identifying known damage patterns. Avci et al. 9 demonstrated the superiority of convolutional neural networks (CNNs) for damage detection compared to traditional methods. Rafiei and Adeli 10 further advanced the field with a dynamic neural classification algorithm for high-rise buildings, showing the adaptability of supervised learning across diverse structures. Azimi et al. 11 reviewed state-of-the-art deep learning methods in SHM, while more recently, Ahmadian et al. 12 developed a supervised machine learning model capable of detecting subtle structural deviations, marking a key step forward in applying supervised methods for real-time damage detection.
While supervised learning models have advanced SHM, they rely on large labeled datasets, which are often difficult to obtain in real-world scenarios. Unsupervised learning techniques address this challenge by detecting anomalies and potential damage using unlabeled data, making them increasingly important for early damage detection in the absence of labeled data. 13 Bull et al. 14 introduced an unsupervised novelty detection approach that uses probabilistic models to account for environmental and operational variability, thereby improving the reliability of structural anomaly detection. Similarly, Rastin et al. 15 used convolutional autoencoders to detect and quantify structural damage, achieving success with both numerical models and full-scale structures like the Tianjin Yonghe Bridge. Sarmadi and Yuen 16 employed a one-class kernel null space algorithm with probabilistic threshold estimation for early damage detection, effectively managing environmental variability. Wang and Cha 17 proposed an unsupervised deep learning approach utilizing a deep auto-encoder with a one-class support vector machine to detect structural damage by extracting damage-sensitive features from acceleration response data, achieving high accuracy in both numerical and experimental studies. Lei et al. 18 proposed a deep convolutional generative adversarial network for damage detection by reconstructing lost sensor data to enable accurate identification of structural conditions and damage. This reflects the growing trend of using AI to enhance the sensitivity of early damage detection in vibration-based monitoring systems. 19
Once damage is detected, accurately quantifying its severity is crucial for determining appropriate maintenance and repair strategies. Over the past few decades, significant progress has been made in damage severity assessment methods in SHM. Early contributions by Pandey et al. 20 introduced the use of changes in modal curvature for damage localization and severity assessment, laying the groundwork for future developments. Farrar and Jauregui 21 compared damage identification algorithms using both experimental and numerical modal data, offering insights into how different methods can assess damage severity effectively. Ren and Sun 22 further advanced the field by applying wavelet analysis for quantitative damage assessment, demonstrating how frequency analysis can accurately localize and measure severity. As SHM methods evolved, more sophisticated techniques emerged. Fan and Qiao 23 provided a comprehensive review of vibration-based damage identification methods, focusing on algorithms for quantifying damage severity. Their work highlighted the potential of data-driven methods for precisely estimating the extent of structural damage. 24
As SHM systems evolved, machine learning emerged as a pivotal tool for enhancing the accuracy and efficiency of damage severity assessments. 25 Tibaduiza et al. 26 proposed a method that combined feature selection with extreme learning machines, significantly improving the precision of severity estimation. This marked a shift toward data-driven approaches, emphasizing the extraction and analysis of relevant features from extensive datasets. Building on this, Zhang et al. 27 introduced CNNs for assessing damage severity directly from raw sensor data, streamlining the assessment process by reducing the need for extensive feature engineering. Concurrently, the integration of probabilistic methods into SHM has addressed the inherent uncertainties in structural models and measurement data. Behmanesh et al. 28 developed a Bayesian probabilistic framework for assessing damage severity in steel structures, incorporating uncertainty into the assessment process to enhance the reliability of results. Huang et al. 29 extended this approach by applying Gaussian process regression to damage severity assessment, offering probabilistic predictions crucial for informed decision-making in SHM.
Recent advancements in SHM have also seen the integration of deep learning and data fusion, particularly in multi-sensor contexts, significantly enhancing damage severity assessment. Entezami and Shariatmadar 30 introduced an unsupervised learning approach for damage localization and severity assessment, using novel damage indices based on time series modeling. Their method identifies robust model orders through an iterative process and uses AutoRegressive model parameters and residuals as damage-sensitive features to quantify damage severity. Following this, Xu et al. 31 proposed a Long Short-Term Memory (LSTM) neural network framework for real-time seismic damage assessment, demonstrating how deep learning can effectively quantify the extent of structural damage at a regional scale by analyzing data from multiple sources. Postorino et al. 32 demonstrated the robustness of CNNs in predicting damage severity and location in composite structures, even under manufacturing uncertainties and noise. Building on these advancements, Nguyen-Ngoc et al. 33 introduced a method combining Deep Neural Networks with the Artificial Rabbit Optimization algorithm, effectively localizing and quantifying damage in truss bridges while overcoming challenges such as local minima in optimization processes.
In addition to these data-driven advancements, innovation in signal processing techniques continues to play a crucial role in damage severity assessment. Esmaielzadeh et al. 34 applied such signal processing techniques to concrete gravity dams, using relative frequency error to estimate damage severity. By analyzing changes in natural frequencies, they accurately identified the location and severity of structural damage in the non-linear, non-stationary signals common in SHM. Mousavi et al. 35 used the complete ensemble empirical mode decomposition with adaptive noise technique, focusing on key features like energy and instantaneous amplitude, to achieve more accurate damage severity classification compared to traditional methods.
Hybrid methodologies that integrate various techniques have emerged as powerful tools in severity assessment. 36 Dang et al. 37 proposed a hybrid one-dimensional Deep Convolutional Neural Network (1D-DCNN)-LSTM model for accurate damage severity assessment in civil structures. By combining signal processing techniques with deep learning, their method efficiently handles noisy sensor data and achieves high accuracy in real-time SHM. Similarly, Svendsen et al. 38 developed a hybrid SHM framework that uses both numerical and experimental data to assess damage severity in steel bridges. Another innovative hybrid methodology, as described by Sakiyama et al., 39 combines principal component analysis (PCA), finite element simulations, and Monte Carlo simulations to quantify damage severity in aging infrastructure.
Recent advances in SHM have explored diverse methodologies across well-established benchmarks, yet key limitations persist, especially regarding scalability, automation, and interpretability. Traditional approaches to the International Association for Structural Control and Monitoring (IASC)–American Society of Civil Engineers (ASCE) Phase I benchmark often rely on supervised learning frameworks with hand-crafted features and static thresholds,40–42 limiting their adaptability to new damage conditions or unseen environments. Similarly, SHM studies using the Old ADA truss bridge, a full-scale, artificially damaged structure, typically require extensive manual calibration and often lack generalized scoring mechanisms.43,44 In both cases, these models are prone to false alarms due to environmental noise and cannot distinguish between damage severity levels without supervised labels or retraining.
Valdez-Yepez et al. 45 introduced the only public dataset for bolt-loosening detection in offshore wind-turbine jacket supports and applied PCA plus Mahalanobis distance for anomaly detection. 46 Although effective in small-scale laboratory tests, this pipeline has limited practical scalability: manual thresholding impedes real-time operation, and the method yields only a binary healthy/damaged flag, offering neither bolt-level localization nor multi-severity discrimination (e.g., 6 vs 9 Nm loosening) without retraining. Such limitations restrict its utility for progressive on-the-fly damage assessment.
Despite significant advancements in both early damage detection and severity assessment, several key challenges continue to hinder the full potential of these methods. Environmental and operational variability pose significant challenges in SHM, where fluctuations in temperature, humidity, and operational loads can obscure or mimic structural damage, complicating the accurate detection of anomalies. Robust algorithms capable of distinguishing between benign variations and genuine structural problems are urgently needed. 47 Additionally, setting reliable thresholds for anomaly detection is problematic; arbitrary or context-dependent thresholds can lead to false positives or negatives, particularly in complex machine learning and deep learning models. 48 The issue of generalization across different datasets and structural types further limits the applicability of current SHM methods, as many models perform well in specific contexts, but struggle in varied environments. 49 Scalability and computational efficiency are also critical concerns, especially as SHM systems increasingly rely on large-scale sensor networks that generate vast amounts of data requiring real-time analysis. 50 Furthermore, the interpretability of SHM models, especially those based on deep learning, remains a challenge. The “black-box” nature of these models can impede trust and practical application, underscoring the need for interpretable AI that provides clear and understandable rationales for their decisions. 51
Recent studies have begun to address the most pressing of these gaps, namely, robustness to environmental variability. Manifold learning-aided clustering followed by non-parametric probabilistic scoring substantially reduces false alarms by forming environment-specific data manifolds before anomaly detection. 52 Complementary work removes freezing-temperature artifacts from bridge modal frequencies by normalizing unsupervised data, thus restoring damage sensitivity without manual tuning. 53 Moreover, cyclostationarity-based signal modelling offers a physics-guided route to suppress speed-induced variance in rotating machinery while retaining wear signatures. 54 Together, these advances demonstrate that hybrid environment-aware learning pipelines can outperform purely data-driven baselines, yet they remain largely confined to single-domain case studies and require custom parameter choices, leaving open questions of cross-domain generalization and computational scalability. In parallel, hybrid physics-informed neural networks have been introduced to merge first-principles structural models with data-driven learning, 55 and probabilistic generative models such as variational autoencoders offer an unsupervised framework for quantifying uncertainty and detecting anomalies without labeled damage data. 56
Digital twin platforms now pair high-rate Internet of Things (IoT) sensor streams with physics-based simulations to deliver real-time asset-specific health assessments of gearboxes, wind turbine drivetrains, and bridge decks, enabling condition-based maintenance and reducing unplanned downtime. 57 In intelligent manufacturing, cyclostationarity-aware vibration analysis takes advantage of the periodic statistics of rotating machinery to isolate weak fault signatures in spur gears with highly variable speed and load profiles. 58 These successes underscore the advantage of blending domain physics with advanced deep learning pipelines for SHM. 48 However, a unified and transferable framework that can accommodate the sparse sensor layouts, large spatial scales, and severe environmental variability typical of civil infrastructure remains an open research challenge. Existing approaches often face serious limitations: in such settings, algorithms typically require labeled damage data, hand-tuned thresholds, or extensive sensitivity analyses to avoid false alarms caused by operational or environmental fluctuations.52–54 These limitations highlight the pressing need for an unsupervised and calibration-free framework that enables robust early damage detection and interpretable severity assessment, even under complex real-world variability.
Addressing the complex challenges of damage detection in SHM has been a central focus of our research. Our development of the EdgeConvFormer 59 model marked a significant advancement in this area, integrating graph convolutional networks and transformers to capture complex spatiotemporal patterns. A key innovation of EdgeConvFormer is its use of dynamic Graph Convolutional Network (GCN)s within a spatial-temporal 2D framework, which adapts to varying structural conditions and sensor layouts, enhancing its ability to capture intricate spatial dependencies. Additionally, the Parallel Sensor-Specific Transformer enables precise modeling of temporal dependencies across sensors, preserving individual sensor characteristics while learning inter-sensor relationships. These features collectively provide a more accurate and robust analysis of the relationships between different sensors over time, setting EdgeConvFormer apart from existing methods.
While EdgeConvFormer demonstrated superior performance in anomaly detection and damage localization using a reconstruction approach, 60 it still has some limitations. The model’s encoder excelled at extracting multi-scale features; however, its decoder, based on a simplified multi-layer perceptron (MLP) structure, did not fully leverage the richness of the encoded information. The MLP-based decoder used max and mean pooling followed by linear transformations to synthesize these features into the final reconstructed data. This straightforward approach, while easy to implement and effective in certain contexts, ultimately fell short in preserving the intricate spatial and temporal details crucial for accurate anomaly detection and severity assessment. As a result, some critical contextual information was lost during the reconstruction process, reducing the model’s sensitivity to subtle yet significant structural changes, particularly in noisy environments where early detection is crucial.
Recognizing the need to overcome the limitations of EdgeConvFormer, we have focused on enhancing the model to better preserve the intricate details captured by the encoder, thereby improving its ability to accurately identify and assess structural damage, especially in complex scenarios. This work builds on the strengths of EdgeConvFormer, addressing its shortcomings to further improve its effectiveness in SHM applications. As a solution, we propose U-GraphFormer, an enhanced model that significantly advances feature extraction and reconstruction processes by incorporating a U-Net-inspired encoder–decoder architecture. This new approach ensures more comprehensive utilization of the encoded information, leading to improved performance in anomaly detection and severity assessment.
In U-GraphFormer, the simple MLP decoder used in EdgeConvFormer is replaced with a more sophisticated structure that mirrors the encoder, integrating skip connections between corresponding layers. This design allows for more dynamic and progressive refinement of features across multiple layers, akin to the U-Net architecture. The inclusion of skip connections between each layer of the encoder and decoder ensures that detailed spatial and temporal information is preserved throughout the reconstruction process, mitigating the loss of critical contextual information that was a limitation in EdgeConvFormer. By progressively refining features at each layer, U-GraphFormer enhances the model’s ability to capture and utilize multi-scale features more effectively, leading to more accurate and detailed analyses. This advanced decoding mechanism not only improves the reliability of early damage detection but also significantly enhances the model’s capacity for severity assessment. The skip connections allow for the direct flow of fine-grained information from the encoder to the decoder, enabling the model to reconstruct complex patterns with greater precision. As a result, U-GraphFormer is better equipped to differentiate between normal structural variations and subtle indicators of damage, even in challenging environments.
Additionally, U-GraphFormer is engineered for real-time monitoring, with the capability to trigger alarms immediately when an adaptively generated threshold is exceeded. This real-time functionality, combined with the enhanced decoding architecture, ensures timely interventions and provides detailed insights into the severity of detected damage. This facilitates informed decision-making and prioritization of maintenance efforts, making U-GraphFormer a comprehensive and reliable tool for monitoring critical infrastructure. The main contributions of this work are as follows:
A novel model, U-GraphFormer, enhances SHM by integrating spatio-temporal graph learning with sensor-specific temporal self-attention within a U-Net-inspired encoder–decoder architecture, enabling dynamic refinement of features for more accurate anomaly detection and severity assessment.
The development of a Segment-Level Self-Adaptive Scoring Decision Mechanism for early damage detection and severity assessment. By dynamically selecting the most appropriate scoring method, base anomaly scoring or dynamic Gaussian kernel scoring (Gauss_D_K), based on an adaptively determined threshold, this mechanism ensures accurate, context-sensitive assessments. This innovation allows for simple and intuitive use of the mean and standard deviation to reliably gauge damage progression, facilitating timely and informed maintenance decisions.
For each structure, U-GraphFormer is trained once on its healthy baseline data. Afterwards, any new segment (400 steady-state points) from that same structure can be processed in less than 2 min on a standard Graphics Processing Unit (GPU)—without further threshold tuning or retraining—enabling straightforward, continuous monitoring of the system.
The research evaluates U-GraphFormer on three SHM case studies—(i) the IASC-ASCE Phase I synthetic steel-frame benchmark, (ii) the offshore wind turbine jacket bolt loosening data set, and (iii) the Old ADA full-scale steel-truss bridge. In all cases, the model achieves a segment-wise damage detection accuracy of 100%, allowing reliable early warnings from short testing segments of a few minutes. In the synthetic benchmark, time-point scores and mean/standard-deviation statistics over segments reproduce the ground-truth severity ranking exactly.
Methodology
As depicted in Figure 1, the U-GraphFormer architecture builds upon the foundational EdgeConvFormer model 59 and is specifically designed for effective early detection and differentiation of structural damage severity in SHM. This methodology integrates advanced techniques to preprocess raw sensor data, encode complex spatiotemporal relationships, and accurately reconstruct the data for anomaly detection. The process includes data smoothing through a sliding window approach, sophisticated encoding using Time2Vec embeddings, spatiotemporal graph learning, and sensor-specific temporal self-attention. The decoder part of U-GraphFormer is enhanced by incorporating edge convolution and transformer layers connected via a U-Net structure, which allows for the seamless integration of high-resolution and low-resolution features, ensuring more precise reconstruction of the data. Anomalies are identified by comparing the reconstructed data to the original and applying thresholding to detect significant deviations. For early detection, the model continuously monitors the reconstruction errors, raising an alarm when these errors exceed predefined thresholds, allowing for timely intervention and maintenance to prevent further deterioration. For severity assessment, anomaly scores are analyzed over specific time segments, with the mean and standard deviation calculated to provide insights into the extent and progression of the damage. These statistical measures facilitate a more informed and proactive approach to maintenance and intervention, ensuring that the U-GraphFormer architecture not only detects anomalies early but also accurately assesses the severity of structural damage.

The architecture of U-GraphFormer. (1) Data preprocessing: Moving average is applied independently to each sensor’s data, smoothing out noise and highlighting trends. (2) Encoder: The encoder uses Time2Vec embeddings and a multilayer stack of GraphFormer Blocks to capture both temporal patterns and spatial relationships in the sensor data. (3) Decoder: Mirroring the encoder’s architecture, the decoder employs GraphFormer blocks connected via a U-Net structure to reconstruct the data from encoded representations. (4) Reconstruction: The decoder’s output is passed through a final linear layer to produce the reconstructed data, which is then compared to the original moving mean data. (5) Anomaly detection and severity assessment: Reconstruction errors and anomaly scores are calculated and significant deviations are flagged as anomalies using Tail-p thresholding. An alarm is raised when these errors exceed the threshold, and the mean and standard deviation of reconstruction errors over specific time segments are analyzed to assess the extent and progression of the damage.
Data preprocessing
In this study, data preprocessing plays a crucial role in enhancing the accuracy of anomaly detection and localization within SHM. The preprocessing pipeline consists of several key steps that work together to prepare the data for effective analysis.
To begin with, a moving average is applied to the time series data, which serves to smooth out noise and emphasize significant trends. This technique, calculated over a specified window size $w$, replaces each point with the mean of the $w$ most recent observations:
$$\tilde{x}_t = \frac{1}{w} \sum_{i=0}^{w-1} x_{t-i}$$
After smoothing, the data are standardized to a mean of zero and a standard deviation of one, ensuring equal feature contribution. The standardized data $\hat{x}_t$ are computed as
$$\hat{x}_t = \frac{\tilde{x}_t - \mu}{\sigma}$$
where $\mu$ and $\sigma$ denote the per-sensor mean and standard deviation of the smoothed data.
The final step in the preprocessing pipeline involves the implementation of an overlapping window approach. This technique divides the standardized data into overlapping sub-sequences of length $L$ taken every $s$ time steps (the stride), so that consecutive windows share $L - s$ points and every region of the signal is covered by multiple windows.
Through empirical testing and cross-validation, the optimal window size and stride were determined to maximize precision, recall, and F1 score, leading to effective damage detection and localization. This overlapping window approach not only enhances detection accuracy but also supports real-time monitoring, allowing for timely assessments of damage severity.
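To make the pipeline concrete, the following is a minimal NumPy sketch of the three preprocessing steps. The window length of 256 and stride of 90 follow the values reported later for the IASC-ASCE case study; the smoothing window of 5 is an illustrative assumption, as the paper does not state it here, and exact window counts may differ slightly depending on boundary handling.

```python
# Sketch of the preprocessing pipeline: moving average, standardization,
# and overlapping windowing (hypothetical implementation).
import numpy as np

def moving_average(x, w):
    """Smooth each sensor channel (columns of x) with a length-w moving mean."""
    kernel = np.ones(w) / w
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, x)

def standardize(x, mean=None, std=None):
    """Zero-mean, unit-variance scaling per sensor; statistics come from
    healthy training data and are reused at test time."""
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0) if std is None else std
    return (x - mean) / (std + 1e-8), mean, std

def sliding_windows(x, length=256, stride=90):
    """Split (T, n_sensors) data into overlapping (L, n_sensors) sub-sequences."""
    starts = range(0, x.shape[0] - length + 1, stride)
    return np.stack([x[s:s + length] for s in starts])

raw = np.random.randn(240_000, 16)      # e.g. 4 min at 1000 Hz, 16 sensors
smoothed = moving_average(raw, w=5)     # smoothing window is an assumption
scaled, mu, sigma = standardize(smoothed)
windows = sliding_windows(scaled)       # -> (num_windows, 256, 16)
```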
Encoder
The encoder of the U-GraphFormer architecture, depicted in Figure 1, consists of four layers, each integrating an EdgeConv 61 module with a parallel sensor-specific temporal self-attention (ParaAtten) module. The encoder processes the input sensor data to capture complex spatial-temporal relationships, enhancing data representation at each stage. Initiated by inputs from the Time2Vec 62 module, this layered configuration gradually refines the features to improve the detection of spatiotemporal patterns in multivariate time series data.
Time2Vec embedding
Positional embedding is crucial in Transformer models to address their inherent limitation in recognizing sequence order. While traditional Transformers use sine and cosine functions to embed positional information, these methods often struggle with time series data, which can show both regular and irregular patterns. To overcome this, we employ Time2Vec, 62 an advanced approach that differentiates between periodic and aperiodic patterns in time series data.
Time2Vec transforms each sensor’s moving mean data into a learned time representation. Following the original formulation, the $i$th component of the embedding of a time value $\tau$ is
$$\mathbf{t2v}(\tau)[i] = \begin{cases} \omega_i \tau + \varphi_i, & i = 0, \\ \mathcal{F}(\omega_i \tau + \varphi_i), & 1 \leq i \leq k. \end{cases}$$
Here, $\omega_i$ and $\varphi_i$ are learnable parameters and $\mathcal{F}$ is a periodic activation (typically the sine function); the linear term captures aperiodic trends while the periodic terms capture recurring patterns.
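For illustration, a minimal PyTorch sketch of a Time2Vec layer following Kazemi et al.’s formulation is shown below; the embedding dimension is an arbitrary choice, not the paper’s configuration.

```python
# Minimal Time2Vec layer: one linear (aperiodic) component plus
# sine-activated (periodic) components with learnable frequency and phase.
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(1))            # linear term
        self.b0 = nn.Parameter(torch.randn(1))
        self.w = nn.Parameter(torch.randn(out_dim - 1))   # periodic terms
        self.b = nn.Parameter(torch.randn(out_dim - 1))

    def forward(self, tau):                        # tau: (batch, seq_len, 1)
        linear = self.w0 * tau + self.b0           # captures trends
        periodic = torch.sin(self.w * tau + self.b)  # captures periodicity
        return torch.cat([linear, periodic], dim=-1)  # (batch, seq_len, out_dim)

emb = Time2Vec(out_dim=64)(torch.linspace(0, 1, 256).view(1, 256, 1))
```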
GraphFormer block
Each layer of the encoder uses a GraphFormer Block to dynamically update embeddings based on spatial-temporal relationships. The GraphFormer Block consists of two main components:
Edge Convolution: The EdgeConv module applies a graph convolution over the input data to capture spatial relationships among sensors. For the input features $x_i$ of sensor node $i$, edge features are formed with its $k$ nearest neighbours $j \in \mathcal{N}(i)$ and aggregated as
$$x_i' = \max_{j \in \mathcal{N}(i)} h_\Theta\big(x_i \,\Vert\, (x_j - x_i)\big)$$
where $h_\Theta$ is a learnable multilayer perceptron, $\Vert$ denotes feature concatenation, and the maximum is taken channel-wise over the neighbourhood.
Parallel sensor-specific temporal self-attention (ParaAtten): The attention mechanism employs ParaAtten to focus on temporal features unique to each sensor. The output from the EdgeConv module is reshaped so that each sensor’s temporal sequence is processed independently.
The attention output for each sensor in the batch follows the standard scaled dot-product form,
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
with queries $Q$, keys $K$, and values $V$ derived from that sensor’s temporal features and $d_k$ the key dimension.
This attention output is further processed with a feedforward network and added residual connections, enhancing the temporal feature representation. The resulting tensor from each layer is permuted back to the original dimensions before being passed to the next layer.
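The following PyTorch sketch illustrates the flow of one such block under stated assumptions: the sensor graph is built from $k$ nearest neighbours in feature space, and all layer sizes are illustrative. It is a conceptual rendering of the EdgeConv-plus-ParaAtten pattern described above, not the authors’ released code.

```python
# Conceptual GraphFormer block: EdgeConv over the sensor graph followed by
# per-sensor temporal self-attention (ParaAtten), both with residuals.
import torch
import torch.nn as nn

class GraphFormerBlock(nn.Module):
    def __init__(self, dim, k=4, heads=4):
        super().__init__()
        self.k = k
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):                 # x: (batch, sensors, time, dim)
        b, s, t, d = x.shape
        # --- EdgeConv: k nearest neighbours in feature space (self included) ---
        feat = x.mean(dim=2)              # (b, s, d) summary per sensor
        dist = torch.cdist(feat, feat)    # pairwise sensor distances
        idx = dist.topk(self.k, largest=False).indices        # (b, s, k)
        nbrs = torch.gather(
            x.unsqueeze(2).expand(b, s, self.k, t, d), 1,
            idx.view(b, s, self.k, 1, 1).expand(b, s, self.k, t, d))
        edges = torch.cat([x.unsqueeze(2).expand_as(nbrs), nbrs - x.unsqueeze(2)], -1)
        x = self.edge_mlp(edges).max(dim=2).values             # max over neighbours
        # --- ParaAtten: temporal attention run independently per sensor ---
        x = x.reshape(b * s, t, d)
        x = x + self.attn(x, x, x, need_weights=False)[0]      # residual attention
        x = x + self.ffn(x)                                    # residual feed-forward
        return x.reshape(b, s, t, d)

out = GraphFormerBlock(64)(torch.randn(2, 16, 128, 64))  # (2, 16, 128, 64)
```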
Decoder
Multi-level GraphFormer decoder blocks
As shown in Figure 2, the decoder in U-GraphFormer is central to the model’s performance, building upon the encoder’s architecture while introducing critical enhancements for improved reconstruction and anomaly detection. This core component features a multi-level GraphFormer structure, incorporating U-Net style skip connections that integrate features across different resolutions and scales, significantly enhancing the model’s ability to reconstruct sensor data with high fidelity.

The architecture of U-GraphFormer featuring a multi-level encoder–decoder structure with skip connections. The encoder consists of four layers, each employing EdgeConv blocks followed by a transformer layer and reshaping step. The decoder mirrors the encoder’s structure, integrating high-resolution and low-resolution features through U-Net-style skip connections.
The decoder mirrors the structure of the encoder but incorporates added complexity to effectively capture and refine multi-scale features essential for accurate anomaly detection. It processes the encoded data by progressively refining it through multiple layers that blend high- and low-resolution features. Skip connections link these layers, enabling the model to retain and utilize information from earlier encoding stages, ensuring that crucial details are preserved during the reconstruction process.
The decoder starts with the final encoder layer’s output, $E_4$, which the first decoder GraphFormer Block processes into $D_1$.
For each subsequent decoder layer $l$ (with $l = 2, 3, 4$), the input is the concatenation of the previous decoder output and the skip-connected encoder feature at the matching scale:
$$D_l = \mathrm{GraphFormerBlock}\big(\left[\, D_{l-1} \,\Vert\, E_{5-l} \,\right]\big)$$
This concatenation ensures multi-scale feature integration, improving reconstruction quality. The significance of these multi-level features is profound. By merging high-resolution (local) and low-resolution (global) features, the decoder can accurately reconstruct the original sensor data, capturing both subtle anomalies and broader patterns in the structural health data. This capability is crucial for enabling the model to accurately detect anomalies and precisely assess their severity.
Each layer in the decoder includes a GraphFormer Block, which is crucial for processing and refining the encoded features. These blocks consist of two main components: EdgeConv and ParaAtten, which work together to model the relationships between different sensors and time steps effectively. Skip connections in a U-Net style architecture allow the decoder to incorporate information from earlier stages of encoding, ensuring that detailed features are preserved and utilized in the reconstruction.
The skip connections serve a dual purpose; they prevent the loss of spatial and temporal information as the data are passed through multiple layers, and they facilitate the integration of multi-level features, leading to a more accurate and detailed reconstruction of the original sensor data.
The multi-level refinement process in the decoder is particularly significant for anomaly detection and severity assessment. By effectively integrating features from different scales, the decoder ensures that the reconstructed data retains the essential characteristics of the original signals, allowing for precise detection of both subtle and significant anomalies. This multi-level approach also enhances the model’s ability to differentiate between various types of structural damage, leading to more accurate and reliable severity assessments.
The output of each decoder layer $D_l$ is then passed to the next layer, and the final decoder output is forwarded to the reconstruction stage described below.
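A compact sketch of this skip-connected decoding pass is given below, reusing the GraphFormerBlock sketched earlier; the linear fusion of concatenated features is an assumption about how channel dimensions are restored after concatenation.

```python
# U-Net-style decoder pass: each layer consumes the previous decoder output
# concatenated with the matching encoder feature map (skip connection).
import torch
import torch.nn as nn

class UGraphFormerDecoder(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.fuse = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in range(n_layers - 1))
        self.blocks = nn.ModuleList(GraphFormerBlock(dim) for _ in range(n_layers))

    def forward(self, enc_feats):           # enc_feats: [E1, E2, E3, E4]
        d = self.blocks[0](enc_feats[-1])   # start from the deepest encoding
        for i, skip in enumerate(reversed(enc_feats[:-1])):
            fused = self.fuse[i](torch.cat([d, skip], dim=-1))  # skip connection
            d = self.blocks[i + 1](fused)   # progressive multi-scale refinement
        return d
```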
Reconstruction
The output from the decoder, denoted as $D_4$, is passed through a final linear projection to produce the reconstructed data, $\hat{X} = W D_4 + b$, where $W$ and $b$ are the learnable weight matrix and bias of the output layer.
During the unsupervised training phase, the model is trained exclusively on data from normal operational conditions, aiming to learn and recognize the standard patterns in the data. The reconstructed data $\hat{X}$ are then compared with the original moving-mean data $X$ to measure how faithfully normal behavior is reproduced.
The reconstruction error, measured using the mean squared error (MSE) loss function, is calculated as follows:
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{t=1}^{N} \left\lVert x_t - \hat{x}_t \right\rVert_2^2$$
where $x_t$ and $\hat{x}_t$ denote the original and reconstructed sensor vectors at time step $t$, and $N$ is the number of time steps in the window.
Anomaly detection
Anomaly scoring
This stage of the architecture focuses on identifying anomalies by calculating the reconstruction error at each time step and transforming it into an anomaly score that can be compared against a threshold.
In SHM, accurate and timely detection of anomalies is crucial for maintaining the integrity and safety of structures. Traditional anomaly detection methods often rely on static scoring mechanisms, which may not effectively capture minor anomalies or adapt to evolving data trends. To address these limitations, we propose a segment-level adaptive scoring approach that selectively uses either base anomaly scoring or Gauss_D_K depending on the characteristics of each segment. This adaptive approach aims to enhance the sensitivity and robustness of anomaly detection across various damage scenarios.
Base anomaly scoring: Once the reconstruction errors $e_t^{(s)}$ for each sensor $s$ at time step $t$ are obtained, they are evaluated against the error statistics learned from healthy data.
We assume that the reconstruction errors for each sensor follow a specific statistical distribution (modeled here as Gaussian) characterized by parameters $\mu_s$ and $\sigma_s$, estimated from healthy validation data.
The probability of observing a reconstruction error $e_t^{(s)}$ is then given by $p(e_t^{(s)} \mid \mu_s, \sigma_s)$.
We obtain a vector of log probabilities for each time step by computing the log probabilities for all sensors. The aggregated anomaly score is
$$a_t = -\sum_{s=1}^{S} \log p\big(e_t^{(s)} \mid \mu_s, \sigma_s\big)$$
so that larger values of $a_t$ correspond to less probable, and hence more anomalous, observations.
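Assuming the per-sensor error distribution is Gaussian, the base scoring reduces to a summed negative log-likelihood, as in the following NumPy/SciPy sketch.

```python
# Base anomaly scoring sketch: fit a per-sensor Gaussian to reconstruction
# errors on healthy validation data, then score each test time step by the
# summed negative log-likelihood across sensors (Gaussian is an assumption).
import numpy as np
from scipy.stats import norm

def fit_error_model(val_errors):
    """val_errors: (T_val, n_sensors) reconstruction errors on healthy data."""
    return val_errors.mean(axis=0), val_errors.std(axis=0) + 1e-8

def base_anomaly_score(test_errors, mu, sigma):
    """Aggregate -log p(e_t | mu_s, sigma_s) over sensors per time step."""
    logp = norm.logpdf(test_errors, loc=mu, scale=sigma)  # (T, n_sensors)
    return -logp.sum(axis=1)                              # (T,) anomaly scores
```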
Gauss_D_K 63 : While the base anomaly scoring method is intuitive and effective, certain scenarios in SHM require a more sophisticated approach to handle temporal misalignments in sensor readings. Gauss_D_K was introduced to address these challenges. By applying a Gaussian kernel, this method smooths the anomaly scores, reducing noise and aligning temporal differences across sensors.
Anomaly scores are first computed from dynamically updated local statistics: over a sliding window of recent errors, a local mean $\mu_t^{(s)}$ and variance $\big(\sigma_t^{(s)}\big)^2$ are maintained for each sensor, and the dynamic score is
$$z_t^{(s)} = \frac{e_t^{(s)} - \mu_t^{(s)}}{\sigma_t^{(s)}}$$
where the sliding window allows the statistics to track ongoing trends in the data. These scores are then smoothed with a Gaussian kernel $g_\sigma$,
$$\tilde{z}^{(s)} = z^{(s)} * g_\sigma$$
where $*$ denotes convolution, and the overall health score at each time step is obtained by summing the smoothed scores across all sensors.
The scoring process involves dynamically updating the local mean and variance of sensor errors over a sliding window, which helps in capturing ongoing trends and subtle changes in the data. The resulting anomaly scores are then convolved with a Gaussian kernel, which enhances the temporal alignment of anomalies detected across different sensors. This additional scoring mechanism is particularly beneficial in complex scenarios where structural changes might not be immediately apparent or are spread over time. By summing the smoothed anomaly scores, Gauss_D_K provides a more unified and consistent analysis, ensuring more reliable detection of anomalies and offering a clearer picture of the system’s overall health.
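A possible implementation of this dynamic scoring and kernel smoothing is sketched below; the sliding-window length and kernel width are illustrative assumptions, not the paper’s calibrated values.

```python
# Gauss_D_K sketch: normalise errors by sliding-window mean/variance
# (dynamic), smooth per-sensor scores with a Gaussian kernel, then sum
# across sensors into one health signal.
import numpy as np
from scipy.ndimage import gaussian_filter1d, uniform_filter1d

def gauss_dk_score(errors, win=200, kernel_sigma=10.0):
    """errors: (T, n_sensors) reconstruction errors."""
    local_mu = uniform_filter1d(errors, size=win, axis=0)           # running mean
    local_var = uniform_filter1d(errors**2, size=win, axis=0) - local_mu**2
    z = (errors - local_mu) / np.sqrt(np.maximum(local_var, 1e-8))  # dynamic scores
    smoothed = gaussian_filter1d(np.abs(z), sigma=kernel_sigma, axis=0)
    return smoothed.sum(axis=1)             # unified health signal per time step
```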
Segment-level self-adaptive scoring decision mechanism: As shown in Figure 3, the process of determining the appropriate scoring method for each segment begins with calculating the Segment-Level Anomaly Scores, which include the mean and standard deviation of the base anomaly scores computed over all time steps in the segment.

Flowchart of the segment-level decision mechanism.
Furthermore, a threshold $\tau$ is derived from the healthy baseline as
$$\tau = \mu_h + 2\sigma_h$$
where $\mu_h$ and $\sigma_h$ are the mean and standard deviation of the base anomaly scores on healthy data (for the IASC–ASCE benchmark reported later, $13.87 + 2 \times 5.23 = 24.33$). Once the threshold $\tau$ is fixed, each segment’s mean anomaly score is compared against it: segments whose mean exceeds $\tau$ retain the base scoring, while the remaining segments are re-scored with Gauss_D_K to enhance sensitivity to subtle damage.
Empirical validation: Our empirical observations confirm that the proposed segment-level decision method effectively detects both serious and minor damages. For serious damages, segment-level scores tend to exceed the threshold $\tau$ by a wide margin, so base scoring suffices; for minor damages, scores fall near or below $\tau$, and the Gauss_D_K path recovers the subtle deviations that base scoring alone would miss.
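Putting the pieces together, the decision mechanism can be sketched as follows, reusing the scoring functions above; the threshold form $\mu_h + 2\sigma_h$ matches the healthy-baseline statistics reported in the case study ($13.87 + 2 \times 5.23 = 24.33$).

```python
# Segment-level self-adaptive scoring sketch: pick the scoring method whose
# sensitivity matches the segment's apparent severity.
import numpy as np

def adaptive_segment_score(errors, mu, sigma, mu_h, sigma_h):
    """errors: (T, n_sensors); (mu, sigma): per-sensor error model;
    (mu_h, sigma_h): mean/std of base scores on healthy data."""
    tau = mu_h + 2.0 * sigma_h                       # adaptive threshold
    base = base_anomaly_score(errors, mu, sigma)     # defined earlier
    if base.mean() > tau:                            # serious damage: base score
        return base, "base"
    return gauss_dk_score(errors), "Gauss_D_K"       # minor damage: smoothed score
```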
Thresholding
The computed anomaly score is subjected to thresholding to identify significant deviations. Errors that exceed a predefined threshold are flagged as anomalies, indicating potential structural damage. This approach allows for early detection and differentiation of the severity of the damage, facilitating timely intervention and maintenance.
Tail-p thresholding 63 : Tail-p thresholding identifies significant deviations across multiple sensors by summing the negative log probabilities across all sensors at each time step and flagging time steps whose aggregated score falls in the extreme upper tail, of probability $p$, of the score distribution estimated from healthy data.
Our Tail-$p$ setting is selected once on healthy validation data and then held fixed during monitoring, so no manual threshold calibration is required at test time.
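A minimal sketch of this thresholding step, assuming the threshold is taken as the empirical $(1-p)$ quantile of healthy validation scores, is:

```python
# Tail-p thresholding sketch: the detection threshold is the (1 - p)
# quantile of aggregated anomaly scores on healthy validation data
# (p = 0.01 is an illustrative value).
import numpy as np

def tail_p_threshold(healthy_scores, p=0.01):
    return np.quantile(healthy_scores, 1.0 - p)

def flag_anomalies(scores, threshold):
    return scores > threshold            # boolean mask of time-wise anomalies
```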
Early detection of damage
Early detection of potential structural damage is critical for timely intervention and maintenance, helping to prevent further deterioration. Building on the segment-level decision mechanism and the thresholding method described previously, the following process is used to achieve early detection: Once the appropriate anomaly scores $a_t$ have been selected for a segment, each score is compared against the Tail-$p$ threshold to label individual time points as normal or anomalous.
The process starts by continuously monitoring the anomaly scores $a_t$; within each segment, the ratio of positive (above-threshold) time points is tracked, and an alarm is raised as soon as this ratio exceeds a preset value, with the first positive time step recorded as the estimated onset of damage.
The alarm mechanism, triggered by the ratio of positive anomalies, acts as an early warning system, enabling timely inspection and intervention. This early detection approach is designed to facilitate timely maintenance actions to address potential structural issues before they worsen. It provides a comprehensive and unified view of the system’s health by aggregating data from multiple sensors, and ensures robustness by reducing false positives through the ratio-based detection method. The immediate alert triggered when the ratio of positive anomalies exceeds the threshold enables rapid identification of potential damage, allowing maintenance teams to take swift action. This early detection not only helps prevent further deterioration but also plays a crucial role in ensuring the safety and longevity of the structure. When a segment triggers an alarm, identifying the first positive time step as the onset of damage provides valuable information for understanding the origin of structural issues and planning effective remedial actions.
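The alarm logic can be summarized in a few lines; the 10% alarm ratio below is an illustrative placeholder, not the paper’s calibrated value.

```python
# Segment-wise alarm sketch: raise an alarm when the fraction of flagged
# time points exceeds a ratio threshold, and report the first flagged
# time step as the estimated damage onset.
import numpy as np

def segment_alarm(scores, threshold, ratio_threshold=0.10):
    flags = scores > threshold
    ratio = flags.mean()                     # fraction of positive points
    if ratio > ratio_threshold:
        onset = int(np.argmax(flags))        # first positive time step
        return True, onset
    return False, None
```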
Severity assessment of damage
Following the early detection of potential anomalies, assessing the severity of the detected damage is crucial for prioritizing maintenance efforts and determining the appropriate remedial actions. Once an alarm is raised for a segment, a detailed severity assessment is conducted to gauge the extent of the damage.
In this work, the severity assessment process involves analyzing anomaly scores over specific time segments, where the mean and standard deviation of these scores are computed to gain insights into the extent and progression of the damage. The effectiveness of this approach lies not merely in the computation of these statistical measures, but in the inherent quality of the anomaly scores generated by the model.
The mean reconstruction error, aggregated into the segment-level anomaly score $\mu_{\mathrm{seg}} = \frac{1}{T}\sum_{t=1}^{T} a_t$ over a segment of $T$ time steps, and the corresponding standard deviation $\sigma_{\mathrm{seg}}$ together summarize the level and variability of the detected deviations.
The mean anomaly score serves as a reliable indicator of the overall level of structural anomalies detected, with higher mean values suggesting more significant deviations and, consequently, more severe damage. The standard deviation provides insight into the variability of these anomalies; higher values imply sporadic or inconsistent anomalies, potentially indicating intermittent issues or fluctuating severity levels.
What sets this work apart from previous approaches, including EdgeConvFormer, is the informativeness of the anomaly scores produced by U-GraphFormer. The model’s advanced architecture, particularly its U-Net-inspired encoder–decoder structure and refined graph learning capabilities, generates anomaly scores that are inherently more reflective of true structural conditions. This means that even the simple computation of mean and standard deviation from these scores yields highly reliable severity assessments. The anomaly scores produced by U-GraphFormer are more stable and consistent across varying levels of damage, making them a robust foundation for severity assessment.
Unlike other models, which may produce noisy or less consistent scores requiring complex post-processing to achieve accuracy, U-GraphFormer’s scores are directly meaningful. The simplicity of using mean and standard deviation for severity assessment is a testament to the model’s ability to capture and represent structural anomalies with high fidelity. This approach ensures a consistent and accurate ranking of damage severity, aligned closely with the ground truth, offering a clear advantage over previous methods.
In conclusion, the severity assessment methodology in U-GraphFormer leverages the intrinsic quality of the anomaly scores, ensuring that the computed mean and standard deviation not only provide a clear picture of the current state but also enable precise and reliable decision-making for maintenance and intervention strategies. This approach represents a significant advancement over the EdgeConvFormer model, demonstrating the enhanced capability of U-GraphFormer to deliver actionable insights into structural health.
Evaluation metrics
Metrics for early detection: The early detection capability of the model is evaluated using standard time-wise anomaly detection metrics: Precision, Recall, and the F1-score, defined as $P = \frac{TP}{TP+FP}$, $R = \frac{TP}{TP+FN}$, and $F1 = \frac{2PR}{P+R}$, where $TP$, $FP$, and $FN$ are the counts of true positive, false positive, and false negative time points.
Metric for severity assessment: To assess the model’s effectiveness in evaluating damage severity, we introduce the Ranking Accuracy (Spearman’s rank correlation) metric. 67
This metric evaluates how well the anomaly scores correlate with the true severity ranking of the different damage scenarios. The Ranking Accuracy is calculated using Spearman’s rank correlation coefficient
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the predicted and true severity ranks of scenario $i$, and $n$ is the number of scenarios.
In this context, $\rho = 1$ indicates that the ranking induced by the anomaly scores matches the ground-truth severity ordering exactly, while lower values indicate disagreement.
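In practice this metric can be computed directly with SciPy, as in the sketch below; apart from the 380.81 and 15.62 values echoed from the results reported later, the scores are illustrative placeholders.

```python
# Ranking Accuracy via Spearman's rank correlation. Scores follow the
# severity order reported for the benchmark (Damage2 > Damage1 > Damage5 >
# Damage4 > Damage3 > Damage6).
from scipy.stats import spearmanr

mean_scores = [380.81, 520.0, 20.1, 24.7, 33.2, 15.62]  # Damage1..Damage6
true_rank = [2, 1, 5, 4, 3, 6]                          # 1 = most severe
# Higher score should mean more severe (lower rank number), so negate ranks:
rho, _ = spearmanr(mean_scores, [-r for r in true_rank])
print(f"Ranking Accuracy (Spearman rho): {rho:.2f}")    # -> 1.00
```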
Case study
Phase I IASC–ASCE benchmark
As shown in Figure 4, the Phase I IASC–ASCE SHM Benchmark Problem, organized by IASC and ASCE, evaluates SHM algorithms using simulated data from a 4-story, 2-bay by 2-bay 3D steel-frame model structure. This benchmark includes datasets from healthy conditions and six distinct damage patterns, progressing from easily detectable extreme damage to more challenging cases. The severity ranking, based on visual representation, places Damage2 as the most severe, followed by Damage1, Damage5, Damage4, Damage3, and Damage6 as the least severe.

The six damage patterns (a-g) and their severity ranking. 68
The ASCE Benchmark Problem outlines five cases designed to evaluate SHM techniques, each varying in complexity with respect to load distribution and structural symmetry. These cases include both symmetric and asymmetric structures, with degrees of freedom (DOF) ranging from 12 to 120, and different load application points, such as all stories or roof-only. Among these, we have chosen to focus on Case 5 due to its complexity and relevance to real-world applications. This case features a 120-DOF asymmetric structure subjected to roof-level loading, making it particularly effective for testing the robustness of advanced anomaly detection and damage assessment methods. The asymmetry in Case 5 is introduced by replacing one 400-kg floor slab on the roof with a 550-kg slab, resulting in a roof configuration of three 400-kg slabs and one 550-kg slab. Additionally, instead of distributed floor excitations, a shaker is positioned at the top of the center column on the roof to simulate specific dynamic events. Gaussian noise is also added to the acceleration data to represent sensor measurement noise, further simulating real-world conditions. These settings allow for a precise simulation of dynamic events, providing a rigorous challenge for SHM techniques to accurately capture the structure’s response. More details on this benchmark can be found in Johnson et al. 68
Data generation
In the training phase of our model, we efficiently processed 45 min of high-frequency data sampled at 1000 Hz using a sliding window with a window size of 256 and a stride of 90, resulting in 29,998 time steps. Both the training and validation sets were derived from this single continuous 45-min segment of healthy condition data, split in an 8:2 ratio. During training, the model was exclusively exposed to this segment to learn patterns characterizing a healthy structural state.
For the test dataset, which was also sampled at 1000 Hz using a sliding window with a window size of 256 and a stride of 90, data were collected over durations of 1, 2, 4, 6, and 8 min for both the healthy state and the six damage patterns described earlier. This approach aimed to determine the optimal duration for early damage detection and severity assessment. Analyzing various time segments helped us understand how different observation lengths impact the model’s capability to accurately and promptly detect anomalies and assess structural damage severity. By comparing the model’s performance across these segments, we identified the most effective observation window for early damage detection, ensuring timely interventions and thorough damage severity assessments. This comprehensive strategy ensures that the SHM algorithms are robust and effective across a range of damage scenarios and observation periods.
As shown in Table 1, during the training phase, we convert the healthy dataset into a set of sub-sequences using overlapping windows of length 256 with a stride of 90, matching the preprocessing configuration described earlier.
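As a quick sanity check on the reported count, the standard window-count formula $\lfloor (N - L)/s \rfloor + 1$ reproduces the 29,998 sub-sequences:

```python
# 45 min at 1000 Hz, windows of length 256 with stride 90:
samples = 45 * 60 * 1000                  # 2,700,000 data points
n_windows = (samples - 256) // 90 + 1     # floor((N - L)/s) + 1
print(n_windows)                          # -> 29998
```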
U-GraphFormer configuration parameters for the ASCE dataset.
U-GraphFormer’s performance on the IASC–ASCE Phase I benchmark for Damage1 under varying Tail-$p$ values.
During the training phase, our model processed 45 min of high-frequency data sampled at 1000 Hz, employing moving average smoothing to improve reconstruction accuracy. A sliding window with a size of 256 and a stride of 90 was then applied, resulting in 29,998 time steps. Thanks to an early stopping criterion, the training was completed in 48 min and 56 s, with each iteration averaging 85.36 s. This efficiency is impressive given the high sampling rate and the complexity of handling numerous time steps.
In the testing phase, we analyzed a 4-min segment of data, also sampled at 1000 Hz, using identical sliding window parameters. This resulted in 2666 time steps. The model evaluated the test data in under 2 min, demonstrating its ability to deliver timely insights. This performance highlights the model’s ability to efficiently handle large volumes of high-frequency data, allowing for quick detection and response to structural changes.
Early detection and severity assessment of damages in testing
Early detection: To evaluate the model’s early detection performance, we created six testing segments, each consisting of 1-min healthy data combined with 1-min data from a different severity damage scenario (Damage1–Damage6). With a sample frequency of 1000 Hz, each 1-min portion contributes 60,000 data points, yielding 120,000 data points per segment. Preprocessing was performed using a sliding window with a window size of 256 and a stride of 90, yielding 1328 time points per segment. As the reconstruction is window-based, utilizing overlapping windows of length 256, the scores of overlapping positions are aggregated to obtain point-wise anomaly scores.
Our objective is to assess the model’s capability to accurately and promptly detect the onset of damage within these segments. Anomaly scores were initially computed using base scoring to evaluate the preliminary severity. Based on this evaluation, a decision was made whether to further apply Gauss_D_K according to the selection method described in part Segment-Level Self-Adaptive Scoring Decision Mechanism of section “Anomaly detection.” The detection threshold was established based on the training data, and anomalies were identified when the anomaly score exceeded this threshold. This approach allows for a comprehensive evaluation of the model’s effectiveness in early damage detection across varying severity levels.
Severity assessment: To assess the severity of damage, we collected data over durations of 1, 2, 4, 6, and 8 min for both the healthy state and the six different severity damage scenarios (Damage1–Damage6). Each duration-specific dataset was preprocessed using a sliding window with a window size of 256 and a stride of 90. This preprocessing step yielded various time points depending on the duration of the dataset. We tested these seven segments separately to evaluate the model’s performance in assessing the severity of structural damage. Anomaly scores were computed for each segment using base scoring to determine the preliminary severity. The mean and standard deviation of these anomaly scores were then calculated for each damage scenario and duration.
As mentioned before, the base anomaly score is more effective in assessing the severity of damage compared to the Gauss_D_K anomaly score. The base anomaly score captures the overall deviation from normal behavior more directly, making it more sensitive to significant structural changes. Higher mean anomaly scores indicate more substantial structural deviations, while higher standard deviations suggest greater variability and inconsistency in the detected anomalies. This analysis provides critical insights into the extent and variability of the damage, enabling a nuanced understanding of the structural health. The rationale behind the effectiveness of the base anomaly score lies in its ability to reflect the immediate and cumulative impact of damage on the structure, whereas the Gauss_D_K score may sometimes smooth out critical deviations due to its kernel-based approach. We will validate this observation in our experiments by comparing the performance of both scoring methods across different damage scenarios and durations.
By comparing the mean and standard deviation of the anomaly scores across different durations, we can evaluate the effectiveness of the model in detecting and assessing the severity of damage over time. This comprehensive assessment helps in planning appropriate maintenance and intervention strategies, ensuring timely and accurate responses to various levels of structural damage.
Experimental results and discussions
We conducted a comprehensive series of experiments on the test dataset to evaluate the model’s performance in early detection of damage and severity assessment.
Early detection of damage: Early detection of structural damage is crucial for ensuring timely interventions and minimizing the risk of severe damage. In this context, our model employs a segment-wise alarm mechanism based on the ratio of time-wise point anomaly scoring, providing an effective strategy for early detection across different damage severities.
The model continuously monitors the anomaly score at each time point. When these scores exceed predefined thresholds, they contribute to the time-wise point anomalies. Within each segment, the model calculates the ratio of these anomalies—instances where the anomaly score exceeds the threshold—to the total number of time points in the segment; a segment-wise alarm is raised when this ratio surpasses the alarm criterion.
Table 3 presents a comparative analysis of Base Anomaly Scoring and Gauss_D_K Anomaly Scoring across different test segments, combining healthy data with each damage pattern (Damage1–Damage6). The baseline for the healthy segment has a mean of 13.87 and a standard deviation of 5.23, setting the threshold at $13.87 + 2 \times 5.23 = 24.33$.
Comparison of base anomaly scoring and Gauss_D_K anomaly scoring under different test segments on metrics precision, recall, and F1 score for time-wise anomaly detection. Test segments include combinations of healthy state with each damage pattern (Damage1–Damage6). The baseline for the healthy segment has a mean of 13.87 and a standard deviation of 5.23, determining the threshold of 24.33.
The effectiveness of this strategy is reflected in the time-wise precision, recall, and F1-score metrics, which determine the accuracy of early detection. Precision measures the proportion of true positive anomalies among all detected anomalies, ensuring that when an alarm is raised, it is likely due to genuine damage rather than false positives. High precision is crucial for minimizing unnecessary inspections or repairs. Recall indicates the model’s ability to detect all actual damage events. High recall is essential for early detection because missing any signs of damage can allow it to progress unnoticed, leading to potentially catastrophic failures. The F1-score, which balances precision and recall, provides a comprehensive measure of the model’s overall performance, making it a critical metric for evaluating the effectiveness of early detection.
For example, in the case of severe damage (Healthy + Damage1), the model achieves a precision of 0.99, recall of 0.98, and F1-score of 0.98 using Base Anomaly Scoring, as presented in Figure 5. The mean anomaly score for this segment is 747.74 with a standard deviation of 1837.00, and the overall mean anomaly score is 380.81, significantly above the threshold of 24.33. This demonstrates the model’s ability to promptly flag anomalies in alignment with the ground truth, capturing substantial deviations in structural behavior within a short timeframe. Moreover, the threshold-independent metrics corroborate this result: the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in Figure 6 yield an Area Under the Curve (AUC) of 0.991 and an average-precision (AP) of 0.994, respectively, demonstrating that the anomaly score separates healthy and Damage1 windows almost perfectly across all decision thresholds.

Anomaly detection for the test segment combining healthy data with Damage1 using base anomaly scoring.

ROC curve for the test segment combining healthy data with Damage1 using base anomaly scoring.
Conversely, in the case of the most minor damage scenario (Healthy + Damage6) as shown in Figure 7, where Damage6 reduces the stiffness of a single brace on the first floor to two-thirds of its original value, the model applies the Gauss_D_K scoring method. The mean anomaly score in this case is 17.14 with a standard deviation of 3.21, resulting in a precision of 0.80, recall of 0.71, and F1-score of 0.75. The overall mean anomaly score is 15.62, which falls below the base threshold, justifying the use of Gauss_D_K scoring to enhance detection sensitivity. Despite the subtlety of the damage, the segment-wise alarm accuracy for this scenario remains at 100%, demonstrating the model’s robustness in detecting even the most minor anomalies. Although the Damage6 case is intentionally subtle, the ROC analysis in Figure 8 still reports an AUC of 0.794 and an AP of 0.834, confirming that the score distribution retains sufficient contrast to support early detection—particularly once the Gauss_D_K anomaly-scoring is applied.

Anomaly detection for the test segment combining healthy data with Damage6 using Gauss_D_K anomaly scoring.

ROC curve for the test segment combining healthy data with Damage6 using Gauss_D_K anomaly scoring.
The model’s segment-wise alarm accuracy is consistently 100% across all tested scenarios, including both serious and minor damages. This high level of accuracy indicates that the model reliably triggers alarms whenever there is a genuine structural issue, regardless of the severity of the damage. By leveraging both Base Anomaly Scoring and Gauss_D_K scoring within this framework, the model can adaptively detect early signs of damage, ensuring that even minor issues are identified before they escalate into significant problems. This approach not only optimizes the early detection process but also enhances the overall reliability of the model in diverse SHM scenarios.
Severity assessment of damage: Table 4 evaluates damage detection and severity across different segment lengths for various damage states, including Healthy, Damage1, Damage2, Damage3, Damage4, Damage5, and Damage6. Segment lengths of 1, 2, 4, 6, and 8 min are analyzed. The mean and standard deviation of anomaly scores are used to assess the severity of damage, with higher values indicating more severe damage.
Evaluation of damage detection and severity across different segment lengths. The table presents the mean and standard deviation of anomaly scores for different damage states: Healthy, Damage1, Damage2, Damage3, Damage4, Damage5, and Damage6. Segment lengths evaluated include 1, 2, 4, 6, and 8 min. The Ranking Accuracy (Rank Correlation) metric indicates how well the mean and standard deviation of the anomaly scores align with the true severity ranking. The 4-min segment length shows the best performance with a perfect ranking correlation, highlighted in yellow. The 2- and 6-min segment lengths also demonstrate strong performance with high-ranking accuracy, highlighted in light yellow.
The detailed examination of the anomaly scores across different segment lengths provides several key insights into the model’s performance and its implications for SHM. The 4-min segment length stands out as the most effective, striking an optimal balance between data granularity and detection accuracy. This segment length captures sufficient data to reliably detect anomalies while maintaining responsiveness to structural changes. The highlighted yellow rows in the table illustrate how the 4-min segments’ mean and standard deviation of anomaly scores closely match the true severity rankings, demonstrating both high sensitivity and precision. In contrast, the 2- and 6-min segments also show a good alignment with the true severity rankings but exhibit more variability in their standard deviations. This suggests that while these segments can still accurately detect damage, their detection precision may fluctuate slightly, indicating a trade-off between stability and sensitivity. The 1-min segment length, with its higher variability in anomaly scores, particularly in the standard deviation, indicates less stable detection performance. This shorter segment length is more susceptible to noise and transient variations, which can obscure the true signal of structural anomalies and lead to inconsistent detection results. On the other hand, the 8-min segment length displays lower anomaly scores, reflecting a less sensitive detection capability. The averaging effect over a longer duration may dilute the impact of transient damage events, causing critical anomalies to be smoothed out and potentially missed. This highlights the potential drawback of longer segments in failing to capture rapid or transient changes in structural health.
Overall, the analysis underscores the importance of selecting an appropriate segment length for effective damage detection. The 4-min segment length provides a robust and reliable framework for SHM, balancing the need for sensitivity to anomalies and stability in detection. These insights are crucial for optimizing monitoring strategies and ensuring timely and accurate maintenance interventions, ultimately contributing to the safety and longevity of structures.
The boxplot as shown in Figure 9 provides a comprehensive visualization of the model’s performance in detecting and differentiating various levels of structural damage severity. In the main plot, the anomaly scores for the most severe damage scenarios (Damage1 and Damage2) are substantially higher than those for less severe damage scenarios. This clear distinction underscores the model’s capability to identify and prioritize significant structural issues accurately.

Boxplot of mean and standard deviation for all damage severities. This plot illustrates the distribution of time-wise anomaly scores across different damage severities, ranging from Healthy to Damage6. Due to the substantial difference in anomaly scores, a zoomed-in boxplot is included for the lesser severity groups (Healthy, Damage6, Damage3, Damage4, and Damage5) to highlight their anomaly scores, which are significantly smaller compared to the more severe damage scenarios (Damage1 and Damage2). The smaller boxplot ensures the visibility of anomaly scores for less severe damage phases, demonstrating the effective differentiation of damage severity by the model.
The zoomed-in inset plot highlights the lesser severity damage scenarios (Healthy, Damage6, Damage3, Damage4, and Damage5). This inset is crucial as it brings to light the subtle variations in anomaly scores that might be overshadowed by the higher scores of the more severe damages. Despite the smaller anomaly scores, the model effectively differentiates between these less severe scenarios, demonstrating its sensitivity to even minor structural anomalies.
The ranking of the anomaly scores across all scenarios is completely consistent with the ground truth, validating the model’s accuracy in assessing damage severity. The mean and standard deviation values for each damage scenario, presented in the plot, further confirm the model’s robust performance in providing a nuanced understanding of the structural health. This capability is essential for prioritizing maintenance interventions, thereby enhancing the safety and integrity of the structure. The visualization not only highlights the model’s effectiveness in early damage detection but also its precision in ranking the severity of damage, which is critical for timely and appropriate structural maintenance.
Model comparison
To assess the improvements introduced by U-GraphFormer, a comprehensive comparison was conducted with the previous EdgeConvFormer model. 59 The U-GraphFormer model incorporates a more sophisticated architecture to enhance its feature extraction and reconstruction capabilities. The results presented in Tables 5 and 6 demonstrate that U-GraphFormer significantly outperforms EdgeConvFormer across various validation metrics, including precision, recall, and F1 score for time-wise anomaly detection, as well as segment-wise detection accuracy. The improved architecture allows for more accurate identification of structural damages across different test segments, leading to better overall performance in SHM applications.
Comparison of EdgeConvFormer and U-GraphFormer based on validation metrics.
Comparison of EdgeConvFormer and U-GraphFormer under different test segments on metrics precision, recall, F1 score, and segment-wise detection results (Seg. detection). Test segments include combinations of healthy state with each damage pattern (Damage1–Damage6).
Table 5 provides a comparative analysis of the validation loss and validation reconstruction error for both models. The results clearly demonstrate the superior performance of U-GraphFormer, which achieves markedly lower values on both metrics than EdgeConvFormer.
Further insights can be drawn from the performance metrics on different test segments, as shown in Table 6. Each test segment consists of 1-min healthy data combined with 1-min data from various damage scenarios (Damage1–Damage6). The U-GraphFormer consistently outperforms the original model across all test segments in terms of precision, recall, and F1 score for time-wise anomaly detection. For example, in the “Healthy + Damage1” test segment, U-GraphFormer achieves a precision of 0.9910, recall of 0.9770, and F1 score of 0.9839, compared to EdgeConvFormer’s precision of 0.5853, recall of 0.9309, and F1 score of 0.7187. This significant improvement highlights the model’s ability to accurately and promptly detect anomalies.
Moreover, the U-GraphFormer shows remarkable performance even in more challenging scenarios such as “Healthy + Damage6,” where the damage is subtle and harder to detect. The enhanced model attains a precision of 0.7960, recall of 0.7057, and F1 score of 0.7481, significantly outperforming EdgeConvFormer which records lower precision (0.3210), recall (0.3085), and F1 score (0.3146).
U-GraphFormer extends the U-Net/Transformer paradigm by operating on a learned spatio-temporal graph: nodes represent sensors, edges encode both physical proximity and feature similarity, and temporal self-attention captures per-sensor dynamics. In the decoder, we mirror the encoder’s graph layers with skip-connections for multi-scale fusion rather than an MLP (EdgeConvFormer) or standard deconvolution (U-Net). This hybrid design not only preserves both global and local context but also scales with edge sparsity (avoiding the quadratic blow-up of pure Transformers). Compared to a vanilla transformer of similar depth, U-GraphFormer incurs 30% more per-layer FLOPs and 10% higher peak GPU memory (≈40 GB on 4 × TITAN V), yet still processes a 60 s segment (400 × 8 sensors) in 16 s of inference time—well within real-time SHM requirements.
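The architectural ingredients above can be condensed into a compact sketch. The PyTorch code below is a minimal, illustrative approximation and not the authors' implementation: the learned sparse spatio-temporal graph is reduced to a single learnable adjacency matrix, and the multi-scale U-Net hierarchy to two mirrored blocks with concatenation-based skip connections. Shapes follow the paper's 400-step, 8-sensor segments; all class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class GraphTemporalBlock(nn.Module):
    """One encoder/decoder block: graph mixing across sensors, then temporal
    self-attention per sensor (a simplified stand-in for the paper's learned
    spatio-temporal graph layers)."""
    def __init__(self, n_sensors: int, d_model: int, n_heads: int = 4):
        super().__init__()
        # Learnable dense adjacency as a stand-in for the learned sparse graph.
        self.adj = nn.Parameter(torch.eye(n_sensors) + 0.01 * torch.randn(n_sensors, n_sensors))
        self.lin = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (B, T, S, d)
        B, T, S, d = x.shape
        # Graph step: mix sensor features through a softmax-normalized adjacency.
        a = torch.softmax(self.adj, dim=-1)    # (S, S)
        x = torch.einsum("uv,btvd->btud", a, self.lin(x))
        # Temporal step: self-attention along time, one sensor at a time.
        h = x.permute(0, 2, 1, 3).reshape(B * S, T, d)
        h, _ = self.attn(h, h, h)
        h = h.reshape(B, S, T, d).permute(0, 2, 1, 3)
        return self.norm(x + h)

class UGraphFormerSketch(nn.Module):
    """Mirrored encoder-decoder with U-Net-style skip connections."""
    def __init__(self, n_sensors=8, d_model=32, depth=2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        self.enc = nn.ModuleList([GraphTemporalBlock(n_sensors, d_model) for _ in range(depth)])
        self.dec = nn.ModuleList([GraphTemporalBlock(n_sensors, d_model) for _ in range(depth)])
        self.fuse = nn.ModuleList([nn.Linear(2 * d_model, d_model) for _ in range(depth)])
        self.out = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (B, T, S) raw signals
        h = self.embed(x.unsqueeze(-1))        # (B, T, S, d)
        skips = []
        for blk in self.enc:
            h = blk(h)
            skips.append(h)
        # Decoder mirrors the encoder; skip connections fuse multi-scale context.
        for blk, fuse, skip in zip(self.dec, self.fuse, reversed(skips)):
            h = blk(fuse(torch.cat([h, skip], dim=-1)))
        return self.out(h).squeeze(-1)         # reconstruction, (B, T, S)

# Smoke test: reconstruct a 400-step, 8-sensor segment.
model = UGraphFormerSketch()
seg = torch.randn(2, 400, 8)
print(model(seg).shape)                        # torch.Size([2, 400, 8])
```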
These findings validate the efficacy of the architectural modifications in U-GraphFormer. The mirrored encoder–decoder structure with skip connections enhances feature extraction and anomaly detection, resulting in improved performance metrics across different test segments. This underscores the robustness and adaptability of the U-GraphFormer model, making it a more reliable tool for SHM and damage detection. The ablation experiments further demonstrate that the enhancements in U-GraphFormer not only reduce validation loss and reconstruction error but also significantly improve the accuracy and reliability of anomaly detection across various damage scenarios.
Bolt-loosening detection in jacket-type offshore wind turbine supports
In this case study, we evaluate our unsupervised U-GraphFormer model on vibration data collected from a scaled jacket-type offshore wind turbine support, focusing on early detection and severity assessment of bolt loosening (see Figure 10). The dataset, originally described by Valdez-Yepez et al., 45 comprises high-frequency voltage measurements from eight triaxial accelerometers (PCB® 356A17) mounted at key joints on the jacket structure. Vibrations were induced by white noise excitation to simulate operational loading, while bolt conditions spanned four states: fully tightened (12 Nm), slight loosening (9 Nm), moderate loosening (6 Nm), and fully removed.

Experimental setup for bolt-loosening detection on the scaled jacket-type support structure. (Left) Eight PCB® 356A17 triaxial accelerometers mounted at key leg-to-brace joints and labeled Sensor 1–Sensor 8. (Top right) Close-up of the specific bolts whose preload was systematically varied. (Bottom right) Annotated view of the four structural levels (Level 1 through Level 4) where bolt conditions (healthy, 9 Nm, 6 Nm, removed) were induced and tested. 45
Data split and testing
The vibration dataset utilized in this study was acquired from the publicly accessible Dataverse archive published by Valdez-Yepez et al., 45 available online via DOI: 10.34810/data1011. We specifically focused on the dataset corresponding to a mid-level white-noise excitation amplitude (folder A_1). Each data file in this subset comprises 24-channel accelerometer voltage responses collected at approximately 25,840 time points, equivalent to roughly 26 s of data recorded at a sampling rate of 1 kHz.
For the unsupervised training phase, we aggregated 19 CSV files corresponding to the healthy structural condition, resulting in a comprehensive training set containing a total of 491,055 time points across 24 channels. An additional healthy CSV file was reserved exclusively for validation purposes, ensuring that the model remained unbiased and robust in distinguishing normal structural behavior.
As shown in Table 7, the test dataset includes a total of 13 CSV files, carefully assembled to comprehensively assess the model’s anomaly detection capability and severity ranking accuracy. This set includes a single held-out healthy file along with 12 damaged-condition recordings. Specifically, for each of the three damage states (slightly loose, 9 Nm; moderately loose, 6 Nm; and fully removed bolts), one recording is randomly selected for each of the four structural levels (Levels 1–4). Collectively, this results in roughly 326,214 time points dedicated to testing and inference.
Test set composition for white-noise excitation at nominal intensity (amplitude 1.0). Each trial (one healthy and twelve damaged at different levels) is nominally described as a 60-s recording at 1 kHz. Each file contains 24 sensor channels.
Data preprocessing and inference procedure
The preprocessing pipeline begins by concatenating 19 CSV recordings corresponding to the healthy state into a unified training dataset, comprising approximately 491,055 time points across 24 sensor channels. Missing values in the concatenated dataset are imputed using the column-wise mean to maintain consistency and data integrity. Considering the spatial characteristics of structural vibrations, the original 24-dimensional dataset—corresponding to eight triaxial accelerometers—is condensed into eight magnitude-based scalar features. Each feature represents the vibration magnitude at a sensor location, computed as the Euclidean norm across the three orthogonal axes (x, y, z) recorded by that sensor.
This fusion step transforms the triaxial sensor data into scalar magnitudes, capturing vibration energy comprehensively and effectively reducing dimensionality while preserving spatial insights critical for anomaly detection.
Following fusion, the magnitude-based training data are segmented into overlapping subsequences using a sliding window approach, configured with a window length of 60 samples (equivalent to 60 ms at 1 kHz sampling rate) and a stride of 30 samples (50% overlap). Each subsequence is then summarized by averaging across the temporal dimension to generate consistent feature vectors for model training.
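A minimal NumPy sketch of this pipeline follows, assuming the 24 channels are ordered sensor-wise as consecutive (x, y, z) triplets; the function and array names are illustrative, not the authors' code.

```python
# Sketch of the preprocessing described above: column-mean imputation,
# triaxial-to-magnitude fusion (24 -> 8 channels), then an overlapping
# sliding window (length 60 = 60 ms at 1 kHz, stride 30 = 50% overlap)
# whose windows are summarized by their temporal mean.
import numpy as np

def preprocess(raw: np.ndarray) -> np.ndarray:
    """raw: (T, 24) voltages from 8 triaxial accelerometers -> (n_windows, 8)."""
    # 1) Impute missing values with the column-wise mean.
    col_mean = np.nanmean(raw, axis=0)
    raw = np.where(np.isnan(raw), col_mean, raw)

    # 2) Fuse each sensor's (x, y, z) triplet into one vibration magnitude,
    #    assuming channels are grouped per sensor in axis order.
    mags = np.linalg.norm(raw.reshape(-1, 8, 3), axis=2)   # (T, 8)

    # 3) Sliding window with 50% overlap, averaged over the time dimension.
    win, stride = 60, 30
    n_windows = (len(mags) - win) // stride + 1
    return np.stack([mags[i * stride:i * stride + win].mean(axis=0)
                     for i in range(n_windows)])

features = preprocess(np.random.randn(25_840, 24))   # one ~26-s file at 1 kHz
print(features.shape)                                # (860, 8)
```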
The inference phase employs an identical preprocessing strategy across all test datasets, which include recordings for healthy conditions and multiple bolt-loosening damage states (minor (9 Nm), moderate (6 Nm), and severe (NoBolt)). These test recordings undergo the same dimensional fusion, segmentation, and standardization procedures, with feature scaling based exclusively on training data statistics.
Subsequent to preprocessing, each standardized test sequence is fed into the U-GraphFormer model. Reconstruction errors generated by the model serve as anomaly scores for each sequence, enabling both the identification of anomalies and the quantification of structural integrity. Thresholds for anomaly detection are adaptively derived from the distribution of anomaly scores obtained from healthy training segments, ensuring robust, and generalizable anomaly detection performance.
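The thresholding step can be sketched as follows. The paper states only that the threshold is derived adaptively from the healthy-score distribution; the percentile-based rule below is an assumption chosen for illustration.

```python
# Hedged sketch of adaptive thresholding from healthy anomaly scores.
# The percentile choice (q = 99) is an assumption, not the paper's rule.
import numpy as np

def adaptive_threshold(healthy_scores: np.ndarray, q: float = 99.0) -> float:
    """Return the q-th percentile of reconstruction errors on healthy data."""
    return float(np.percentile(healthy_scores, q))

healthy_scores = np.abs(np.random.randn(5_000))   # placeholder healthy errors
tau = adaptive_threshold(healthy_scores)
print(f"anomaly threshold tau = {tau:.3f}")
```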
Training and inference efficiency
U-GraphFormer was trained once on the 19 healthy-state CSV trials provided by the wind-turbine dataset (491,055 samples per channel, i.e., ≈8 min at 1 kHz) in 1194 s on a workstation equipped with four NVIDIA TITAN V 12 GB GPUs (peak memory ≈40 GiB). For inference, we evaluated all 13 test trials—each containing 24,772–25,882 samples per channel (≈25–26 s of data at 1 kHz; see Table 7)—in a total of 207.1 s, averaging ≈16 s per file. This corresponds to ≈0.63 s of computation per 1 s of recorded data. Extrapolating to a full 60-s recording (60,000 samples per channel) yields an estimated inference time under 40 s. These runtimes and resource footprints confirm that U-GraphFormer is both efficient and practical for near-real-time SHM.
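For reference, the quoted throughput figures reconstruct directly from the reported totals:

```latex
\[
  \frac{207.1\ \text{s}}{13\ \text{files}} \approx 15.9\ \text{s per file},
  \qquad
  \frac{15.9\ \text{s}}{\approx 25.5\ \text{s of data}} \approx 0.63,
  \qquad
  0.63 \times 60\ \text{s} \approx 38\ \text{s} < 40\ \text{s}.
\]
```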
Results and discussions
Early detection of damage: In a real-world monitoring scenario, each nominal 60-s acquisition (one CSV file) is treated as a continuous segment whose condition must be assessed promptly. After applying the preprocessing steps described in section “Data preprocessing and inference procedure,” each segment yields 400 steady-state feature points. An anomaly score, given by the model’s reconstruction error, is then computed for each point.

Anomaly detection was performed on a single continuous test sequence obtained by concatenating 13 CSV files—one healthy baseline and four distinct bolt-loosening positions (Levels 1–4) for each of three fault modes (9 Nm “minor,” 6 Nm “moderate,” and NoBolt “severe”). After preprocessing, the first 100 transitional samples of each 500-point segment were removed, yielding 400 steady-state points per file and a 5200-point sequence in total. The reconstruction-error-based anomaly score plotted over this sequence, together with precision metrics, shows a clear escalation in detected deviation corresponding to the progression from healthy through each loosening position and fault mode.
To support reliable early detection, we apply tail-based thresholding, deriving the alarm threshold from the upper tail of the healthy anomaly-score distribution, and aggregate the resulting time-wise detections into a segment-level, ratio-based decision.
Time-point-wise performance is further illustrated in Figure 12, which shows an ROC AUC of 0.832 and an average precision of 0.985. These values confirm that the model effectively distinguishes healthy and anomalous points. However, relying on individual scores alone may be overly sensitive to brief transients. Our ratio-based mechanism provides a principled way to aggregate these time-wise signals into a stable yet responsive decision process.
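A sketch of that ratio-based aggregation is given below, assuming the 50% ratio reported later in the bridge study serves as the default cutoff; the function and variable names are illustrative.

```python
# Ratio-based segment-level decision: a segment raises an alarm when more
# than `ratio` of its time points exceed the anomaly threshold tau.
import numpy as np

def segment_alarm(scores: np.ndarray, tau: float, ratio: float = 0.5) -> bool:
    """True if the fraction of points with score > tau exceeds `ratio`."""
    return float(np.mean(scores > tau)) > ratio

segment = np.abs(np.random.randn(400))   # 400 steady-state points per file
print(segment_alarm(segment, tau=2.5))
```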

ROC and precision-recall curves for base anomaly scoring on the concatenated test set of healthy operation and progressive bolt-loosening faults (Levels 1–4 for each of 9 Nm minor, 6 Nm moderate and NoBolt severe).
Severity assessment of damage: Once a segment is flagged as anomalous, we assess the severity of the damage using two descriptive statistics of the reconstruction-error sequence: its mean and its standard deviation.
In the healthy baseline segment, the scores are consistently low, with both the mean and the standard deviation remaining close to the levels observed on the healthy training data, while the 9-Nm “minor” segments already show a measurable upward shift in both statistics.
This trend persists in the 6-Nm “moderate” and NoBolt “severe” cases. The 6-Nm segments show stronger mean deviations due to sustained bias in the reconstruction errors, and the NoBolt segments push both statistics higher still.
Importantly, this variation across bolt positions adds practical value: it allows the system to prioritize maintenance based on the structural importance of the affected joint. While some damage levels may yield lower anomaly scores, this is consistent with their reduced contribution to global stiffness or load transfer. The method thus offers not only accurate segment-wise detection, but also an interpretable, physically grounded severity ranking—allowing domain experts to differentiate between faults that are merely detectable and those that are truly critical.
Real-world case: A steel truss bridge subject to artificial damage
In this study, we focus on enhancing the early detection and severity assessment of structural damage in bridge structures by analyzing progressive scenarios. The Old ADA Bridge in Japan provides a valuable case study for SHM and damage detection. 44 As shown in Figure 13, this steel truss bridge, extensively tested before its removal in 2012, serves as an ideal testbed for understanding the effects of artificial damage on structural integrity and system identification.

Old ADA bridge. 44
This study utilized ambient vibration testing to measure the bridge’s structural response to natural environmental excitations. This method, which leverages the inherent vibrations from environmental factors such as wind and traffic, is advantageous due to its non-intrusive nature and cost-effectiveness. Capturing these low-level vibrations requires the deployment of high-quality accelerometers. Eight uniaxial accelerometers were strategically placed on the bridge deck to gather detailed vibration data. Five sensors were positioned near the truss member subjected to artificial damage, while the remaining three were located on the opposite side of the bridge deck. This arrangement ensured comprehensive coverage of the structural response across the entire bridge.
Figure 14(a) shows the sensor layout and related information. The bridge was subjected to five distinct damage scenarios, as depicted in Figure 14(b) and (c).

(a) Sketch and sensor information; (b) sketch of damage scenarios; and (c) artificial damage applied to tension members. 44
To ensure the statistical reliability of the data, measurements were repeated multiple times across these scenarios. Specifically, the intact (INT) state was recorded three times, while the half-cut, full-cut, and 5/8th span cut scenarios were each recorded once. The repaired state was recorded twice. Data were collected at a high sampling rate of 200 Hz, providing a detailed and high-resolution dataset of the bridge’s dynamic responses.
This comprehensive dataset offers a rare opportunity to analyze the behavior of a steel truss bridge under controlled damage conditions. It serves as a critical benchmark for developing and validating new methods for damage detection and SHM using accelerometer data. By analyzing this dataset, researchers can gain significant insights into the structural behavior under various damage scenarios, thereby enhancing the reliability and effectiveness of monitoring systems designed to ensure the safety and integrity of such critical infrastructure.
Data preparing and preprocessing
In this study, ambient vibration tests were conducted on a bridge structure with a sampling rate of 200 Hz to evaluate the structural condition under various scenarios, including INT and damaged states. The vibration data were collected using eight accelerometers, providing a high-resolution capture of the bridge’s vibrational behavior. The five different scenarios tested include INT, three damage states (DMG1, DMG2, and DMG3), and a recovery state (RCV).
As shown in Table 8, for the INT scenario, data were collected in three separate tests labeled No1, No2, and No3, with sample sizes of 48,615, 84,125, and 75,964, respectively. The recovery scenario included two tests, No1 and No2, with 10,574 and 75,294 samples, respectively. The damage scenarios, DMG1, DMG2, and DMG3, each involved a single test, with 57,553, 73,859, and 71,553 samples, respectively.
Ambient vibration data collected from five different structural states of a steel truss bridge. The scenarios include: the INT state, half-cut vertical member (DMG1), fully cut vertical member (DMG2), repaired state (RCV), and 5/8th span cut (DMG3). Data were collected for three repetitions of the INT test and two repetitions of the RCV test, while only one dataset was collected for DMG1, DMG2, and DMG3. Each dataset’s size indicates the number of time steps (rows) recorded using 8 accelerometers (columns). The details of each structural state are provided to highlight the different damage levels and locations. 69
To ensure a consistent and comprehensive dataset for model training and evaluation, we selected specific datasets based on their length and variability. The INT datasets from tests No2 and No3, consisting of 84,125 and 75,964 samples respectively, were chosen for training. These datasets provide extensive coverage and variability, crucial for robust model learning. For testing, we utilized the INT No1 dataset, with 48,615 samples, as a baseline to assess the model’s ability to recognize the INT condition. Additionally, the DMG1, DMG2, and DMG3 datasets, containing 57,553, 73,859, and 71,553 samples, respectively, were used to evaluate the model’s sensitivity to different damage states. The RCV No2 dataset, with 75,294 samples, was included to assess the model’s capability to detect improvements in structural integrity after recovery.
In the preprocessing phase, we implemented several techniques to enhance the model’s efficiency, carefully chosen through empirical assessment and cross-validation. We began with moving average smoothing, using a window size of 60 and a stride of 30, to reduce noise and highlight significant patterns within the vibration data, thereby enhancing the signal-to-noise ratio for more accurate anomaly detection. Following this, we standardized the data to a mean of zero and a standard deviation of one, ensuring uniformity across all accelerometer readings and allowing each feature to contribute equally to the analysis. We then applied an overlapping sliding window technique with a window size of 100 and a stride of 10, generating multiple overlapping segments from each time series to enrich the training dataset. This approach enabled the model to focus on capturing essential vibration patterns and temporal cycles, improving its sensitivity to subtle variations that may indicate potential anomalies or structural damage.
Training and testing process
The training dataset consisted of 160,089 time points, maintaining a high sampling rate of 200 Hz. The training process adhered closely to the parameters set in the ASCE benchmark, as detailed in Table 1, except for using a window size of 60 and a stride of 30 for the moving average, which resulted in a total of 5335 time steps. The training process was efficient, requiring only 887 s and utilizing 34,436 MiB of GPU memory distributed across 4 NVIDIA TITAN V 12 GB GPUs.
For the testing phase, a more extensive dataset of 326,874 time points was assembled by combining data from INT No1, DMG1, DMG2, RCV No2, and DMG3. This dataset was designed to simulate a progressive structural state. As in the training phase, a sliding window technique with a window size of 60 and a stride of 30 was applied, resulting in 10,894 time steps. To emphasize the critical early stages of structural changes, each phase was truncated to include only the first 500 time points, yielding a total of 2500 time steps for detailed analysis.
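The reported time-step counts follow from the standard sliding-window count, \(n = \lfloor (T - w)/s \rfloor + 1\) with window \(w = 60\) and stride \(s = 30\); a quick check against the quoted figures:

```latex
\[
  n_{\text{train}} = \left\lfloor \frac{160{,}089 - 60}{30} \right\rfloor + 1 = 5335,
  \qquad
  n_{\text{test}} = \left\lfloor \frac{326{,}874 - 60}{30} \right\rfloor + 1 = 10{,}894.
\]
```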
The model efficiently processed these test data in just 31.61 s, demonstrating its ability to rapidly deliver insights. This performance highlights the model’s proficiency in managing large volumes of high-frequency data, ensuring timely detection and response to structural changes. By capturing and analyzing subtle variations in structural behavior, the model provides a robust solution for real-time monitoring of structural health.
Results and discussions
Early detection of damage: As shown in Figure 15, the model’s performance is evaluated using time-point-wise precision, recall, and F1-score metrics. These metrics are essential for identifying structural anomalies at their earliest stages, enabling proactive maintenance strategies. The subplot depicting predicted labels shows the model’s metrics: an overall precision of 1.00, recall of 0.68, and F1-score of 0.81. The recall value is primarily affected by the DMG3 phase, where the anomaly detection is inconsistent between the first and second halves.

Anomaly detection across progressive structural states. This plot illustrates the model’s performance in detecting anomalies across different structural states (INT, DMG1, DMG2, RCV, and DMG3) within a progressive scenario. Each phase originally consisted of 500 time points; however, the first 100 time points of each phase were removed to eliminate transitional effects, allowing the analysis to focus on steady-state behavior. This resulted in a total of 2000 time points. The model effectively identifies structural deviations using anomaly scores and precision metrics, highlighting its capability to detect and differentiate damage levels.
The model’s time-point-wise detection performance is robust across different phases, particularly in DMG1 and DMG2. In DMG1, the model effectively detects half-cut damage at the midspan with high precision and recall due to pronounced structural deviations that are easily captured. During DMG2, the model identifies full-cut damage, achieving high anomaly scores indicating severe structural compromise. This demonstrates the model’s capability to detect critical damage points that pose significant risks. The INT and RCV phases further illustrate its reliability: the model raises only a few false alarms under stable conditions and effectively detects the recovery post-repair, maintaining high precision and confirming its ability to recognize intact structures and repair efforts.
However, the DMG3 phase presents a unique challenge with notable variance in recall, where the consistency of anomaly detection varies between the first and second halves. In the first half of DMG3, there is a significant spike in anomaly scores, reflecting the full-cut damage at the 5/8th span, indicating a severe structural deviation captured by the model. In contrast, the second half of DMG3 shows a marked drop in anomaly scores, as illustrated in Figure 15. Upon closer examination of the reconstruction results in Figure 16, this decrease in anomaly scores aligns with a substantial reduction in the ground truth values. This correlation suggests that the low anomaly scores in the second half may not reflect a limitation of the model but rather the reduced distinguishable structural deviation in the data. Moreover, factors such as sensor placement and the quality of the collected data could further impact the model’s ability to consistently detect anomalies, especially when structural changes are subtle or masked by noise. These insights underscore the importance of considering both model performance and data integrity in evaluating the effectiveness of SHM systems.

Reconstruction results for sensors A1–A4: The figure illustrates the reconstruction performance of the model for sensors A1–A4. It demonstrates how well the model captures the original signal patterns and highlights any discrepancies that indicate potential anomalies.
Additionally, the transition from the RCV phase to DMG3, which involves recovery efforts, may introduce changes in structural dynamics that further complicate anomaly detection. These shifts in sensor readings during the DMG3 phase can affect the model’s consistency in identifying anomalies, highlighting the complexity of monitoring structural health during dynamic recovery processes.
Furthermore, despite the reduced recall in the second half of the DMG3 phase, the model’s ability to closely align anomaly scores with the actual time points where damage begins is noteworthy. This alignment, along with the remarkably high anomaly scores at the critical moment when damage begins, ensures that severe structural deviation is accurately identified. Additionally, because more than 50% of the time points within this segment are detected as abnormal, the segment-wise alarm is still triggered, keeping the segment-wise alarm accuracy at 100%. Even in this challenging phase, the model effectively raises an alarm, demonstrating its robustness and reliability in early damage detection.
These findings underscore the model’s effectiveness in early detection, as it accurately flags anomalies within the initial 1-min sensor data, closely aligning the detection time with the actual occurrence of damage. The segment-level adaptive scoring strategy proves robust and adaptable, effectively capturing both significant and minor anomalies across varying levels of damage severity, with a consistent segment-wise alarm accuracy of 100%.
Severity assessment of damage: Assessing the severity of detected damage is critical for prioritizing maintenance efforts and resource allocation. Anomaly scores, which quantify structural deviations, provide valuable insights into the extent of damage across different phases. Higher mean anomaly scores and greater variability indicate more pronounced structural deterioration, as shown in Table 9 and visualized in Figure 15.
Anomaly score statistics and location-based insights for each damage phase.
In the INT phase, which serves as the baseline, the anomaly scores are low, with a mean of 6.96 and a standard deviation of 1.46, indicating stable conditions across the bridge with no detected anomalies. This confirms the structural integrity of the bridge under normal conditions, demonstrating the model’s capability to recognize a healthy structure.
Moving to the DMG1 phase, half-cut damage is introduced at the midspan, where bending moments are highest, making it particularly vulnerable. The model reports medium anomaly scores, reflected by a mean of 240.75 and a high standard deviation of 378.02, indicating noticeable structural changes that could potentially affect stability. Early detection in this phase is crucial for initiating timely maintenance and ensuring the structural integrity of the bridge.
In the DMG2 phase, a full-cut at the midspan results in severe damage, with the structural integrity heavily compromised. The mean anomaly score spikes to 1084.82 with a substantial standard deviation of 1563.54, indicating a critical point in the structure’s lifecycle. The model detects correspondingly high anomaly scores, accurately reflecting the severe degradation and increased failure risk.
The RCV phase involves welded recovery to restore the bridge’s integrity. The anomaly scores reduce significantly, with a mean of 7.49 and a standard deviation of 1.45, indicating successful recognition of the recovery efforts and diminished structural anomalies. This underscores the model’s ability to detect improvements in structural conditions following repairs.
In the DMG3 phase, a full-cut occurs at the 5/8th span, representing severe damage at a new location on the bridge. While the mean anomaly score of 590.85 is lower than that of DMG2 due to varied load distributions, it still indicates high severity, as reflected by the high standard deviation of 1890.57. Although both DMG2 and DMG3 signify critical damage, the central position of DMG2 is expected to have a more significant impact on overall stability. The model effectively identifies these structural variations, demonstrating its ability to monitor and respond to severe damage across different spans of the bridge.
The model exhibits strong early damage detection, with high precision and recall in the DMG1, DMG2, and DMG3 phases, which is crucial for preventing minor issues from escalating into severe failures. These phases underscore the model’s effectiveness in identifying significant structural changes at key points. During the damaged phases, while the mean anomaly scores are high, the standard deviations are even higher, highlighting the model’s heightened sensitivity to structural anomalies. This variability results from the model’s unsupervised training on healthy states, leading to greater reconstruction errors under damaged conditions. In contrast, during INT and recovery phases, the model consistently produces low mean anomaly scores with low standard deviation, accurately distinguishing non-damage states and confirming the effectiveness of repairs. These results collectively demonstrate the model’s reliability in monitoring the bridge’s condition and enabling timely maintenance interventions.
Conclusion
U-GraphFormer represents a significant advancement in SHM by offering enhanced capabilities in damage detection and severity assessment. This novel model combines advanced data processing techniques, such as spatiotemporal graph learning and sensor-specific temporal self-attention, within a mirrored encoder–decoder architecture reminiscent of U-Net, to effectively capture and analyze complex patterns in sensor data. The integration of skip connections further enhances the model’s ability to reconstruct and identify anomalies with greater accuracy, making it a powerful tool for early detection and intervention.
The successful application of U-GraphFormer in both benchmark tests and real-world scenarios underscores its robustness and adaptability, particularly in early damage detection and severity assessment. The model excels in segment-wise detection, achieving 100% accuracy in all scenarios, and demonstrates high precision in detecting the exact start time of damage. This is evident in its performance on Damage1 in the benchmark, where it achieved precision, recall, and F1-scores of 0.99, 0.98, and 0.98, respectively. Even when detecting the most minor damage, such as Damage6, U-GraphFormer continues to perform commendably, with precision, recall, and F1-scores of 0.80, 0.71, and 0.75. This capability is vital for preventing minor issues from escalating into severe structural failures. By utilizing the mean and standard deviation of anomaly scores as interpretable indicators, U-GraphFormer delivers a nuanced assessment of damage severity. Rigorous testing on the ASCE benchmark and a real-world steel truss bridge demonstrated the model’s effectiveness, achieving a ranking accuracy of 1. This flawless accuracy reflects the model’s ability to provide severity assessments that closely align with actual damage progression, making U-GraphFormer an invaluable tool for timely maintenance and informed decision-making.
Future work could improve U-GraphFormer in several important directions. First, investigating sensor-level and temporal attribution techniques, such as saliency maps or SHAP, may enhance diagnostic precision. Second, evaluating robustness under diverse environmental and operational conditions—by incorporating environment-adaptive scoring or domain-adaptive approaches—could further strengthen reliability in real-world settings. Finally, validating U-GraphFormer on a wider range of civil structures and damage types would help demonstrate its generalizability and scalability for SHM applications.
