Introduction
The growing number of geriatric patients worldwide is driving a rise in intensive care unit (ICU) admissions. 1 Their pathophysiological characteristics, including multiple comorbidities, 2,3 reduced hypoxemia tolerance, and respiratory muscle fatigue, often prompt earlier intubation to alleviate respiratory burden. However, mechanical ventilation in geriatric patients carries significant risks, with mortality rates 1.8 times higher than in younger patients and 3.07 times higher than in nonventilated elderly patients. 4 This creates a critical clinical dilemma: how to manage the risk of hypoxemia without increasing the risk of death from ventilation. 5
Current studies on intubation prediction in elderly ICU patients mainly use traditional machine learning and deep learning models. 6 Traditional machine learning methods (such as support vector machine (SVM) and XGBoost) usually rely on statistical feature construction (such as the 1-h average respiratory rate (RR)) 7,8 and heuristic feature selection. 9 While computationally efficient and interpretable, such methods depend on domain-specific prior knowledge for feature engineering and assume that this knowledge applies broadly. This overlooks the significant variability in key physiological features across elderly ICU patients. Consequently, features derived from potentially incomplete prior knowledge may fail to fully capture the complex physiology of this population.
Deep learning models, particularly sequence models like long short-term memory (LSTM) 10,11 and transformer-based architectures, 12 have shown superior performance in some ICU-related predictive tasks, including intubation and mortality prediction. For instance, Nora et al. developed an LSTM-based model capable of predicting intubation requirement, mortality risk, and ventilation duration with an area under the curve (AUC) exceeding 0.95, demonstrating the correlation between intubation necessity and acute respiratory distress syndrome mortality. 13 However, these sequence models intrinsically presuppose uniformly sampled time-series data. 14 Elderly ICU records exhibit marked variability in sampling frequency, resulting in inherently irregular and sparse patterns.
Modeling how different features relate to one another has become key to better intubation prediction in ICUs. 15 The cross-attention mechanism, leveraging multi-head attention to model inter-variable dependencies, enhanced AUC by 0.0379 and reduced the false alarm rate to 17.8% in intubation prediction. 16 Li et al. employed a temporal convolutional network to model temporal dependencies and identify key risk factors such as mean blood pressure and oxygen saturation (SpO2). 17 A representative study by Kim et al. employed pulmonary expert-curated feature relationships to construct static graph structures for predicting spontaneous breathing trial success. Although achieving 0.85 AUC, their FT-GAT model's fixed topological configuration fails to capture the temporal evolution of feature interactions during acute respiratory failure episodes. 18 These approaches focus on either temporal correlations or feature-level dependencies, but they struggle to model both at once, that is, the time-evolving interactions between features. This limitation is particularly acute in the context of ICU intubation prediction, where the influence of feature relationships varies significantly across different clinical states and time points.
Graph neural networks (GNNs) have recently demonstrated remarkable performance in heterogeneous data analysis, particularly for multivariate time-series forecasting. Some studies have utilized GNNs to extract directed dependencies among variables, enhancing predictive accuracy.19–21 Furthermore, integrating neural ordinary differential equations strengthened GNNs’ ability to model temporal dynamics. 22 Zhang et al. employed GNNs to model inter-variable associations and predict misaligned data points based on adjacent time steps. 23 Current GNNs are usually applied to observation windows spanning days to weeks to learn stable patterns or slowly evolving trends. This is inconsistent with the needs of ICU intubation prediction, which relies on short and sparse observation windows. As a result, existing GNNs often suffer from over-smoothing and performance degradation. 24
In summary, while progress has been made in developing ICU intubation prediction models, solutions tailored for elderly patients remain underdeveloped. Existing approaches often fail to adequately capture the dynamic and interrelated nature of clinical variables, instead treating vital signs and lab values as isolated rather than interdependent. This shortcoming is further complicated by data sparsity due to irregular sampling, which can degrade model performance. Moreover, while current explainability methods can identify key predictive features, they fall short in elucidating the complex interactions between clinical variables.
To address these limitations, we propose the dynamic mask attention graph neural network (DymaGNN) to enhance intubation prediction in geriatric ICU patients. The main contributions of this study are as follows:
1) We construct a time-evolving graph structure where dynamic physiological variables are represented as nodes. Adaptive edge weights capture cooperative and antagonistic effects between variables, allowing a more comprehensive representation of complex interactions.
2) We employ masked attention temporal aggregation to identify critical time windows, improving the utilization of high-quality data segments while mitigating the impact of missing data.
3) DymaGNN shows good performance in multiple mechanical ventilation prediction tasks. The generated feature interaction graph is consistent with clinical knowledge, which enhances doctors’ trust in the model's judgments.
This study provides a new method with both high prediction accuracy and interpretability 25 for the intubation prediction of elderly ICU patients.
Method
We first establish the ventilation prediction task through expert-guided ventilation status reclassification and sliding window segmentation, and split the data rigorously to avoid temporal data leakage. We then detail the dynamic graph construction, which encodes variable relationships through adaptive edge weights, and the mask attention mechanism, which handles irregular sampling. The integrated framework enables simultaneous learning of cross-feature dependencies and temporal dynamics for intubation prediction.
Data source
This study is based on the MIMIC-IV 2.2 database and focuses on elderly ICU patients aged 65 years and older 26 to analyze their ventilation requirements and develop predictive models. In the MIMIC dataset, the mimiciv_derived.ventilation table classifies ventilation status into five categories based on the type of oxygen therapy and ventilatory support: (0) no oxygen therapy, (1) supplemental oxygen, (2) high-flow nasal cannula (HFNC), (3) non-invasive ventilation (NIV), and (4) invasive mechanical ventilation (IMV), which includes both endotracheal intubation and tracheostomy.
To facilitate analysis, we reclassified the ventilation status into two groups: intubation and assisted ventilation. Specifically, category 4 (IMV) is designated as intubation, encompassing both endotracheal intubation and tracheostomy. Meanwhile, categories 1 (supplemental oxygen), 2 (HFNC), and 3 (NIV) are grouped under assisted ventilation.
For the intubation prediction task, patient outcomes are categorized into two groups: those who underwent intubation (including endotracheal intubation and tracheostomy) and those who did not receive intubation (including patients with no oxygen therapy, supplemental oxygen, HFNC, or NIV). The mapping of ventilation status to categories is summarized in Table 1.
Reclassification of ventilation status in the MIMIC-IV dataset.
HFNC: high-flow nasal cannula.
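The reclassification described above can be sketched as a small lookup. This is only an illustration: the integer codes 0 to 4 follow the numbering used in the text, and the function names `reclassify` and `intubation_label` are hypothetical, not taken from the study's code.

```python
# Ventilation status codes, numbered as in the text (0-4).
VENT_STATUS = {
    0: "no oxygen therapy",
    1: "supplemental oxygen",
    2: "HFNC",
    3: "NIV",
    4: "IMV",
}


def reclassify(status: int) -> str:
    """Map a ventilation status code to the group named in the text."""
    if status not in VENT_STATUS:
        raise ValueError(f"unknown ventilation status: {status}")
    if status == 4:
        return "intubation"
    if status in (1, 2, 3):
        return "assisted ventilation"
    return "no oxygen therapy"


def intubation_label(status: int) -> int:
    """Binary target for the intubation task: 1 for IMV, 0 otherwise."""
    return 1 if reclassify(status) == "intubation" else 0


labels = [intubation_label(s) for s in range(5)]
```

Under this mapping only category 4 yields a positive label for the intubation task; categories 0 to 3 all count as non-intubation.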
The variables utilized in our study, extracted from the MIMIC-IV database, are summarized in Table 2. Static variables comprise demographic information such as gender and admission age. We incorporate diagnostic information with specific constraints: given the absence of temporal annotations for diagnoses in MIMIC-IV, our analysis was strictly limited to chronic diseases and pre-existing conditions to prevent potential data leakage. This focus is clinically appropriate for geriatric ICU patients, because healthcare for this population centers on chronic diseases. 27 Additionally, only procedures performed before the prediction window were included as inputs.
Features in the processed dataset.
ICU: intensive care unit.
Dynamic variables encompass time-series measurements of vital signs and laboratory test results. For vital sign features, we retained all seven high-frequency vital signs recorded in the MIMIC-IV database due to their general availability and foundational role in intensive care monitoring. Laboratory variables vary widely in measurement frequency. To select the most representative and clinically relevant laboratory variables while addressing high missingness rates, we implemented the following strategy: we calculated the measurement frequency (i.e. the number of times each variable was measured) across the entire study cohort, ranked the variables from highest to lowest measurement frequency, and selected the top 25 most frequently measured items as the final input features. This strategy is grounded in clinical practice, as a high measurement frequency indicates a high level of clinical attention and prioritization in the ICU.
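A minimal sketch of the frequency-based selection described above, assuming the laboratory events are available as a flat sequence of item names with one entry per recorded measurement. The function name `top_k_lab_items` and the toy item names are illustrative only.

```python
from collections import Counter


def top_k_lab_items(measurements, k=25):
    """Return the k most frequently measured lab items, most frequent first."""
    counts = Counter(measurements)
    return [name for name, _ in counts.most_common(k)]


# Toy cohort: one list entry per recorded measurement.
events = ["glucose"] * 3 + ["calcium"] * 2 + ["lactate"]
selected = top_k_lab_items(events, k=2)
```

With `k=25` on the full cohort this would reproduce the ranking step; how ties in frequency are broken is not stated in the text, so here they simply follow first-seen order.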
Data processing
To formulate the prediction task for intubation requirements in ICU patients, we employ a sliding window mechanism to segment the time-series data. Each sample consists of three consecutive windows: an observation window (6 h), a gap window (2 h), and a prediction window (1 h). The physiological data within the observation window serve as input to predict whether the patient will require mechanical ventilation in the 1-h period that follows the gap window, denoted as
To prevent data leakage, which may arise from direct random splitting and compromise the model's generalization ability, we strictly adhere to a chronological fivefold cross-validation strategy. This ensures that training and test sets in different folds do not share temporal information, thereby maintaining the validity and robustness of the experimental results.
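The window segmentation can be illustrated with a short sketch. Times are in hours from ICU admission; the window lengths default to the 6 h, 2 h, and 1 h listed in the window definition, while the 1-h stride and the function names are assumptions made for this example, not details given in the text.

```python
def make_windows(stay_hours, obs=6, gap=2, pred=1, stride=1):
    """Return (obs_start, obs_end, pred_start, pred_end) tuples for one stay."""
    windows = []
    start = 0
    while start + obs + gap + pred <= stay_hours:
        obs_end = start + obs
        pred_start = obs_end + gap
        windows.append((start, obs_end, pred_start, pred_start + pred))
        start += stride  # assumed stride; the text does not state it
    return windows


def label_window(window, intubation_hour):
    """1 if intubation starts inside the prediction window, else 0."""
    _, _, pred_start, pred_end = window
    if intubation_hour is None:
        return 0
    return int(pred_start <= intubation_hour < pred_end)


windows = make_windows(stay_hours=10)
```

Because the label depends only on the prediction window, measurements taken during the gap never enter the model input.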
For a given sample

Illustration of input data for ICU ventilation prediction. ICU: intensive care unit.
Dynamic mask attention graph neural network model
Patient physiological data in the ICU exhibit dynamic and heterogeneous characteristics, encompassing static and time-series features. The irregularity of the data and the complex variable interactions during disease progression make intubation prediction highly challenging. To tackle these issues, we propose DymaGNN, a dynamic GNN that adaptively models heterogeneous ICU data, captures inter-variable dependencies, and highlights critical time steps to improve predictive performance.
DymaGNN represents ICU data as a feature interaction graph in Figure 2, where nodes correspond to dynamic variables and edges capture their interactions, encoded via edge embeddings. While the graph structure remains static, node representations and edge weights evolve over time. To handle temporal irregularity, a masked attention mechanism identifies key time steps, enhancing feature aggregation and prediction accuracy.

Hierarchical architecture of GNN layer in DymaGNN. DymaGNN: dynamic mask attention graph neural network; GNN: graph neural network.
For a given sample
For an edge
A softmax normalization is then applied to derive the final edge weight
The threshold top
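As a rough illustration of the two steps named above (softmax normalization of edge scores, then a top-k threshold), the following sketch operates on the raw scores of the edges arriving at one node. It is a generic version under stated assumptions: how the raw scores are produced, whether the weights are renormalized after thresholding, and the function names are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw edge scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsify_top_k(weights, k):
    """Keep the k largest edge weights and set the others to zero."""
    if k >= len(weights):
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


raw_scores = [2.0, 1.0, 0.0]      # hypothetical scores for three incoming edges
weights = softmax(raw_scores)
sparse = sparsify_top_k(weights, k=2)
```

Here the pruned weights are left unnormalized, so the surviving edges keep the values assigned by the softmax.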
To select key time steps, we employ a masked attention mechanism. We first compute the query, key, and value matrices:
The attention weight is then computed as follows:
The missing mask
The final temporal aggregation representation is computed as follows:
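A simplified sketch of masked attention pooling over the time steps of one variable may help make the mechanism concrete. It assumes scaled dot-product attention with a single query vector and takes keys and values as given inputs, whereas the model computes query, key, and value matrices through learned projections; the function name and the toy numbers are illustrative.

```python
import math


def masked_attention_pool(query, keys, values, observed):
    """Attention-weighted sum of values, ignoring unobserved time steps."""
    d = len(query)
    scores = []
    for key, obs in zip(keys, observed):
        if obs:
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(dot / math.sqrt(d))
        else:
            scores.append(float("-inf"))  # masked step gets zero weight
    if not any(observed):
        # No observation in the window: return zeros (an assumed convention).
        return [0.0] * len(values[0]), [0.0] * len(values)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    pooled = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return pooled, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0], [2.0], [100.0]]
observed = [True, True, False]   # third time step is missing
pooled, weights = masked_attention_pool(query, keys, values, observed)
```

The missing third step would have dominated the attention had it not been masked; with the mask, its weight is exactly zero and the pooled value stays between the two observed values.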
To construct the final feature representation, we concatenate the time-aggregated representations of all dynamic variables
The prediction of intubation need is then obtained via an MLP:
During training, we minimize the binary cross-entropy loss:
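For reference, the standard binary cross-entropy averaged over samples can be written in a few lines. The clipping constant `eps` is an implementation detail added here for numerical safety and is not part of the paper's description.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


loss_uninformative = binary_cross_entropy([1, 0], [0.5, 0.5])
loss_confident = binary_cross_entropy([1, 0], [0.9, 0.1])
```

A prediction of 0.5 for every sample gives a loss of ln 2, and the loss shrinks as the predicted probabilities move toward the true labels.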
DymaGNN effectively integrates feature interaction information with temporal dynamics in ICU data. By leveraging a dynamic GNN to model variable interactions and employing a masked attention mechanism to aggregate key time steps, our approach enhances the predictive accuracy of intubation need assessment in ICU patients.
Experiment results
We conducted several experiments to demonstrate the effectiveness of DymaGNN. We first compare the performance of intubation prediction across various models. We then evaluate our model on other ventilation prediction tasks, including assisted ventilation prediction based on MIMIC-IV and intubation prediction based on the eICU dataset. We also visualized DymaGNN's edge weights and found that many of the interrelationships between the features corresponded with medical knowledge, which can increase doctors’ trust in DymaGNN.
We also conducted ablation experiments and evaluated the model under different data missing rates to demonstrate its effectiveness and robustness. Experiments were conducted on an NVIDIA Tesla T4 GPU with 16 GB of memory.
Results in intubation prediction
We evaluate the performance of our proposed method against several baseline models, including SVM, 6 XGBoost, 28 LSTM, 29 Transformer, 30 Multivariate Time Series Graph Neural Network (MTGNN), 22 and Raindrop. 23 SVM, a traditional machine learning approach, is known for its stability in high-dimensional classification tasks. XGBoost, a widely used ensemble learning method for tabular data, exhibits robustness in handling missing values and outliers. LSTM and Transformer, both designed for sequential data modeling, effectively capture long-term dependencies. MTGNN, a GNN-based time-series forecasting model, constructs a graph structure to represent complex multivariate dependencies and demonstrates advantages in imputing missing values.
To ensure the reliability of the experimental results, we employ fivefold cross-validation, with five repeated experiments, reporting the average performance across all folds. The evaluation metrics include accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC), providing a comprehensive assessment of classification performance and model robustness. The results are presented in Table 3.
Performance comparison of different models in intubation prediction.
LSTM: long short-term memory; AUC: area under the curve; SVM: support vector machine; MTGNN: Multivariate Time Series Graph Neural Network.
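The evaluation metrics listed above can be computed as follows. This is a plain-Python sketch for illustration; the AUC is computed through its rank interpretation (the probability that a positive sample scores higher than a negative one), and the toy labels and scores are made up.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_score):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Accuracy, precision, recall, and F1 depend on the decision threshold, whereas AUC is threshold-free, which is one reason the two kinds of metric can rank models differently.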
Our method achieves the best performance in most metrics, particularly showing a significant advantage in AUC (AUC = 0.8388, p < 0.05 vs. all baselines) over other methods. However, in accuracy, our method is outperformed by SVM (0.7711 vs. our 0.7725, p > 0.05), though not significantly. Similarly, XGBoost shows a slight edge in precision (0.7634 vs. our 0.7533, p > 0.05), and the difference is not significant. This implies that traditional models may still have some competitiveness in specific metrics. Overall, traditional machine learning models generally underperform compared to time-series and GNN models.
Among time-series models, LSTM performs worse than Transformer. Within GNN models, while our method has the highest F1 score on average, the difference is not significant compared to other methods.
Analysis of feature interaction in dynamic mask attention graph neural network
The edge weights
As Figure 3 shows, the interaction pattern of physiological indicators revealed in this study is highly consistent with existing clinical knowledge, and this explainability helps to enhance clinicians’ trust in the predicted results of our model. Specific findings are as follows:

Feature interaction.
RR exhibits correlations with blood glucose, calcium ions (Ca2+), anion gap, hemoglobin (Hb), mean corpuscular hemoglobin concentration (MCHC), and red cell distribution width (RDW). Specifically, when a patient in the MIMIC-IV dataset exhibited a markedly low blood calcium level of 6.3 mg/dL at 05-04 17:30, the RR sharply increased to 26 breaths/min. This spike likely represents compensatory hyperventilation triggered by the hypocalcemic state. Subsequently, blood calcium levels gradually rose, increasing to 7.9 mg/dL by 05-04 15:00. In parallel, the RR steadily decreased and ultimately stabilized within the normal range at 14 breaths/min.
Elevated blood glucose levels are associated with increased RR, 12 as observed in diabetic ketoacidosis (DKA) patients who develop tachypnea to compensate for metabolic acidosis. This relationship is illustrated by the following clinical observation from a DKA patient: at 06-05 22:08:00, the patient's blood glucose was measured at 145.0 mg/dL, with a corresponding 30-min average RR of 21.0 breaths/min. At the subsequent measurement at 06-07 18:24:00, blood glucose had increased to 335.0 mg/dL, and the average RR over the same monitoring interval had increased to 24.6 breaths/min.
Hypocalcemia may impair neuromuscular function and induce respiratory distress, 31 while anion gap expansion reflecting metabolic acidosis typically triggers compensatory hyperventilation (Kussmaul respiration). 32 Reduced Hb, abnormal MCHC, and elevated RDW may further modulate RR through impaired oxygen transport efficiency. 33
Erythrocyte parameter abnormalities may affect renal hemodynamics through blood viscosity changes and oxygen delivery alterations. 41 These findings underscore the necessity for multimodal renal function assessment in clinical practice.
Analysis of feature importance in XGBoost
Figure 4 shows the dynamic feature importance ranking given by the XGBoost model. Respiratory function-related features are prioritized, with SpO2 as the most important, followed by mean arterial pressure and diastolic blood pressure. Hematological features such as hematocrit, Hb, mean corpuscular volume, mean corpuscular hemoglobin (MCH), and MCHC are also of significant importance. Additionally, features related to electrolyte balance, such as calcium and sodium ion concentrations, rank highly. The prominence of these features indicates their crucial role in predicting the need for intubation in elderly ICU patients.

Importance of dynamic features in XGBoost.
Performance in other ventilation tasks
To further evaluate the model's generalization capability across different ventilation prediction tasks for elderly ICU patients, we conducted experiments on NIV prediction using the MIMIC-IV dataset and intubation prediction using the eICU dataset. As shown in Table 4, the model demonstrates good performance in the eICU intubation prediction task (AUC = 0.8557), indicating its efficacy in identifying critically ill patients requiring urgent mechanical ventilation. However, its performance declines markedly in the NIV prediction task (AUC = 0.7246, precision = 0.4661).
Performance comparison on different ventilation tasks.
ICU: intensive care unit; AUC: area under the curve; NIV: non-invasive ventilation.
Influence of the missing rate
We evaluate three representative models—traditional machine learning SVM, sequence model Transformer, and our proposed GNN DymaGNN—to analyze their performance under varying data missing rates in Figure 5.

AUC performance under varying missing rates. AUC: area under the curve.
The SVM shows marked performance degradation, with a 12% AUC decrease at a 5% missing rate, confirming its strong feature dependency. Its performance progressively declines as the missing rate increases. The Transformer maintains robust performance (AUC > 0.78) below a 10% missing rate, but exhibits accelerated deterioration when the missing rate surpasses 15%. Our DymaGNN, while following a similar trend to the Transformer, demonstrates superior robustness. It sustains an AUC above 0.81 at a 15% missing rate through dynamic graph structure adaptation.
Figure 6 illustrates other metrics under varying data missing rates. SVM demonstrates remarkable stability in accuracy despite performance degradation in other metrics—precision drops 25% (from 0.72 to 0.53) and recall decreases 68% (from 0.54 to 0.17) at a 20% missing rate.

Performance under varying missing rates.
Our proposed DymaGNN shows trends similar to those of the Transformer but significantly outperforms it in recall as missingness grows. Notably, the recall gap reverses from −0.04 (DymaGNN 0.77 vs Transformer 0.81) at a 0% missing rate to +0.14 (DymaGNN 0.70 vs Transformer 0.56) at a 20% missing rate.
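One simple way to simulate a target missing rate is to delete a random fraction of the observed values, as sketched below. The text mentions random deletion but does not give the procedure, so the sampling scheme, the seed handling, and the function name `drop_observations` are assumptions of this example.

```python
import random


def drop_observations(series, rate, seed=0):
    """Set a random fraction `rate` of the observed values to None."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    n_drop = round(rate * len(observed))
    dropped = set(rng.sample(observed, n_drop))
    return [None if i in dropped else v for i, v in enumerate(series)]


series = [float(i) for i in range(20)]
masked = drop_observations(series, rate=0.25)
n_missing = sum(v is None for v in masked)
```

Because the deleted positions are drawn at random, a single run at a lower rate can still remove a highly informative measurement, so averaging over several seeds gives a more stable picture.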
Ablation experiment
To further validate the efficacy of DymaGNN, we conducted two ablation studies: 1) “weights” means fixing the time-varying edge weights to 1, and 2) “attention” means replacing the masked attention-based temporal aggregation with mean pooling.
As shown in Table 5, fixed edge weights caused significant performance degradation (AUC decreased by 2.85%; F1 decreased by 5.78%) compared to the original DymaGNN, confirming the critical role of dynamic edge weights in capturing time-evolving physiological interactions. Removing the masked attention module resulted in a marginal AUC decline to 0.8262, suggesting that this component enhances predictive stability through prioritized integration of critical time windows, while future work could explore better temporal aggregation strategies.
Performance comparison of different model variations.
AUC: area under the curve.
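The two ablated variants can be pictured with the following sketch. Whether the mean pooling used in the ablation skips missing time steps is not stated in the text; the masked version shown here is an assumption, as are the function names.

```python
def fix_edge_weights(weights):
    """'weights' ablation: replace every edge weight with 1."""
    return [[1.0 for _ in row] for row in weights]


def masked_mean_pool(values, observed):
    """'attention' ablation: average the observed time steps with equal weight."""
    kept = [v for v, obs in zip(values, observed) if obs]
    if not kept:
        return [0.0] * len(values[0])
    dim = len(values[0])
    return [sum(v[j] for v in kept) / len(kept) for j in range(dim)]


uniform = fix_edge_weights([[0.7, 0.3], [0.1, 0.9]])
pooled = masked_mean_pool([[1.0], [3.0], [100.0]], [True, True, False])
```

Mean pooling treats every observed time step as equally informative, which is the property the masked attention module is meant to relax.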
Discussion
Key findings
This study proposes DymaGNN to address intubation prediction in elderly ICU patients. Through comprehensive evaluations, our model demonstrates superior performance, particularly in the AUC metric, reflecting robust discriminative capacity in distinguishing between positive (intubation-required) and negative (non-intubation) patient groups. Although DymaGNN has the highest F1 score, it lacks statistical significance when compared to other GNN models. This indicates potential for refinement in balancing precision and recall—a clinically critical consideration given that false negatives (indicating delayed intubation) may increase patient mortality, while false positives (leading to unnecessary intubation) escalate resource burdens.
Traditional machine learning approaches (SVM and XGBoost) exhibit limited capability in processing ICU data's temporal and multivariate characteristics. While sequence models show improvement, the Transformer's advantage over LSTM confirms the value of attention mechanisms in capturing time-series patterns, as established in Ayad et al.'s work. 42 Notably, graph-based models excel at handling irregular data, managing different sampling frequencies effectively. Raindrop concatenates observation values with timestamp information, while DymaGNN employs a decoupled processing of observation values and timestamps. This design may preserve model expressiveness and reduce parameters, enhancing adaptability to sparse medical data.
Under varying missing data rates, the deterioration across multiple metrics in traditional machine learning models reveals their high dependency on complete datasets, where even a 5% data loss triggers a substantial performance decline. 43 Graph models exhibit significantly stronger robustness compared to traditional methods. This advantage may stem from DymaGNN's ability to infer missing values through cross-feature dependencies, a capability aligned with Yalavarthi et al.'s findings on GNN superiority in missing data imputation. 44 This capability proves clinically critical for real-time monitoring scenarios where partial sensor failures occur. While most studies report strictly declining performance with increasing missing rates, 45 our experiments reveal non-monotonic degradation patterns. This divergence may be attributed to contextual factors and experimental design, where random deletion occasionally removes critical features (e.g. SpO2) despite lower overall missing rates.
The XGBoost model reveals that respiratory function indicators (e.g. SpO2), hematological parameters (e.g. Hb), and electrolyte balance indicators (e.g. Ca2+) are highly important, aligning with the key nodes (high degree or high edge weights) identified by DymaGNN. Notably, despite RR being a core physiological indicator of respiratory distress and compensation, and a proven predictor of intubation needs in prior studies, 46 it ranks relatively low in feature importance in our XGBoost analysis. This may stem from the model's handling of strongly correlated features. SpO2, a key output of respiratory function, is strongly correlated with RR and may partially “capture” or “replace” RR's predictive information in XGBoost, so that RR's independent contribution is underestimated. Kazemitabar 47 also reports that correlated variables can affect feature importance in XGBoost, even though the model's predictions remain robust in the presence of multicollinearity. In contrast, DymaGNN's dynamic interaction graph explicitly positions RR as a core node. DymaGNN models potential feature interaction pathways through its graph structure, allowing it to identify RR's importance even when its changes are often accompanied by SpO2 changes. This mechanism-based explanation 48 better resonates with clinical practitioners’ understanding.
Clinical implications
Our proposed model enhances ICU intubation prediction accuracy, which can potentially improve patient outcomes. Experiments show GNNs excel in handling ICU data with irregular sampling, offering a robust benchmark for future ICU research.
Data quality, especially missing data rates, critically impacts model performance. For practical deployment during data collection, it is advisable to adhere to MIMIC-IV's data acquisition protocols 49 regarding sampling frequency and quality control standards. The minimum compliance threshold should not fall below 85% 50 of these benchmarks, as model efficacy substantially degrades when missing data rates exceed MIMIC-IV's baseline by approximately 15%. 51
Moreover, the edge-weight-based feature importance analysis method introduced in the study improves model interpretability. It can be cross-verified with other feature importance methods, enhancing conclusion reliability while revealing feature interactions. This enables clinicians to assess whether the model's key features match clinical knowledge, boosting trust in model decisions and facilitating practical application.
Limitations and future work
Future research directions mainly include the following three aspects. Firstly, DymaGNN performed well in the intubation prediction task, but its performance decreased in the NIV prediction task. We attribute this to two main factors. First, the difference may stem from the inherent uncertainty of assisted ventilation decision-making in clinical practice. 52 Due to the lack of clear clinical indicators and significant individual differences among patients, physician judgment in NIV decision-making varies considerably. This uncertainty may lead more conservative physicians to adopt assisted ventilation prematurely. 53 Second, in the NIV prediction task, which has a positive-to-negative sample ratio of 2.94, the model exhibited inferior performance on imbalanced data. This imbalance likely induced a prediction bias toward the positive class, as evidenced by a relatively low precision, which indicates a high false positive rate. Consequently, clinical deployment risks perpetuating unfair outcomes for minority patient subgroups, raising ethical concerns regarding algorithmic fairness. 54 To mitigate this limitation in future work, we suggest exploring the use of a weighted loss function to improve the model's performance. 55
Secondly, the dynamic weight graph in the current DymaGNN is learned adaptively and entirely from data. In the next step of research, medical prior knowledge such as respiratory physiological mechanisms and the laws of blood gas compensation can be introduced to guide the construction of the graph. This prior knowledge can not only reduce the number of trainable parameters but also enhance clinical interpretability. Meanwhile, the aggregation methods at the time level and the feature level 56 need to be optimized to capture the dynamic evolution patterns of physiological variables more accurately.
Thirdly, this study mainly focuses on short-term intubation prediction. It can be extended to scenarios such as multiple ventilations, repeated ventilations, and prognosis assessment. For example, studying indicators like survival, rehabilitation status, and quality of life after intubation can more comprehensively evaluate the effectiveness of mechanical ventilation decisions, 57 which can provide longer-term guidance for clinical treatment.
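The weighted loss function suggested above for the imbalanced NIV task could take the form of a class-weighted binary cross-entropy. The sketch below uses inverse-frequency weights, which is one common choice rather than the scheme the authors have committed to; the function names and toy labels are illustrative.

```python
import math


def class_weights(y_true):
    """Inverse-frequency weights (w_pos, w_neg) that average to 1 per sample."""
    n = len(y_true)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return n / (2.0 * n_pos), n / (2.0 * n_neg)


def weighted_bce(y_true, y_prob, w_pos, w_neg, eps=1e-12):
    """Binary cross-entropy with separate weights for positives and negatives."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


y = [1, 1, 1, 0]                 # toy labels with a 3:1 positive-to-negative ratio
w_pos, w_neg = class_weights(y)
loss = weighted_bce(y, [0.8, 0.7, 0.9, 0.6], w_pos, w_neg)
```

With a positive-to-negative ratio of 2.94, this scheme would weight each negative sample 2.94 times as heavily as each positive one, penalizing false positives more strongly.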
Conclusion
Our proposed DymaGNN presents a clinically valuable solution for predicting mechanical ventilation in elderly ICU patients. By dynamically modeling the critical, time-evolving interactions between physiological variables, DymaGNN achieves high predictive accuracy and maintains reliability even with 10% missing data. Crucially, its interpretable feature interaction graphs, which align with established clinical knowledge, provide clinicians with transparent insights into the model's reasoning. This integration of precision and explainability establishes a critical foundation for deploying trustworthy AI tools in real-world ICU intubation prediction. 58
Supplemental Material
sj-pdf-1-dhj-10.1177_20552076251361680 - Supplemental material for Methodological development study: Dynamic mask attention graph neural network for mechanical ventilation in elderly intensive care unit patients
Supplemental material, sj-pdf-1-dhj-10.1177_20552076251361680 for Methodological development study: Dynamic mask attention graph neural network for mechanical ventilation in elderly intensive care unit patients by Yi Xie, Ni Xie and Jiao Guo in DIGITAL HEALTH
Supplemental Material
sj-pdf-2-dhj-10.1177_20552076251361680 - Supplemental material for Methodological development study: Dynamic mask attention graph neural network for mechanical ventilation in elderly intensive care unit patients
Supplemental material, sj-pdf-2-dhj-10.1177_20552076251361680 for Methodological development study: Dynamic mask attention graph neural network for mechanical ventilation in elderly intensive care unit patients by Yi Xie, Ni Xie and Jiao Guo in DIGITAL HEALTH