Abstract

Background

Mechanical ventilation is essential in intensive care units (ICUs) but carries risks such as ventilator-associated complications and high costs. Clinical information alone predicts mechanical ventilation duration with limited accuracy. Accurate prediction of ventilation duration can aid clinical decisions such as resource allocation and early tracheostomy planning.

Objective

To develop explainable artificial intelligence (AI) models for predicting mechanical ventilation duration leveraging diverse clinical parameters from ICU patient data.

Methodology

This development and testing study analysed 323 mechanically ventilated patients (male:female = 160:163; age = 42.87 ± 19.54 years, mean ± standard deviation) from three ICUs at AIIMS, Delhi. The dataset included 100 clinical parameters per patient. Two models were developed: (1) a regression model (n = 323) to predict ventilation duration in days, and (2) a classification model (n = 218, non-tracheostomized) to predict short- (≤3 days) vs. long-term (>3 days) ventilation requirements. The misclassification cost was altered for the classification model. Feature selection was performed using Shapley additive explanations (SHAP) on a random forest model, and training was done with 5-fold cross-validation (80% training, 20% testing).

Results

The least-squares boosting regression model achieved a root mean squared error (RMSE) of 4.66 days and a coefficient of determination (R²) of 0.65 using 34 SHAP-selected features, with tracheostomy (53.66% importance) being the top predictor. The best classification model, K-nearest neighbours, achieved 79.1% accuracy, an area under the receiver operating characteristic curve (AUROC) of 0.82, sensitivity of 71.4%, and specificity of 86.4% using 47 SHAP-selected features. Key predictors included ICU admission type (8.1%), PO₂ (5.6%), and pH (5%).

Conclusion

AI-driven prediction of ventilation duration can enhance ICU workflows, optimize resource use, and improve personalized care. SHAP-based feature selection promotes AI interpretability, aiding clinical adoption.

Keywords

Clinical decision making, explainable AI, intensive care unit, machine learning, mechanical ventilation, Shapley additive explanations (SHAP)

Introduction

Mechanical ventilation is indispensable in managing critically ill patients in the intensive care unit (ICU), particularly for patients with respiratory failure or acute respiratory distress syndrome (ARDS).^1,2 While this life-sustaining intervention stabilizes patients, its prolonged use can lead to complications such as ventilator-associated pneumonia (VAP) and lung injuries,^1,3,4 significantly impacting patient outcomes and raising healthcare costs.

Clinicians often rely on subjective, experience-based assessments to estimate the expected duration of mechanical ventilation, integrating multiple clinical features into implicit predictions. While clinical indicators such as shock at admission, sepsis-related parameters, or scores like the sequential organ failure assessment (SOFA) score may help in this decision-making, their interpretation can vary considerably between practitioners and patient contexts. Such evaluation is prone to inaccuracies due to the complexity and variability of clinical conditions.^5,6 Artificial intelligence (AI) may offer a promising solution, enabling data-driven predictions that can complement clinical judgment.^7,8 Earlier studies have predicted mechanical ventilation duration using machine learning and deep learning approaches,^6,9–13 with both classification^6,9–11 and regression^12,13 models. The classification models in previous studies used different binary cut-offs to define prolonged mechanical ventilation and achieved an area under the receiver operating characteristic curve (AUROC) between 0.69 and 0.83 using different classifiers.^6,9–11 The previous regression studies on ARDS patients achieved a root mean squared error (RMSE) of 5–6 days.^12,13 Therefore, more studies in this area are desirable. Further, most datasets used for model training come from high-income countries,^6,9,12,13 limiting their generalizability to diverse populations, as they may not capture the characteristics of healthcare systems in low- and middle-income countries (LMICs).¹⁴

This study is an attempt to develop supervised machine learning models to predict mechanical ventilation duration in ICU patients using a diverse dataset from a single tertiary care hospital in India, representing a typical LMIC setting. This study can classify patients into short- or long-term ventilation groups as well as predict ventilation days. The Shapley additive explanations (SHAP)-based feature selection enhances prediction accuracy and interpretability, aiding informed clinical decisions.^15,16 The findings could have significant implications for improving ICU workflows, optimizing resource allocation, timing of tracheostomy, and advancing patient care,⁶ especially in low- and middle-income countries with limited resources.

Methodology

Data set and features

The data set consisted of a total of 660 mechanically ventilated patients (n = 660; male:female = 365:295; age = 44.45 ± 19.36 years, mean ± standard deviation) from three ICUs at AIIMS, Delhi, collected in 2023–24 after ethics approval (Institute Ethics Committee of All India Institute of Medical Sciences, New Delhi, India; ethical approval number: IECPG-190/20.04.2023, RT-05/07.06.2023, CTRI/2023/10/058972). Clinical parameters were recorded at the time of ICU admission. Of these, 323 patients (n = 323; male:female = 160:163; age = 42.87 ± 19.54 years, mean ± standard deviation) who were alive at the time of extubation were used for the regression model to ensure accurate ventilation duration records. This approach improved model training by excluding patients who were extubated after death, enhancing data quality, a key factor in machine learning performance.¹⁷ The descriptive statistics of all the features analysed in the dataset are provided in Supplemental Table 1. The subset included for the regression model had patients with diverse pathologies (67% of patients with PO₂/FiO₂ ratio <300, 56.5% with mean arterial pressure <65 or requiring vasopressors, 39.2% with creatinine levels >1.5, 30.9% with total bilirubin levels >2 mg/dL or INR >1.5, and 38.6% with Glasgow Coma Scale score <15), including patients with ARDS and respiratory failure; 105 patients in the subset had undergone tracheostomy. The classification model used the 218 non-tracheostomized patients, both to observe the resulting change in model performance and because tracheostomy is one of the important clinical decisions that the prediction model can inform.

The data set comprised 100 clinical parameters: 92 measured by the clinician in the ICU and 8 clinically significant features derived from the existing parameters. These were categorized into seven groups: (i) demographic, (ii) vitals, (iii) blood tests and arterial blood gas parameters, (iv) parameters related to mechanical ventilation, (v) respiratory system specific features, (vi) cardiovascular system specific features and (vii) ICU specific features. The clinically significant scores in the data set included the SOFA score, an important clinical score used in the ICU to determine patient prognosis,^18,19 ventilatory ratio, PO₂/PCO₂, rapid shallow breathing index, shock index and cumulative lung ultrasound score, a valuable clinical parameter in critically ill patients that incorporates imaging data indirectly into the analysis.²⁰ The clinical parameters were selected based on availability in the hospital, clinical significance, and prior studies.^21,22,23 The features were transformed into numerical format, with categorical variables encoded using one-hot encoding and continuous variables retained as floating-point numbers. Features with more than 3% missing data were excluded to reduce the risk of bias associated with imputation in a limited sample size. This criterion resulted in the exclusion of only seven features from further analysis. For the remaining missing values, median imputation was applied to continuous features, while mode imputation was used for binary variables.
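The missing-data handling described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the feature names, and the toy values are all hypothetical, and missing entries are assumed to be recorded as `None`.

```python
from statistics import median, mode

MISSING_THRESHOLD = 0.03  # features with more than 3% missing values are dropped


def missing_fraction(values):
    """Fraction of entries recorded as None."""
    return sum(v is None for v in values) / len(values)


def preprocess(columns, binary_features):
    """Drop sparse features, then impute the remaining gaps.

    columns: dict mapping feature name -> list of values (None = missing).
    binary_features: set of feature names that receive mode imputation;
    every other feature is treated as continuous (median imputation).
    """
    cleaned = {}
    for name, values in columns.items():
        if missing_fraction(values) > MISSING_THRESHOLD:
            continue  # excluded under the 3% rule
        observed = [v for v in values if v is not None]
        fill = mode(observed) if name in binary_features else median(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical toy data for 40 patients (made-up names and values).
toy = {
    "lactate": [1.0, 2.0, 3.0, None] + [2.0] * 36,            # 2.5% missing: kept
    "on_vasopressor": [1, 1, 0, None] + [1] * 36,             # 2.5% missing: kept
    "rarely_measured": [None, None, 5.0, 6.0] + [5.0] * 36,   # 5% missing: dropped
}
result = preprocess(toy, binary_features={"on_vasopressor"})
print(sorted(result))
print(result["lactate"][3], result["on_vasopressor"][3])
```

One-hot encoding of categorical variables is not shown here; it would precede this step in the order given in the text.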

Feature selection

Feature selection was essential to reduce model complexity. While high-dimensional data offer great opportunities to discover patterns and tendencies, they can be difficult to handle due to the so-called curse of dimensionality.²⁴ For both models, features were selected using Shapley additive explanations (SHAP), a game theory approach that calculates a value for each feature representing its contribution to the model's prediction.¹⁵ SHAP is instrumental in healthcare because it enhances model transparency and supports clinician trust in AI predictions. Various studies emphasize the importance of SHAP specifically in healthcare.^25,26,27 A random forest model was used to calculate SHAP scores for feature importance. Following SHAP, the top features contributing to >80% of the total SHAP scores were included for analysis. Some low-ranking or redundant features were then eliminated, based on a Spearman correlation threshold of >0.8 and in cases where a composite feature and one of its constituent variables conveyed overlapping information. This resulted in a reduced feature set of 34, representing 86% of data importance in the regression model, and 47, representing 93% of data importance in the classification model.
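The cumulative-importance cut described above can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' code: it presumes that a mean absolute SHAP score per feature has already been computed from the random forest, the function name and the feature scores are hypothetical, and the Spearman-based pruning of redundant features is not shown.

```python
def select_by_cumulative_importance(importance, threshold=0.80):
    """Keep the top-ranked features until their share of total importance
    exceeds the threshold.

    importance: dict mapping feature name -> non-negative importance score
    (for example, a mean absolute SHAP value).
    Returns the selected feature names and the share of importance they cover.
    """
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative > threshold:
            break
    return selected, cumulative


# Hypothetical scores for five generic features.
scores = {
    "feature_a": 50.0,
    "feature_b": 20.0,
    "feature_c": 15.0,
    "feature_d": 10.0,
    "feature_e": 5.0,
}
selected, covered = select_by_cumulative_importance(scores)
print(selected, round(covered, 2))
```

In this toy case the first two features cover 70% of the total, so a third is needed to pass the 80% mark, and the retained set covers 85%.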

Prediction model development

Two AI models were developed using various algorithms: (1) a regression model (323 patients) and (2) a classification model (218 patients). Both models were trained using 5-fold cross-validation, with 80% of the data used for training and 20% for testing.

REGRESSION MODEL: This was developed to predict mechanical ventilation duration in days. The data set also included patients with tracheostomy, and all patients were extubated alive. Seven models were trained: linear, tree, support vector machines (SVM), Gaussian process regression, least squares kernel, bagged ensemble and least squares boosting ensemble methods. Performance metrics for the regression model included root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²).
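For reference, the regression metrics can be computed as in the sketch below, which covers RMSE, R², and the MAE reported in Table 1. The function name and the toy durations are hypothetical; the formulas are the standard definitions.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical ventilation durations in days (observed vs. predicted).
observed = [2, 4, 6, 8]
predicted = [3, 4, 5, 10]
metrics = regression_metrics(observed, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

R² compares the model against always predicting the mean duration, so it becomes negative when the model does worse than that baseline, as in the SVM row of Table 1 (−0.01).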

BINARY CLASSIFICATION MODEL: Designed to classify patients into short- (≤3 days) or long-term (>3 days) mechanical ventilation groups as the positive class and negative class respectively, this model supports early clinical decisions, including tracheostomy. Although there is no specific cut-off for defining long-term mechanical ventilation, the 3-day cut-off was chosen for its clinical significance, as the risk of developing ventilator-associated pneumonia increases significantly beyond this period.^28,29 In addition, the two classes were observed to be balanced and thus did not require Synthetic Minority Oversampling Technique (SMOTE)^30,31 or data elimination to correct class imbalance.

The outcome classes were equally stratified across training and testing groups to prevent any class bias.³² Six models were trained: logistic regression, discriminant, SVM, K-nearest neighbour (KNN), kernel and bagged ensemble methods. Misclassification costs were adjusted for the KNN and bagged ensemble models to observe the difference in model performance. The focus was on correctly identifying the patients who would require mechanical ventilation for >3 days (the negative class, measured by specificity), since they would require more resource allocation and extra care. Bayesian optimization, an efficient method that leverages probabilistic models to optimize machine learning model performance, was employed for hyperparameter tuning.^33,34 Classification model metrics were accuracy, AUROC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
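A stratified 80/20 split of the kind described above might look like the following sketch. It is illustrative only: the function name, the fixed seed, and the toy label counts are assumptions, and the study's own tooling may partition the data differently.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                          # randomise within each class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical cohort: 50 short-term and 50 long-term ventilation cases.
labels = ["short"] * 50 + ["long"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] == "short" for i in test_idx))
```

Because the shuffle happens inside each class, both partitions keep the same class ratio as the full cohort whatever the seed.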

Comparisons of regression model features between the training and test datasets, and of classification model features between the short- (≤3 days) and long-term (>3 days) mechanical ventilation groups, are presented in Supplemental Tables 2 and 3, respectively. The Mann–Whitney U and chi-square tests were applied to determine the statistical significance (p-value < 0.05) of differences in continuous and categorical features, respectively, between the two groups.

Results

Regression model

Table 1 shows the regression model performance for predicting mechanical ventilation duration. Compared to the other models, the Least Squares Boosting Ensemble regression model achieved the best overall results, with a training RMSE of 5.24 days and R² of 0.63 and a test RMSE of 4.66 days and R² of 0.65 using the top 34 SHAP-selected features. It had the highest R² and the lowest test MAE (2.95 days) of all models, although the bagged ensemble had a lower test RMSE (4.26 days). Figure 1 highlights SHAP feature importance, where tracheostomy (53.66%), illness duration before ICU admission (4.66%), and SGPT (3.13%) were observed to be key factors.

Figure 1.

Top 20 SHAP ranked features for regression model.

Table 1.

Training and test performance for regression models for predicting duration of mechanical ventilation (days) in patients admitted in ICU.

Model	RMSE Training	RMSE Test	MAE Training	MAE Test	R² Training	R² Test
Linear regression	6.62 ± 0.30	4.84	4.67 ± 0.11	3.64	0.42 ± 0.03	0.51
Tree	5.74 ± 0.39	4.51	4.33 ± 0.37	3.32	0.57 ± 0.16	0.58
SVM	8.77 ± 0.24	6.96	6.85 ± 0.29	5.77	−0.01 ± 0	−0.01
Gaussian process	5.72 ± 0.56	4.68	4.09 ± 0.43	3.46	0.57 ± 0.08	0.54
LS Kernel	6.07 ± 0.15	4.79	4.56 ± 0.16	3.72	0.52 ± 0.01	0.52
Bagged ensemble	5.40 ± 0.32	4.26	3.86 ± 0.31	3.12	0.61 ± 0.04	0.62
LS Boost ensemble	5.24 ± 0.52	4.66	3.98 ± 0.33	2.95	0.63 ± 0.14	0.65

Classification model

Table 2 shows the classification model performance for predicting mechanical ventilation duration with SHAP-selected features and the default misclassification cost. The bagged ensemble and KNN models demonstrated the best performance when the misclassification costs for false negatives (FN) and false positives (FP) were set to the default equal weight (1:1). Given the superior performance of these two models, their misclassification cost parameters were adjusted to further optimize predictive accuracy. The costs were adjusted through a trial-and-error approach; after multiple iterations, a setting of 2.5:2.5 yielded the best performance for the KNN model, indicating the model's sensitivity to cost parameters (Table 3). Figure 2 demonstrates SHAP feature importance, with type of ICU admission (8.1%), PO₂ (5.6%), and pH (5%) as key factors. The KNN model achieved a training accuracy of 76%, AUROC of 0.78, sensitivity of 73.9% and specificity of 78.2%, and a test accuracy of 79.1%, AUROC of 0.82, sensitivity of 71.4%, and specificity of 86.4% using 47 SHAP-selected features. The training and test data were standardized separately using z-score normalization before model development. Additional hyperparameters for the cost-adjusted KNN model were: k = 9; distance metric = correlation; distance weight = equal. Figure 3 demonstrates ROC plots for the cost-adjusted KNN model. A comparative assessment of the classification models' performance using all 100 features is shown in Supplemental Table 4.
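To make the reported KNN hyperparameters concrete, the sketch below implements a K-nearest-neighbour vote with correlation distance (one minus the Pearson correlation between two feature vectors) and equally weighted neighbours. It is a simplified illustration, not the authors' implementation: the function names and toy vectors are hypothetical, misclassification costs are not applied, and the toy call uses k = 3 only because the example has six training points, whereas the study reports k = 9.

```python
from collections import Counter
from math import sqrt


def correlation_distance(u, v):
    """1 - Pearson correlation between two feature vectors.

    Undefined (division by zero) if either vector is constant.
    """
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return 1.0 - numerator / denominator


def knn_predict(train_x, train_y, query, k=9):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: correlation_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])  # equal distance weight
    return votes.most_common(1)[0][0]


# Hypothetical standardized feature vectors with made-up labels.
train_x = [[1, 2, 3], [2, 4, 6.5], [1, 3, 4], [3, 2, 1], [6, 4, 1], [5, 3, 2]]
train_y = ["short", "short", "short", "long", "long", "long"]
prediction = knn_predict(train_x, train_y, query=[2, 3, 5], k=3)
print(prediction)
```

Correlation distance compares the shape of two patients' feature profiles rather than their absolute magnitudes, which is one reason standardizing the features first matters for this metric.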

Figure 2.

Top 20 SHAP ranked features for classification model.

Figure 3.

ROC plots for cost adjusted KNN model.

Table 2.

Training and test performance for classification models for predicting short- or long-term mechanical ventilation requirement in patients admitted in ICU.

Model	Training accuracy	Test accuracy	Training AUROC	Test AUROC	Training sensitivity (≤3 days)	Test sensitivity (≤3 days)	Training specificity (>3 days)	Test specificity (>3 days)
Efficient logistic regression	0.65 ± 0.08	0.67	0.73 ± 0.08	0.69	0.60 ± 0.08	0.67	0.69 ± 0.12	0.68
Discriminant	0.73 ± 0.06	0.70	0.79 ± 0.09	0.83	0.71 ± 0.15	0.76	0.76 ± 0.14	0.64
SVM	0.69 ± 0.05	0.70	0.72 ± 0.06	0.70	0.71 ± 0.10	0.71	0.68 ± 0.06	0.68
KNN	0.75 ± 0.05	0.74	0.81 ± 0.07	0.83	0.72 ± 0.09	0.76	0.78 ± 0.08	0.73
Kernel	0.72 ± 0.02	0.70	0.80 ± 0.02	0.80	0.74 ± 0.04	0.71	0.70 ± 0.02	0.68
Bagged ensemble	0.75 ± 0.06	0.72	0.78 ± 0.07	0.83	0.75 ± 0.10	0.76	0.75 ± 0.08	0.68

Table 3.

Training and test performance of KNN and bagged ensemble classification models with altered misclassification cost (FP = FN = 2.5).

Model	Training accuracy	Test accuracy	Training AUROC	Test AUROC	Training sensitivity (≤3 days)	Test sensitivity (≤3 days)	Training specificity (>3 days)	Test specificity (>3 days)	Training PPV	Test PPV	Training NPV	Test NPV
KNN	0.76 ± 0.08	0.79	0.78 ± 0.10	0.82	0.74 ± 0.20	0.71	0.78 ± 0.06	0.86	0.76 ± 0.06	0.83	0.72 ± 0.15	0.76
Bagged ensemble	0.74 ± 0.07	0.72	0.81 ± 0.05	0.80	0.72 ± 0.15	0.71	0.77 ± 0.10	0.73	0.75 ± 0.07	0.71	0.76 ± 0.13	0.73
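The relationship between the metrics in Table 3 can be checked with a small worked example. In the sketch below, the positive class is short-term ventilation (≤3 days), as defined in the Methodology. The confusion-matrix counts are an assumption: they are back-calculated to be consistent with the reported KNN test-set percentages and are not stated in the article.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts.

    Positive class: short-term ventilation (<=3 days).
    Negative class: long-term ventilation (>3 days).
    """
    return {
        "sensitivity": tp / (tp + fn),   # short-term cases correctly identified
        "specificity": tn / (tn + fp),   # long-term cases correctly identified
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts (not from the article), chosen to match the KNN test row of Table 3.
knn_test = binary_metrics(tp=15, fn=6, tn=19, fp=3)
for name, value in knn_test.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts (43 test patients), the sketch gives sensitivity 0.714, specificity 0.864, PPV 0.833, NPV 0.760 and accuracy 0.791, in line with the rounded values reported for the cost-adjusted KNN model.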

Discussion

This study developed ML-based models for predicting the duration of mechanical ventilation in ICU patients in a low-resource LMIC setting. Two approaches were taken: first, predicting the duration of mechanical ventilation in days using a regression model and second, predicting short- or long-term mechanical ventilation requirement using a classification model with a 3-day cut-off. Among regression models, the Least Squares Boosting Ensemble model achieved a test RMSE of 4.66 days and the best R² of 0.65 using 34 SHAP-selected features (representing 86% of the total SHAP importance). Key features included tracheostomy (53.66% importance), duration of illness before ICU admission (4.66%), and SGPT (3.13%). Among classification models, the K-nearest neighbours model achieved the best test accuracy of 79.1%, with an AUROC of 0.82, sensitivity of 71.4%, and specificity of 86.4% using 47 SHAP-selected features (representing 93% of the total SHAP importance). Top features included type of ICU admission (8.1%), PO₂ (5.6%), and pH (5%).

Previously developed classification models for predicting duration of mechanical ventilation were by Juan et al.⁶ achieving an accuracy of 0.69 using logistic regression and by Parreco et al.⁹ securing an AUROC of 0.83 using gradient boosting machines, both using the 7-day cut-off model. Another study by Villar et al.¹⁰ focussing on ARDS patients observed an AUROC of 0.71 with multilayer perceptron model. The 3-day cut-off model developed by Vali et al.¹¹ achieved an accuracy between 0.69–0.77 and AUROC of 0.76. In addition, they tested multiple models with different binary classification cut-offs. The regression model developed by Sayed et al.¹² reported an RMSE of 6 days and by Zichen et al.¹³ the Ensemble models observed an RMSE of 5.6 days, focussing on ARDS patients in the respective studies. There is no universally agreed-upon definition of prolonged mechanical ventilation and studies have adopted varying thresholds depending on clinical context and study objectives. For example, Juan et al.⁶ and Parreco et al.⁹ used a 7-day threshold to guide decisions around tracheostomy placement, noting that approximately 30% of mechanically ventilated patients require support for more than a week. Villar et al.¹⁰ focused on ARDS patients and used a 14-day threshold to aid early identification of individuals requiring long-term ventilatory support. Vali et al.¹¹ explored multiple thresholds (>3, 5, 7, 10, 14, and 23 days) and highlighted the lack of consensus in prolonged mechanical ventilation definitions. The current study used a 3-day threshold to enable early risk stratification given the increased risk of VAP after 48–72 h, using balanced classes for both the outcomes, and to support decision-making in low-resource, high-patient-load ICU settings where this cut-off facilitates earlier interventions without requiring prolonged observation windows, which may be less feasible in such environments.

The regression model results reported by Sayed et al.¹² and Zichen et al.¹³ are comparable to the findings of this study. However, direct comparison is difficult due to variations in AI techniques, datasets and selected features. Predicting continuous variables involves complex interactions between patient physiology, clinical interventions, and other unmeasured factors, making it inherently more difficult than binary classification.^35,36 In addition, our regression model has limited utility in guiding the important clinical decision of performing tracheostomy. As per the SHAP analysis, the most important contributor to the performance of the regression model in this study is the tracheostomy feature. This is clinically explainable, since an early tracheostomy can reduce the duration of mechanical ventilation,³⁷ which could be the reason for its high importance in determining the duration of mechanical ventilation compared with other features. Notably, the SHAP plot (Figure 4) reveals that patients who underwent tracheostomy were predicted to have a longer duration of mechanical ventilation, likely reflecting that tracheostomy is more often performed in patients who are already critically ill and require prolonged respiratory support. However, the dominance of the tracheostomy feature makes it difficult to compare the importance of other features in the regression model. After tracheostomy, blood gas parameters appear in the SHAP ranking list for the regression model, as also observed in the classification model; pH, PCO₂ and lactate are common to both.
A longer duration of illness prior to ICU admission, along with elevated values of cumulative lung ultrasound score, alkaline phosphatase (ALP), procalcitonin, and international normalized ratio (INR), were indicative of increased illness severity and could be associated with prolonged mechanical ventilation, as shown by the SHAP dependence plots with locally weighted scatterplot smoothing (LOWESS) of top predictors in the regression model (Figure 4). Additionally, a higher rapid shallow breathing index (RSBI) is associated with extubation failure, which also aligns with the SHAP findings (Figure 4).

Figure 4.

SHAP dependence plots with LOWESS trend lines for top 10 predictors in the regression model.

The classification model gives AUROC results similar to the model developed by Parreco et al.;⁹ however, this study achieved them using a much simpler KNN model. When optimized with an adjusted misclassification cost, the classification model predicts the duration of mechanical ventilation in ICU patients with higher accuracy (training 76% and test 79.1%). The model also demonstrated a higher specificity (training 78.2% and test 86.4%), indicating its ability to correctly identify patients requiring prolonged mechanical ventilation (negative class: >3 days), which could further benefit ICU resource planning.³⁸ Essentially, cost-sensitive learning involves assigning different misclassification costs to the different classes, based on their importance for the task at hand. The absolute cost scaling of 2.5:2.5 should increase the weight of the error function during optimization, making the model more sensitive to minimizing overall errors. However, after multiple iterations, assigning the same but higher misclassification cost to both classes yielded better specificity than the default. This variation could be due to the limited data set available for testing. The cost-sensitive paradigm has been largely explored in the context of imbalanced data sets.³⁹ The effectiveness of the different cost-sensitive techniques is yet to be comparatively explored in high-dimensionality problems.⁴⁰
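The role of the cost matrix can be illustrated with the textbook minimum-expected-cost decision rule. The sketch below is an assumption-laden illustration, not a description of how the study's software applies costs during training or hyperparameter optimization: the function name and posterior probabilities are hypothetical. Under this plain rule, scaling both costs equally (1:1 versus 2.5:2.5) leaves the chosen class unchanged, whereas an asymmetric cost shifts it.

```python
def min_expected_cost_class(posteriors, cost):
    """Pick the predicted class with the lowest expected misclassification cost.

    posteriors: dict mapping class -> estimated probability for one patient.
    cost: nested dict, cost[true_class][predicted_class].
    """
    expected = {
        predicted: sum(posteriors[true] * cost[true][predicted] for true in posteriors)
        for predicted in posteriors
    }
    return min(expected, key=expected.get), expected


def symmetric_cost(value):
    """Cost matrix with the same penalty for both kinds of error."""
    return {
        "short": {"short": 0.0, "long": value},
        "long": {"short": value, "long": 0.0},
    }


# Hypothetical posterior estimate for a single patient.
posteriors = {"short": 0.6, "long": 0.4}

default_choice, _ = min_expected_cost_class(posteriors, symmetric_cost(1.0))
scaled_choice, _ = min_expected_cost_class(posteriors, symmetric_cost(2.5))

# Asymmetric case: missing a long-term patient costs 2.5, the opposite error costs 1.
asymmetric = {
    "short": {"short": 0.0, "long": 1.0},
    "long": {"short": 2.5, "long": 0.0},
}
asymmetric_choice, _ = min_expected_cost_class(posteriors, asymmetric)
print(default_choice, scaled_choice, asymmetric_choice)
```

Because uniform scaling does not change the decision under this rule, any difference observed between the 1:1 and 2.5:2.5 settings would have to arise elsewhere, for example in how the fitting or tuning procedure uses the costs, or from variability in a small test set.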

The classification model highlights type of ICU admission as the most important feature. Patients transferred from the operation theatre (OT) to the ICU might be expected to have shorter ventilation durations than those admitted from the emergency room or wards, because OT patients admitted for elective procedures tend to have less severe underlying conditions or pre-existing health issues. To explore this further, we analysed the distribution of ICU admission sources across the two outcome classes (≤3 days vs. >3 days) in the classification cohort and observed that a significantly greater proportion of patients admitted from the OT were in the ≤3 days ventilation group, while transfers from the emergency room, wards or other ICUs were more frequent in the >3 days group (Supplemental Table 3). This supports the clinical reasoning above and underscores the potential influence of admission source on ventilation duration. Routine blood tests and blood gas parameters also ranked highly, which is clinically meaningful. The importance of lab data and blood gas parameters has also been highlighted in the study conducted by Vali et al.¹¹ Figure 5 shows SHAP dependence plots with LOWESS trend lines illustrating the relationship between key acid-base markers and predicted duration of mechanical ventilation. The plots show that features indicative of acidosis, i.e., lower pH, higher PCO₂ and elevated lactate levels, consistently corresponded to more negative SHAP values. This suggests that these features contributed to a higher likelihood of prolonged mechanical ventilation (>3 days), aligning well with established clinical understanding of the impact of acid-base imbalance on patient outcomes. A higher PO₂ corresponding to lower SHAP values, which in this model indicated prolonged duration of mechanical ventilation, could be due to patients being on high FiO₂ on other non-invasive oxygen support before or at the time of ICU admission.
Respiratory system specific parameters such as RSBI, cumulative lung ultrasound score and ventilatory ratio also contributed to model performance, which is clinically meaningful in mechanically ventilated patients, as elevated values pointed towards prolonged duration of mechanical ventilation. The SOFA score, a key prognostic indicator in the ICU, was among the top 20 features in the SHAP analysis of the classification model, with higher scores indicating worse prognosis. Thus, SHAP-based feature ranking provides results that are clinically interpretable. The results suggest that the duration of mechanical ventilation depends on multiple factors: some clinically relevant parameters that were used in the study and many more that were unmeasured. Early clinical predictions of mechanical ventilation duration by intensivists are often imprecise;^5,6 machine learning models, however, hold significant potential for improving predictive accuracy in the future.

Figure 5.

SHAP dependence plots with LOWESS trend lines for arterial blood gas markers in classification model.

Several merits of this study are noteworthy. First, the inclusion of a diverse dataset comprising patients with multiple comorbidities, along with a wide range of parameters and clinically significant composite variables, enhances diversity and supports the development of generalizable predictions. Second, the study estimates mechanical ventilation duration with a clinically relevant 3-day cut-off. The model may have limitations in predicting the exact duration of mechanical ventilation, but it can serve as a useful tool for identifying mechanically ventilated patients at risk of VAP. The classification model could also be deployed in an emergency setting, where a prediction of ≤3 days in a mechanically ventilated patient could help avoid transfer to the ICU. Third, the higher specificity offered by the classification model can identify patients requiring more resources and help in optimizing resource allocation and making early clinical decisions. Fourth, this study applies multiple machine learning models to the problem and highlights that a simpler model, such as KNN, can achieve better predictive performance. The study also explores two different approaches to evaluating the duration of mechanical ventilation, using separate regression and classification models. Additionally, the interpretable SHAP-based approach informs clinicians about the importance of specific features and can support reliable and explainable AI adoption in clinical practice.

The study also has certain limitations. First, it uses a limited data set comprising patients from a single hospital in India. The classification model may perform well on the data set collected for the study but could struggle to generalize to patient populations that differ in demographics, comorbidities, or hospital protocols. However, the inclusion of an Indian ICU dataset emphasizes the critical importance of developing models tailored to the unique challenges and characteristics of healthcare systems in low- and middle-income countries. Second, the study uses spot clinical parameters from the day of ICU admission for making predictions. Although this helps in making early estimations, the predictions may change if estimated on the second or third day of ICU admission due to changes in treatment strategies. Third, the proposed regression model has a limited role in decision-making regarding tracheostomy. Fourth, the binary classification model classifies all cases requiring mechanical ventilation for >3 days as prolonged mechanical ventilation. This differs from previous studies, where prolonged mechanical ventilation is defined as >7 days or >14 days, and limits generalization of the results. Finally, we employed a pragmatic approach to altering the misclassification cost in the classifier. While this method simplified the process by avoiding the assessment of model performance across all possible cost variations, it may have overlooked models with potentially better performance.

Conclusion

The proposed interpretable AI model provides an automated solution for predicting the duration of mechanical ventilation, and may help to optimize clinical decisions and resource allocation in critical care. Future directions for this study include external third-party validation using independent patient cohorts to further test the reliability and generalizability of the results. Although we have indirectly considered an imaging parameter, the cumulative lung ultrasound score, direct imaging markers such as chest X-rays or CT scans, or signal markers such as ECG, could be added to the feature list and may improve model performance by capturing features that may not be explicitly identified by a clinician. Duration outcomes can be heavily influenced by time-dependent factors, e.g., progression of disease and treatment response over time, and models capturing spot data may not fully incorporate these dynamics. Time-series models could be better suited to handling such time-dependent outcomes. Advancements in this area have the potential to significantly enhance clinical decision-making and support more comprehensive diagnostic processes, ultimately improving patient care.

Supplemental Material

Supplemental material, sj-docx-1-dhj-10.1177_20552076251352988 for Predicting mechanical ventilation duration in ICU patients: A data-driven machine learning approach for clinical decision-making by Shivi Mendiratta, Vinay Gandhi Mukkelli, Esha Baidya Kayal, Puneet Khanna and Amit Mehndiratta in DIGITAL HEALTH

Footnotes

Ethical considerations

Data were collected in 2023–24 after ethics approval from the Institute Ethics Committee of All India Institute of Medical Sciences, New Delhi, India; ethical approval number: IECPG-190/20.04.2023, RT-05/07.06.2023, CTRI/2023/10/058972.

Consent to participate

The Institute Ethics Committee of All India Institute of Medical Sciences, New Delhi, India waived the need for ethics approval and patient consent for the collection, analysis and publication of the retrospectively obtained and anonymised data for this non-interventional study.

Author contributions

All authors were involved in the conception and design of the study. Vinay Gandhi was responsible for data collection and curation, as well as obtaining ethical approval. Shivi Mendiratta handled the original manuscript writing, data analysis and interpretation. Esha Baidya Kayal offered valuable technical insights for data interpretation and visualization, contributed to critical revisions, and assisted with manuscript writing. Professor Puneet Khanna provided essential clinical support in patient recruitment, offered clinical insights, and facilitated resource allocation within the hospital. Professor Amit Mehndiratta supervised the entire project, provided critical revisions, facilitated technical support, and held overall responsibility. Additionally, all authors contributed to the manuscript editing and have reviewed and approved the final version.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Consent for publication

Not applicable

Data availability statement

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Supplemental material

Supplemental material for this article is available online.

