Introduction
Predicting hospital admissions from the emergency department (ED) is critical for optimizing hospital resource allocation, improving patient outcomes, and reducing the burden on healthcare systems.1–4 With over 145 million ED visits annually in the United States, a substantial proportion of patients require hospitalization, making timely and accurate predictions a key priority for hospital management and clinical care teams.5,6 Traditionally, hospital admission decisions have been guided by clinicians’ assessments based on a combination of patient demographics, vital signs, clinical history, and presenting complaints.3,4,7 However, these methods are subjective and can vary widely between clinicians and institutions. Recent advances in machine learning offer the potential to improve admission predictions by integrating complex patterns from both structured clinical data and unstructured text data.8–10
Structured data, such as patient demographics, vital signs, and medical history, have long been used to support clinical decision-making.3,4 However, these data provide only a partial picture of a patient's condition. Unstructured data, including free-text chief complaints and clinical narratives, offer additional context that may not be captured by traditional structured data alone.11–14 The challenge lies in effectively combining these disparate data types to produce reliable and actionable predictions.11,12
In recent years, machine learning models such as gradient boosting classifiers (GBC) have demonstrated strong performance in predictive analytics using structured data.15–17 These models iteratively build decision trees, refining predictions with each iteration to capture complex relationships between variables. However, unstructured data, which often hold valuable clinical information, have been more challenging to integrate into machine learning workflows. Advances in natural language processing (NLP) techniques, particularly transformer-based models, such as the Generative Pre-trained Transformer 2 (GPT-2),18,19 offer a promising approach for processing and extracting insights from unstructured text.
The objective of this study was to develop and evaluate models for predicting hospital admissions from the ED by leveraging both structured and unstructured data. Specifically, we aimed to compare the performance of models using structured data alone, unstructured data alone, and a combined approach that integrates insights from both data types. We hypothesized that a model incorporating both structured clinical information and unstructured free-text descriptions would outperform models that rely on only one data type. By utilizing a combined machine learning approach, this study seeks to enhance hospital admission predictions, ultimately contributing to better decision-making in ED settings.
Methods
Data source and study population
This study is a secondary analysis of data collected as part of the 2021 National Hospital Ambulatory Medical Care Survey—Emergency Department (NHAMCS-ED).20 NHAMCS-ED is a nationwide survey conducted by the U.S. Centers for Disease Control and Prevention (CDC) to provide insights into healthcare utilization and delivery patterns across emergency departments in the United States. The dataset includes detailed records of emergency department visits from a representative sample of hospitals during the calendar year 2021. Our analysis focused exclusively on adult patients aged 18 years and older. After excluding pediatric cases, the final study sample consisted of 13,115 adult patients. The NHAMCS-ED dataset contained both structured and unstructured data, which were used to predict hospital admissions.
Data collection and variables
Structured data included variables related to patient demographics, visit characteristics, and clinical information. Demographic information encompassed age, sex, and race/ethnicity, while visit characteristics included factors such as arrival time, mode of arrival (e.g., ambulance or private transport), whether the visit was a follow-up, and whether the patient had been seen within the previous 72 hours. Clinical data included vital signs such as temperature, heart rate, systolic and diastolic blood pressure, respiratory rate, and pulse oximetry, along with reported pain levels. We also incorporated the Emergency Severity Index (ESI), a five-level triage tool that prioritizes patient care based on the severity of the patient's condition. In addition, the structured data captured patients' medical history, including chronic conditions such as Alzheimer's disease, chronic obstructive pulmonary disease, diabetes, and coronary artery disease. Other factors included the patient's type of residence (private home, nursing home, homeless, or other) and insurance type. Information about injuries, trauma, poisoning, and adverse effects of medical treatment was also considered. Missing values in the structured data were handled using median imputation to reduce bias.
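As a minimal illustration, the median-imputation step described above can be sketched with scikit-learn's SimpleImputer; the vital-sign values below are hypothetical stand-ins for the NHAMCS-ED structured variables:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical structured data (e.g., temperature, heart rate, systolic BP)
# with missing entries encoded as NaN.
X = np.array([
    [98.6,  80.0,  np.nan],
    [np.nan, 110.0, 120.0],
    [99.1,  np.nan, 135.0],
])

# Each missing value is replaced by the median of its column,
# mirroring the median imputation applied to the structured data.
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)
```

Median imputation is robust to the skewed distributions common in vital-sign data, which is one reason it is often preferred over mean imputation in clinical datasets.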
Unstructured data used in this study were derived from free-text fields in the 2021 NHAMCS-ED dataset, which capture chief complaints and reasons for injury. These fields were documented by healthcare providers or staff during patient encounters in emergency departments and reflect provider-documented information based on patient-reported concerns. The NHAMCS documentation does not specify the precise methods used for recording this data (e.g., verbal reporting transcribed by staff or provider interpretation). As such, these entries should be regarded as semi-structured data reflecting provider-modified patient-reported information. This distinction is important for understanding the context and variability of the unstructured data used in our predictive modeling.
Statistical analysis
Descriptive statistics were used to summarize the structured variables of the study population, stratified by hospital admission status. Categorical variables were presented as frequencies and percentages, and comparisons between admitted and non-admitted patients were conducted using chi-square tests. To further explore the association between structured variables and hospital admission, multivariable logistic regression models were developed to estimate odds ratios (ORs) for each variable. To address missing values in the structured data, median imputation was applied prior to conducting multivariable analyses to minimize potential biases. Statistical significance was assessed for all analyses at a threshold of p < 0.05.
Predictive models
Three predictive modeling approaches were implemented: one using structured data alone, one using unstructured data alone, and a combined model integrating both. The first approach used a Gradient Boosting Classifier (GBC) for structured data. GBC is an ensemble learning technique that builds decision trees sequentially, with each subsequent tree correcting the errors made by the previous trees. This method aggregates the output of weak learners (i.e., shallow decision trees) to form a strong predictive model, effectively capturing complex interactions between variables.
For unstructured data, we employed a Generative Pre-trained Transformer 2 (GPT-2) model, a transformer-based deep learning model for natural language processing tasks. GPT-2 was fine-tuned to predict hospital admissions from the free-text descriptions of patient complaints and injuries. The model was pre-trained on large corpora of general text and then adapted to this specific task by retraining its final layers on the NHAMCS-ED dataset. This allowed GPT-2 to recognize patterns in the free-text data predictive of hospital admission.
For the combined model, we integrated structured clinical data and unstructured text data into a unified feature matrix. Structured data, comprising demographic, clinical, and visit-related variables, were preprocessed by applying median imputation to handle missing values. Unstructured text data, including chief complaints and reasons for injury, were tokenized using the GPT-2 tokenizer, with padding and truncation applied to a maximum sequence length of 128 tokens. Contextualized embeddings for the text data were generated using a pre-trained GPT-2 model, with the hidden state vector of the first token from the model's last layer extracted as the embedding. The structured data and GPT-2-generated embeddings were concatenated into a single feature matrix, enabling simultaneous utilization of insights from both data types. A Gradient Boosting Classifier was trained on this combined feature set to predict hospital admissions.
To optimize the performance of the GBC, we employed a grid search over its hyperparameters: the learning rate (0.01, 0.05, 0.1, and 0.2), the number of estimators (50, 100, 150, and 200), and the maximum tree depth (3, 5, 7, and 10). The grid search was conducted within a 3-fold cross-validation framework inside each fold of the outer 5-fold cross-validation. The evaluation metric for both inner and outer folds was the area under the receiver operating characteristic curve (AUC). This setup ensured a robust selection of hyperparameters for each fold while minimizing overfitting and bias. A table summarizing the tested hyperparameters and their ranges is provided in Table S1.
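The nested cross-validation setup can be sketched with scikit-learn; the dataset is synthetic, and the grid below is a reduced subset of the full grid described in the text, kept small so the sketch runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the NHAMCS-ED feature matrix.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Subset of the grid above (full grid: learning_rate {0.01, 0.05, 0.1, 0.2},
# n_estimators {50, 100, 150, 200}, max_depth {3, 5, 7, 10}).
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [3, 5],
}

# Inner 3-fold grid search scored by AUC, nested inside an outer
# 5-fold cross-validation, matching the setup described in the text.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid, cv=inner_cv, scoring="roc_auc",
)
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
mean_auc = outer_scores.mean()
```

Because hyperparameters are re-selected inside every outer fold, the outer AUC estimates are not biased by the tuning process, which is the point of the nested design.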
Model training and evaluation
The models were trained and evaluated using 5-fold cross-validation, which splits the dataset into five equal parts. In each fold, one part was used as the testing set while the remaining four parts were used for training. This process was repeated five times, and the performance results were averaged across all folds. The cross-validation method was chosen to ensure robust evaluation and to minimize overfitting.
We assessed model performance using multiple metrics, including accuracy, precision, sensitivity (recall), and specificity. Accuracy measured the proportion of correctly predicted hospital admissions, while precision focused on the proportion of true positive predictions among all positive predictions. Sensitivity measured the model's ability to correctly identify patients who were admitted, and specificity assessed the model's ability to correctly identify patients who were not admitted. Additionally, we determined the optimal decision threshold for each model using the ROC curve,21,22 which allowed us to balance sensitivity and specificity to maximize predictive performance.
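Threshold selection from the ROC curve can be illustrated with Youden's J statistic, one common criterion for balancing sensitivity and specificity (the paper does not specify its exact rule); the data below are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced cohort (~20% "admitted"), standing in for ED data.
X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Youden's J = sensitivity + specificity - 1 = tpr - fpr; the
# threshold maximizing J balances the two error rates.
fpr, tpr, thresholds = roc_curve(y_te, probs)
best = np.argmax(tpr - fpr)
threshold = thresholds[best]

y_pred = (probs >= threshold).astype(int)
sensitivity = np.mean(y_pred[y_te == 1])
specificity = 1 - np.mean(y_pred[y_te == 0])
```

Shifting the threshold below the default 0.5 is typical for imbalanced outcomes like admission, trading some specificity for higher sensitivity.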
Ethical considerations
This study utilized publicly available data from the National Hospital Ambulatory Medical Care Survey—Emergency Department, conducted by the U.S. Centers for Disease Control and Prevention. The dataset is fully anonymized and does not contain any personally identifiable information. According to the CDC's guidelines for data use, specific permissions are not required for secondary analyses of these data. Additionally, this study was reviewed and approved as IRB exempt by the University of Pittsburgh Institutional Review Board under protocol STUDY24120115. As such, this study complies with ethical standards for research involving publicly available, de-identified datasets. All data processing and analyses were conducted in accordance with ethical guidelines for secondary data analysis, ensuring compliance with privacy and confidentiality standards.
Results
A total of 13,115 adult patients were included in the study, of whom 2264 (17.3%) were admitted to the hospital and 10,851 (82.7%) were discharged from the emergency department (ED). The demographic and clinical characteristics of these patients are presented in Table 1. The gender distribution showed that 51.5% of admitted patients were female, compared to 54.5% among non-admitted patients.
Demographic and clinical characteristics of emergency department patients categorized by hospital admission status.
Note: The variables “Respiratory Rate,” “Temperature,” “Pulse Oximetry,” “Heart Rate,” “Payment Type,” “Seen Within Last 72 Hours,” and “Episode of Care” have missing data proportions ranging between 5% and 10%. The variables “Arrival Time,” “Patient Residence,” “Arrival by Ambulance,” “Systolic Blood Pressure,” “Diastolic Blood Pressure,” and “Visit Related to Injury/Trauma, Overdose/Poisoning, or Adverse Effect of Medical/Surgical Treatment” have missing data proportions of less than 5%.
Figure 1(a) and Figure 1(b) present forest plots of odds ratios (ORs) with 95% confidence intervals for various factors associated with hospital admission. The forest plots indicate that age, medical history, and arrival by ambulance were strong predictors of hospital admission; for example, patients aged 40–65 had higher odds of admission (OR = 1.23, 95% CI: 1.07–1.42).

(a) Forest plot of odds ratios with 95% CI (log scale). (b) Forest plot of odds ratios with 95% CI (log scale).
The performance of the three predictive models (structured data only, unstructured data only, and combined structured and unstructured data) is shown in Figure 2. The combined model outperformed both single-source models in predicting hospital admission, achieving the highest accuracy (75.8%), precision (39.5%), and sensitivity (75.8%). In comparison, the structured data model achieved an accuracy of 73.8%, a precision of 36.6%, and a sensitivity of 70.8%, while the unstructured data model had an accuracy of 64.6%, a precision of 27.7%, and a sensitivity of 65.1%. ROC analysis showed that the combined model had the highest area under the curve (AUC), indicating better overall discrimination between admitted and non-admitted patients. Specificity was 74.4% for the structured data model and 64.6% for the unstructured data model, compared with 75.8% for the combined model.

Mean ROC curves for the three classification models predicting hospital admission from the emergency department.
Discussion
In this study, we developed and evaluated machine learning models to predict hospital admissions from the ED using both structured and unstructured data. Our findings demonstrate that integrating structured clinical data with unstructured text data significantly improves prediction accuracy compared to models relying on either data type alone. The combined model, which utilized a gradient boosting classifier for structured data and a fine-tuned GPT-2 model for unstructured data, showed superior performance with an accuracy of 75.8%, precision of 39.5%, and sensitivity of 75.8%. These results highlight the potential of machine learning models in enhancing decision-making and optimizing resource allocation in ED settings.
Several factors were found to be significant predictors of hospital admission. Age, medical history, and arrival by ambulance were strongly associated with higher odds of admission, consistent with previous studies. Patients aged 65 years and older, for example, were much more likely to be admitted, as were those with chronic conditions such as chronic kidney disease, coronary artery disease, and diabetes mellitus type II. Interestingly, our model also identified race/ethnicity and insurance type as important predictors. Black patients and those covered by Medicaid/CHIP had lower odds of admission compared to White patients and those with private insurance, raising important questions about potential disparities in healthcare access and decision-making.

Vital signs, such as heart rate, blood pressure, and oxygen saturation, were also significant in predicting hospital admissions. Patients with abnormal values, particularly those with heart rates exceeding 90 beats per minute or systolic blood pressure below 80 mm Hg, were more likely to be admitted. These clinical markers are routinely used in triage and assessment, reaffirming their importance in early prediction models. The incorporation of these variables into the GBC model likely contributed to its strong performance in distinguishing between admitted and non-admitted patients.

The addition of unstructured data, such as chief complaints and reasons for injury, added a valuable layer of information to the prediction models. While structured data provide objective, quantitative measures, unstructured text captures the nuances of patient symptoms and physician assessments that may not be fully reflected in vital signs or medical history. The GPT-2 model, which processed this unstructured data, was able to extract meaningful patterns from free-text descriptions, contributing to the overall performance of the combined model.
Our results align with previous research showing the utility of structured data in predicting hospital admissions.10,23,24 However, the integration of unstructured data, particularly using advanced NLP techniques like GPT-2, represents a novel contribution to the field. Previous studies have demonstrated the potential of NLP in extracting clinical insights from text data, but few have integrated structured and unstructured data to predict hospital admissions. Our study extends this body of work by showing that models incorporating both types of data can outperform traditional structured-data models.
A related study by Lequertier et al.25 focused on predicting length of stay in acute and emergency care settings using a deep neural network. Their work highlights the effectiveness of embedding-based representations for structured administrative data in predictive modeling tasks. While their approach used deep learning to achieve high performance on a multiclass classification problem, our study builds on this idea by integrating GPT-2 embeddings from unstructured text data with structured clinical data to predict hospital admissions as a binary classification problem. This distinction underscores complementary approaches to leveraging advanced feature representations for different healthcare applications. Furthermore, both studies emphasize the potential for machine learning to improve resource management in emergency care settings, while also pointing to the need for robust generalizability across different datasets and healthcare systems.
The findings from this study have several implications for clinical practice. First, integrating structured and unstructured data into predictive models can enhance the accuracy of hospital admission predictions, helping ED clinicians make more informed decisions. By improving the early identification of patients requiring hospitalization, hospitals can better allocate resources, reduce wait times, and optimize bed management, ultimately improving patient outcomes. Second, our findings highlight the need for healthcare systems to invest in infrastructure that supports the use of advanced machine learning and NLP techniques. Many EDs already collect large volumes of structured and unstructured data, but few fully leverage this information for predictive analytics. Implementing systems that can process both data types in real time could significantly enhance decision-making capabilities in the ED. Finally, the identification of potential disparities in admission likelihood, particularly among different racial/ethnic groups and insurance types, raises important ethical and policy considerations. While machine learning models can help standardize decision-making, care must be taken to ensure that these models do not perpetuate existing biases in the healthcare system. Future research should focus on addressing these disparities and developing models that promote equitable healthcare access.
This study has several limitations that should be considered when interpreting the results. First, the NHAMCS-ED dataset, while comprehensive and nationally representative, is cross-sectional in design. This limits the ability to evaluate temporal trends in hospital admission patterns or outcomes following ED visits. Future studies employing longitudinal data could provide deeper insights into the dynamics of admission processes over time. Second, the unstructured data used in this study, such as chief complaints and reasons for injury, were provider-documented rather than directly reported by patients. This documentation often involves varying levels of interpretation, refinement, and transcription, which may differ across institutions and providers. Such variability introduces a potential source of inconsistency, impacting the reliability and generalizability of the results. Additionally, the NHAMCS-ED dataset does not provide metadata on how these free-text fields were recorded, limiting the ability to fully characterize and standardize the nature of these entries. Third, while the GPT-2 model was fine-tuned on ED-specific data, its generalizability to other clinical environments or populations has not been established. Differences in healthcare systems, documentation practices, or patient demographics in other settings may necessitate additional model validation and refinement to ensure applicability. Fourth, the study lacked access to detailed clinical notes and physician assessments, which are not included in the NHAMCS-ED dataset. While unstructured text data were incorporated into the analysis, more granular documentation of provider decision-making processes could further enhance model performance. Future research should explore integrating comprehensive clinical documentation alongside structured data to create more robust predictive models. 
Fifth, although the combined model demonstrated improved predictive performance, the sensitivity (75.8%) indicates room for further enhancement. Incorporating additional variables, such as social determinants of health or longitudinal follow-up data, could potentially improve model sensitivity and overall accuracy in predicting hospital admissions. Finally, the cross-sectional nature of the NHAMCS-ED dataset, combined with the semi-structured nature of the unstructured data, highlights the need for caution in generalizing these findings. Future efforts should focus on addressing these limitations to refine predictive models and better support decision-making in EDs.
Conclusion
In conclusion, this study demonstrates that combining structured and unstructured data using machine learning models can significantly improve hospital admission predictions from the ED. The integration of GBC for structured data and GPT-2 for unstructured text data provides a robust approach to leveraging the full range of clinical information available during ED visits. These findings highlight the potential of machine learning to transform decision-making in the ED, ultimately contributing to better patient outcomes and more efficient hospital operations.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076251331319 - Supplemental material for "Machine learning-driven prediction of hospital admissions using gradient boosting and GPT-2" by Xingyu Zhang, Hairong Wang, Guan Yu and Wenbin Zhang in DIGITAL HEALTH.
