Abstract
Keywords
Introduction
Emergency departments (EDs) have seen growing strain over the past few years, stemming from an eroding primary care system and a human resources crisis emerging from the COVID-19 pandemic. This pressure amplified existing vulnerabilities in the emergency care system, resulting in overcrowding, prolonged patient stays, and extended wait times to see an ED provider.1,2 EDs struggle with record-setting wait times and closures, 3 highlighting the ongoing need for innovative strategies to improve healthcare delivery and patient management in these critical settings. The ED community has been a pioneer in developing and adopting data-driven clinical decision tools, such as the Canadian CT Head Rule 4 and the Quick Sequential Organ Failure Assessment score, 5 to support decision-making in high-pressure environments. However, these static, rudimentary tools are fixed in time and validity and can be unwieldy to use. 6
Artificial intelligence (AI), specifically machine learning (ML), represents a potential paradigm shift in clinical prediction models. 7 Unlike traditional statistical models that rely on linear and logistic regression applied to historical datasets, ML models leverage flexible, nonparametric algorithms capable of capturing complex patterns and interactions between variables, potentially enhancing predictive accuracy. 8 In ED settings, ML has shown promise in areas such as diagnostic accuracy, patient triage, and clinical decision-making. 9 However, while previous reviews have highlighted the potential of ML models to improve diagnostic accuracy and patient care in EDs,10,11 many questions remain unanswered. For example, to what extent do these models predict clinical outcomes such as mortality, length of stay, and disposition? Can they reduce wait times, optimize treatment decisions, and lower ED-associated costs?
This systematic review aims to summarize the evidence on ML implementation in EDs, with a particular focus on clinical and operational impacts.
Methods
Search strategy
We registered this systematic review with PROSPERO (registration number: CRD42024515933). An experienced information specialist (BS) developed and tested the search strategies through an iterative process in consultation with the review team. The MEDLINE strategy was peer reviewed by another senior information specialist prior to execution using the PRESS checklist. 12 Using the multi-file and de-duplication tool available on the Ovid platform, we searched Ovid MEDLINE® ALL and Embase Classic + Embase. We also searched Cochrane Central Register of Controlled Trials (CENTRAL) (Wiley), CINAHL (EBSCOhost), and IEEE Xplore. The databases were searched from inception to January 9, 2024.
The strategies utilized a combination of controlled vocabulary (e.g., “Emergency Service, Hospital,” “Artificial Intelligence,” “Data Mining”) and keywords (e.g., “emergency department,” “AI,” “deep learning”). Vocabulary and syntax were adjusted across the databases. There were no language or date restrictions, but where possible, animal-only records, opinion pieces, and other irrelevant publication types (e.g., conference abstracts and preprints) were removed. A summary of the search strategies as run can be found in Supplementary File 1 and Supplementary File 4. Records were downloaded and de-duplicated using EndNote version 9.3.3 (Clarivate Analytics) and uploaded to Covidence (Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org) for efficient data management, extraction, and synthesis.
Study selection
Eligible studies were those that implemented, or prospectively or retrospectively evaluated, the performance of ML models in emergency department settings to predict clinical or operational outcomes. Studies limited to model development, or focused only on disease-specific prediction tasks without clinical or operational evaluation, were excluded. The following designs were also excluded: animal models, in vitro studies, systematic reviews, narrative reviews, opinion papers, case studies, and conference papers. Study participants were humans of all ages, genders, or ethnicities who presented to the emergency department for any reason. The primary outcome of interest was the clinical outcomes of ML models (how they can assist in predicting mortality, treatment decisions, and disposition). Secondary outcomes included the operational efficiencies of ML models (their ability to predict patient wait times and length of stay and to reduce ED-associated costs), and any reported limitations related to the implementation of ML models. The main goal was to examine and represent the structure and outcomes of the reviewed studies rather than to systematize the full scope of each study.
Covidence was used throughout the review to manage citations. We engaged and trained several individuals to assist with screening citations (AP, DP, ESA, KS, NM, NS, ZP). During both title and abstract screening and full-text screening, reviewers applied the eligibility criteria to determine the inclusion or exclusion of studies, which was recorded in Covidence. First-level screening consisted of title and abstract screening of all uploaded studies. Each citation was reviewed independently by two reviewers to select studies for full-text review (AP, NM, ZP, KS). Studies were included when both reviewers judged that the eligibility criteria were fully met and excluded when both judged that they were not. Any disagreement was resolved by consensus or a third reviewer (AP). Second-level screening involved a thorough assessment of the full text of all studies that passed the initial title and abstract screening, performed by two independent reviewers (AP, NM, ZP, KS), who excluded any studies that did not meet the same eligibility criteria applied in the first step.
Data extraction and assessment
Members of the study team assisted with data extraction (AP, KM, NM). To extract data from the included studies, an extraction form developed using the Cochrane guidelines 13 was uploaded onto Covidence. Pilot testing of the form was completed on five randomly selected studies by two reviewers (AP, KM). The data extraction was checked for consensus by one member of the study team (AP). Data was collected on meta-data (study title, author name, year of publication), study design, study population (country, age groups, demographics), ML application type, purpose of ML application, research questions, data source, outcomes (training/testing before implementation, training/testing after implementation), sample size per outcome, study limitations, use of clinical applications, and conclusions and future directions. If information was not available from an article, this was noted. Eligible studies were categorized based on similar outcomes and presented in tabular format using data obtained from the extraction form (Supplementary File 2).
Risk of bias and quality assessment
Eligible studies were assessed independently for their risk of bias by two reviewers (KS, ZP) (Supplementary File 3). Methodological quality was determined, and risk of bias evaluated, using the Prediction model Risk Of Bias Assessment Tool (PROBAST). 14 This tool comprises questions tailored to identify potential biases in four domains (participant selection, predictors, outcome, and analysis), as well as an overall study risk of bias. Each question was answered as yes, probably yes, probably no, no, or no information, with yes indicating a low risk of bias and no indicating a high risk of bias. Each study was rated as low, unclear, or high risk of bias. Two independent reviewers assessed the risk of bias for each domain and the overall bias within each included study. Any discordance on methodological quality was resolved by consensus or input from a third reviewer (AP). The authors also used the PRISMA 2020 checklist to evaluate the reporting of the review (Supplementary File 5).
Results
Study characteristics
Table 1 presents key data extracted from the included studies; the study selection process is shown in Figure 1. The included studies were published between 2004 and 2024. Specifically, 73 (87%) of the studies were published within the last five years, with only 11 studies (13%) published prior to 2019. Geographical setting varied across the studies, with 30 studies (35.7%) from the USA,15–44 12 (14.3%) from South Korea,45–56 nine (10.7%) from Taiwan,57–65 seven (8.3%) from Hong Kong and China,66–72 six (7.1%) from Italy,73–78 three (3.6%) from Israel,79–81 two (2.4%) from Canada,82,83 two (2.4%) from Singapore,84,85 two (2.4%) from France,86,87 two (2.4%) from Australia,88,89 and one each (1.2%) for a total of nine from Portugal,90 the Netherlands,91 Switzerland,92 Saudi Arabia,93 Iran,94 Greece,95 the United Kingdom,96 Germany,97 and Turkey.98 Sample sizes in these studies ranged from 80 to 4,645,483 patients, indicating wide variation in the population sizes examined. In terms of study design, retrospective cohort studies were predominant.

Figure 1. Selection process of eligible studies from all identified citations (PRISMA flow diagram).
Table 1. Summary of data extracted from the included studies.
The reviewed studies used ML models for seven types of predictions: mortality, admission to hospital, ED length of stay, hospital length of stay, treatment decisions, costs, and COVID-related outcomes. In the context of COVID-19, ML models were primarily applied to predict mortality, hospital admission, and treatment decisions in SARS-CoV-2 patients rather than to detect SARS-CoV-2 infection itself.
Quality assessments
The risk of bias assessment revealed that most studies exhibited a high risk of bias (Figure 2).

Figure 2. Risk of bias summary plot.
Applications of ML models in emergency departments
To facilitate interpretation, the included studies were grouped according to their primary ML application: (1) mortality prediction, (2) disposition prediction, (3) length of stay estimation, (4) treatment decision-making, (5) wait time prediction, and (6) cost prediction. Table 2 provides a summary of the number of studies in each category by population type (adult/mixed, pediatric) and predominant ML algorithm (gradient boosting, random forest, neural network, other).
Table 2. Overview of machine learning applications in emergency departments by outcome category.
Mortality prediction
A total of 50 studies15,16,18,20,22–24,29–33,35–38,41–48,50,52,54–60,62,64,65,67,69,70,72,74–77,79,85,90–92,96 assessed the use of ML to predict mortality rates, including short-term mortality outcomes (i.e., in-hospital or within 6 hours to 7 days of ED admission)15,16,18,20,22,24,30,31,33,44,45,52,56,58,59,62,64,67,70,74,79,92 and long-term mortality outcomes (i.e., 28 days to 1 year). 23,30,35–38,42,45,46,48,50,55,56,62,72,77,79,85,91,92
Most studies focused on adults, with only one study specifically focusing on children aged 18 years or younger 44 and three studies focusing on the elderly (aged 65 and older) population.18,64,74 Across these studies, 43 different ML models were employed, with gradient boosting, random forest, and neural networks being the most common types. AUROC values for these models ranged from 0.618 to 0.978 for gradient boosting, from 0.77 to 0.921 for random forest, and from 0.66 to 0.976 for neural networks.
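The AUROC figures reported above quantify how well a model ranks patients who experienced the outcome above those who did not. As a purely illustrative sketch (the labels and risk scores below are hypothetical, not drawn from any included study), the metric can be computed directly from predicted risks via its Mann–Whitney interpretation:

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher risk score than a randomly chosen negative case
    (Mann-Whitney U formulation; ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy example: 1 = died, 0 = survived; scores are hypothetical model risks
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, which is why values approaching 0.97–0.98 in the reviewed studies represent near-perfect separation of survivors and non-survivors.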
Key features (i.e., the most significant variables used by ML models to make predictions) often remained consistent between short- and long-term mortality, including the demographic variables age, sex, and race, along with vital signs (i.e., heart rate, temperature, and respiratory rate). However, the variables used often varied depending on the condition being assessed. For example, studies focusing on COVID-19-related outcomes frequently included comorbidities, such as hypertension, diabetes, or cancer, in their ML models,22,30,41,45,75–77,96 whereas studies on sepsis often included clinical biomarkers, including white blood cell or platelet count.23,36,43,55,56,58,62,72,85
Some studies compared feature importance between different mortality timeframes.45,56,62,79 Two studies explored mortality prediction in septic patients.56,62 Perng et al. assessed 72-hour and 28-day mortality, presenting AUROC values of 0.94 and 0.95 for their combined neural network and Softmax ML model. 62 The study suggests base excess was the most influential feature for both 72-hour and 28-day mortality, with shock episodes (administration of inotropic agents during ED admission) and red cell distribution width being crucial factors in 72-hour and 28-day mortality, respectively. 62 Similarly, Jeon et al. examined 7-day, 14-day, and 30-day mortality. 56 The study showed that septic shock, lactate levels, malignancy, age, and oxygen saturation were the most important features for all three mortality timeframes, yet respiratory infection, which was included in the set of best features for predicting 14-day and 30-day mortality, was not included in the set for 7-day mortality. 56 In comparison, two studies explored varying timeframes of mortality in relation to triage scores.45,79 Klug et al. investigated early mortality, defined as mortality up to 2 days following ED registration, and short-term mortality, defined as mortality 2–30 days post ED registration. 79 The study found that age and structured chief complaint were the strongest predictors of mortality across all timeframes. 79 The gradient boosting model demonstrated high predictive performance, with an AUC of 0.962 for early mortality and 0.923 for short-term mortality. Notably, a simplified model incorporating nine key features (age, arrival mode, chief complaint, five primary vital signs, and emergency severity index) yielded an AUC of 0.962 for early mortality, comparable to the full-feature model, which had an AUC of 0.964. 79
Four studies focused on pediatric 44 or elderly patients only.18,64,74 Goto et al. investigated the use of ML in pediatric emergency department triage, evaluating its ability to predict critical care outcomes, including ICU admission and in-hospital mortality, as well as hospitalization. 44 The study found that deep neural networks outperformed traditional triage systems, achieving an AUC of 0.85 for critical care prediction and 0.80 for hospitalization. 44 The three studies focused on elderly patients explored ML applications in different clinical contexts, including cancer, influenza, and emergency surgery.18,64,74 Qiao et al. developed the Cancer Frailty Assessment Tool (cFAST) using an extreme gradient boosting model to predict in-hospital mortality among older patients with cancer. 18 Their model, which incorporated 240 features, achieved an AUC of 0.92, significantly outperforming traditional risk indices such as the Charlson Comorbidity Index (AUC 0.62) and the Hospital Frailty Risk Score (AUC 0.71). 18 Key predictors included comorbidities, frailty markers, and hospital variables. 18 Tan et al. applied ML to predict clinical outcomes in older ED patients diagnosed with influenza, including hospitalization, pneumonia, sepsis, ICU admission, and in-hospital mortality. 64 The XGBoost model achieved the highest AUC (0.902) for ICU admission, while a logistic regression model achieved an AUC of 0.889 for in-hospital mortality, and a random forest model obtained an AUC of 0.840 for hospitalization. 64 Key predictors included oxygen saturation, pulse rate, blood pressure, and comorbidities. 64 Fransvea et al. developed an explainable multilayer perceptron model to predict 30-day postoperative mortality in elderly patients undergoing emergency surgery. 74 Their model achieved an accuracy of 94.9%, with a sensitivity of 92.0% and specificity of 95.2%.
Key predictors included non-chronic cardiac-related comorbidities, low oxygen saturation, elevated creatinine levels, and reduced functional capacity. 74
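Accuracy, sensitivity, and specificity, as reported for models like Fransvea et al.'s, all derive from the four cells of a confusion matrix. The sketch below illustrates the arithmetic; the counts are hypothetical, chosen only to reproduce percentages of the same order as those reported:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate: deaths correctly flagged
    specificity = tn / (tn + fp)   # true-negative rate: survivors correctly cleared
    return accuracy, sensitivity, specificity

# hypothetical counts for a 30-day postoperative mortality classifier
acc, sens, spec = classification_metrics(tp=46, fp=24, tn=476, fn=4)
print(f"accuracy={acc:.1%} sensitivity={sens:.1%} specificity={spec:.1%}")
# accuracy=94.9% sensitivity=92.0% specificity=95.2%
```

Because mortality is rare, accuracy alone can look high even for a model that misses many deaths, which is why the reviewed studies report sensitivity and specificity (or AUROC) alongside it.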
Disposition prediction
Twenty-one studies used ML models to predict various disposition outcomes following an ED visit, specifically hospital admission, ICU admission, or repeat ED visit with admission after discharge.15,16,22,29,31,35,39,42–44,50,53,64,75–78,84,95 The majority of studies focused on adults or individuals of all ages, with two studies specifically exploring children39,44 and one focusing on elderly patients. 64 The most commonly used models were gradient boosting, random forest, and neural networks, with AUROC values for admission prediction ranging from 0.675 to 0.96 for gradient boosting, 0.77 to 0.885 for random forest, and 0.66 to 0.94 for neural networks.
Studies that focused on general conditions frequently incorporated demographic data, such as sex and age, and vital sign data, including heart rate and body temperature, into their models. In studies exploring patients with respiratory illnesses, such as acute respiratory infections, or those experiencing asthma or COPD exacerbations, patient comorbidity data (e.g., heart failure, cancer, lung disease) was often included.15,22,39,50,64,75–77 Notably, some models also integrated chief complaint data as a potential predictor.15,31,95 Studies examining re-admission often leveraged features related to comorbidities, such as a history or presence of renal disease.43,78,84 A few studies explored both hospital admission and ICU admission,15,31,44 using predicting variables such as the mode of arrival to the ED.
Analysis of ML models in pediatric ED revealed differences in the key features influencing admission predictions. For instance, Goto et al. identified respiratory rate, ambulance use, oxygen saturation, and pulse rate as key variables for predicting both hospital and ICU admission in children. 44 However, dehydration, increased work of breathing, poor feeding, and maternal smoking were significant predictors for hospitalization in children presenting with bronchiolitis. 39
Length of stay prediction
Nine studies applied ML models to predict LoS within the ED.19,26,40,63,81,93,95,97,98 Age ranges varied across studies, with most studies focusing on adult populations (18 years and older), while some specifically examined pediatric populations (0–18 years) 39 or older adults (≥65 years).93,98
Studies consistently demonstrated that gradient boosting models achieved the highest predictive accuracy, with AUC values ranging between 0.81 and 0.85.19,63,81,93 These models outperformed traditional regression-based approaches and other machine learning models, such as artificial neural network (ANN), 26 support vector machines (SVM), and decision trees.40,98 Random forest models also performed well, particularly in studies analyzing structured triage data.40,81 Deep learning models, while less commonly used, showed potential for applications where large-scale real-time data integration is required, with chief complaint, vital signs, and previous ED visits being the most significant predictors. 95
The most significant predictors of ED LoS varied across studies, but several key factors were consistently identified, including triage level,63,95 vital signs, particularly heart rate, blood pressure, and oxygen saturation,19,40,63 chief complaints, and previous ED visits.81,95 Some models also incorporated comorbidity data, demonstrating that chronic conditions such as diabetes, hypertension, and cardiovascular disease were associated with increased ED LoS.19,93
Eleven studies developed ML models to predict hospital LoS for patients admitted from the ED,17,20,25,39,43,49,54,61,73,86,87 with most studies focusing on adult populations (18 years and older), while some specifically examined elderly patients (≥65 years).20,25,49,73
Among the ML models used, gradient boosting models showed the highest predictive performance. Random forest and artificial neural networks were also commonly used but showed slightly lower predictive performance.20,39,49 Deep learning models, particularly generative adversarial networks and convolutional neural networks, demonstrated superior accuracy (sensitivity = 94%, specificity = 92%) in specialized applications such as image-based prediction models for intracranial hemorrhage and sepsis-related hospital stays.61,87 However, these models required large datasets and external validation to ensure generalizability.
The most significant predictors of hospital LoS included demographic variables and arrival mode,49,73 as well as physiological and clinical markers, such as injury severity score, Glasgow Coma Scale, white blood cell count, lactate levels, and respiratory distress.20,43,61 Comorbidities, including hypertension, diabetes, chronic respiratory diseases, and malignancies, were also associated with increased hospital LoS, particularly in elderly populations.17,20,25,39,43,49,54,61,73,86,87
Treatment decision-making
Eight studies examined the role of ML models in making treatment decisions during an ED visit.21,27,28,34,51,68,78,80 While the majority of studies focused on adult populations, four included patients of all age groups.21,28,34,78
ML models have shown potential in supporting treatment decision-making in ED settings, particularly in sepsis detection, cardiovascular risk stratification, imaging interpretation, and triage-based decision-making. While deep learning models demonstrated high accuracy in image-based applications, gradient boosting and neural network models were more frequently used for risk stratification and decision support.
In sepsis detection, an SVM model incorporating vital signs, free-text triage assessments, and structured patient history achieved an AUC of 0.86, compared to an AUC of 0.67 when only structured data (vital signs and demographics) were used. 21 For chest pain and cardiovascular risk assessment, a neural network model was tested for its ability to guide admission or discharge decisions in ED patients presenting with chest pain. 27 However, despite its diagnostic accuracy, it did not significantly impact admission rates (pre vs post implementation: 63% vs 67%). The lack of impact was attributed to delays in obtaining cardiac marker results, which meant disposition decisions were often made before ML-based recommendations were available. 27 In another study on cardiovascular triage, a gradient boosting model for ED triage in suspected cardiovascular disease demonstrated the highest performance, with an AUC of 0.937, effectively classifying patients into appropriate triage levels. 68 For emergency triage, an ANN model was used to improve risk stratification in syncope patients, focusing on the decision to hospitalize patients to prevent severe short-term outcomes. This model demonstrated a sensitivity of 100% and a specificity of 79%. 78
For heart failure management, an unsupervised ML model was developed to identify symptom patterns predictive of acute decompensation and adverse cardiac events in ED patients with heart failure. The model identified indigestion as a novel predictor of adverse outcomes, a feature not commonly included in traditional heart failure risk scores. 28
In radiographic diagnosis and treatment planning, deep learning models demonstrated strong predictive accuracy for pneumonia detection on chest radiographs. One model achieved an AUC of 0.906 when incorporating body mass index (BMI) and age, compared to an AUC of 0.829 when using airspace opacities alone. 34 Additionally, a deep learning-based assistive system for chest radiograph interpretation significantly improved emergency physician diagnostic performance, with an AUROC of 0.801 and a kappa value of 0.902 for decision-making consistency. 51 The model was trained on ED chest radiographs annotated by radiologists. 51 In another study, a gradient boosting model was designed to optimize head CT utilization in the ED triage process. The model effectively predicted non-contrast head CT usage at triage level, achieving an AUC of 0.9. 80
Studies incorporating free-text clinical notes along with structured clinical data showed higher predictive performance, particularly in sepsis detection. 21
Wait time prediction
Seven studies applied ML models to predict ED wait times, including the time spent waiting to access medical assessment or medical treatment.66,71,83,88,89,94,95 These studies focused on all ages, with two studies specifically examining children under 18 years old,66,83 and one study focusing on individuals over 16 years old. 95
Gradient boosting models outperformed other models in predicting waiting times, with reported reductions in mean squared errors ranging from 15% to 22%,71,89 and reductions in prediction errors by up to 19%. 94 Moreover, studies reported overall decreases in patient wait times ranging from 18% to 26%, particularly in pediatric emergency care, where decision trees and logistic regression reduced median wait times by 26% through automated early diagnostic decision-making. 83 Queueing-based models combined with quantile regression improved prediction reliability, reducing underpredicted wait times by 42%. 88 In workflow optimization, gradient boosting models integrated with discrete event simulation led to a 25% reduction in total ED wait times by optimizing staff allocation and process efficiency. 94 Deep learning models, including convolutional neural networks and long short-term memory (LSTM), improved patient prioritization and reduced wait times by 18%. 95
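The 42% reduction in underpredicted wait times reported for quantile regression follows from the asymmetry of the quantile (pinball) loss: choosing a quantile above 0.5 makes underestimating a patient's wait cost more than overestimating it by the same amount. A minimal illustrative sketch (the wait times below are hypothetical, not taken from any included study):

```python
def pinball_loss(actual, predicted, tau):
    """Quantile (pinball) loss at quantile tau: underprediction is
    weighted by tau, overprediction by (1 - tau)."""
    diff = actual - predicted
    return tau * diff if diff >= 0 else (tau - 1) * diff

# at the 0.9 quantile, telling a patient their 60-minute wait will be
# 50 minutes costs 9x more than telling them it will be 70 minutes
print(pinball_loss(60, 50, 0.9))            # 9.0
print(round(pinball_loss(60, 70, 0.9), 10)) # 1.0
```

A model trained to minimize this loss therefore learns the 90th-percentile wait rather than the mean, making quoted waits deliberately conservative.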
Key predictors of ED waiting times included triage level, patient volume, time of arrival, and department occupancy. Higher-acuity patients had shorter waits, while lower-acuity cases faced delays.66,89 Congestion and staffing availability affected wait times, with peaks during busy hours and weekends.71,88 Ambulance transport reduced wait times compared to walk-ins.89,95 ML models identified ESI scores, vital signs, and chief complaints as critical for triage-based predictions. 95 Studies integrating historical patient flow data and discrete event simulation highlighted resource availability and procedural delays as key factors. 94
ED cost prediction
Three studies applied ML models to predict ED costs and resource utilization.40,48,82 Two of the studies48,82 did not pose any age restrictions, while one study specifically focused on individuals aged 16 and older. 40
Logistic regression models achieved an AUC of 0.71 for predicting frequent ED visits and 0.76 for identifying patients in the top 5% of ED users. 82 Multilabel machine learning models, particularly multilayer perceptron classifiers, were used to predict ED orders at triage, achieving a median F1 score of 0.56. Simulations integrating these models showed that reducing ED LOS by an average of 7 minutes could lead to increased efficiency but also resulted in a rise in ordering costs from $21 to $45 per visit. 40
Key predictors of ED costs included patient demographics, prior healthcare utilization, and triage decisions. Patients with a history of frequent ED visits had higher predicted costs. 82 Triage-based ML screening models for high-cost conditions, such as ST-elevation myocardial infarction (STEMI), significantly improved early detection and reduced costs associated with delayed treatment. 48
Discussion
This systematic review highlights the various applications of ML models in ED settings. Across the included studies, ML models were most frequently used for mortality prediction, disposition decisions, LOS estimation (both ED and hospital), treatment decision-making, wait time forecasting, and cost prediction.
The primary data sources used were electronic patient records, which have been pivotal in enabling the development of ML models in healthcare, particularly in ED settings. 99 The digitization of health records over the past decades has provided the depth and accessibility of data required for developing and testing ML models that rely on detailed patient information to predict outcomes and recommend interventions with greater accuracy. 100 In addition to electronic patient records, several studies leveraged administrative databases. These databases provide large-scale, longitudinal data that can be valuable for identifying trends, conducting population-level analyses, and evaluating the long-term efficacy of medical interventions. However, the lack of real-time availability of administrative data limits their utility in clinical decision-making within ED settings.101,102
Commonly applied ML models
Regarding methodologies, neural networks, random forests, and gradient boosting emerged as the most commonly applied ML models in the reviewed studies. These models were likely chosen for their ability to handle large datasets with missing data, 103 and to predict the nonlinear relationship between parameters. 104 Their flexibility and robustness make them particularly suitable for the complex and dynamic nature of emergency care environments.
Neural networks 38 use supervised learning techniques where relationships between inputs and outputs do not follow traditional mathematical models. This allows neural networks to predict the probability of an outcome for an individual rather than for populations and to include cases with missing data. However, neural networks struggle when data is scarce and are more effective with larger datasets. Moreover, neural networks are known to reduce the interpretability of data features, sometimes to the extent that they become meaningless for understanding performance. 105 In contrast, random forests, a model that operates by creating an ensemble of decision trees, are often regarded as one of the most popular techniques for solving classification problems on large datasets. 50 The use of multiple decision trees makes the model resistant to noisy data points, often resulting in lower error rates and more stable predictions. 50 While random forests may require more time and system resources, they can perform well on both large and small datasets. 50 Similarly, gradient boosting also uses decision trees, but unlike random forests, it builds decision trees sequentially rather than independently.73,79 This sequential construction allows gradient boosting to reduce errors made by previous trees, enabling the model to learn complex patterns in the data.73,79 However, this also makes gradient boosting more sensitive to noisy data, which can reduce its performance. 73
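The sequential error correction that distinguishes gradient boosting from a random forest can be made concrete with a from-scratch sketch: a squared-error booster built from one-split regression stumps. This is a didactic toy (not code from any reviewed study); each round fits a stump to the residuals left by the ensemble so far, then adds a damped version of that stump's prediction.

```python
def fit_stump(x, residuals):
    """Find the one-split regression stump minimizing squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds (keep right side non-empty)
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]  # threshold, left-leaf value, right-leaf value

def gradient_boost(x, y, rounds=60, lr=0.1):
    """Build stumps sequentially, each correcting the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, resid)        # fit the current errors
        stumps.append((t, lv, rv))
        pred = [pi + lr * (lv if xi <= t else rv)  # damped update
                for xi, pi in zip(x, pred)]
    return base, stumps, pred

x = list(range(10))
y = [0.0] * 5 + [1.0] * 5                      # a simple step to learn
_, _, pred = gradient_boost(x, y)
print(max(abs(yi - pi) for yi, pi in zip(y, pred)))  # residual shrinks toward 0
```

A random forest would instead fit each tree independently on a bootstrap sample and average the results, which is why it is more robust to noise but cannot progressively correct its own systematic errors the way the boosting loop above does.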
Our review further revealed that these ML models are often chosen based on the specific outcomes being predicted in emergency care settings. Gradient boosting demonstrated high accuracy in predicting mortality, ICU admissions, and treatment decisions, with its sequential learning process making it particularly suited to capturing complex patterns in clinical data. Random forests were most effective in noisy datasets and were widely applied to disposition prediction and wait time estimation. Neural networks excelled in predicting length of stay and treatment decisions, though their limited interpretability posed challenges when understanding variable contributions to predictions. Our study highlights the importance of selecting the appropriate machine learning model based on the problem and dataset being addressed. As suggested by Zeleke et al., 73 it may be beneficial to systematically compare the performance of different algorithms and identify the best model for a given dataset to ensure accurate and reliable predictions.
ML applications in EDs
This review highlights the growing role of ML in EDs, with models applied to mortality prediction, patient disposition, LOS estimation, treatment decision-making, wait time forecasting, and cost prediction.
While ML has demonstrated strong predictive performance across these domains, key challenges remain in external validation, workflow integration, and the ability to translate predictions into real-world improvements in patient care and ED efficiency, as noted in previous reviews.106,107
Recent evidence further emphasizes that enhancing triage and forecasting processes through ML models, such as natural language processing (NLP) and feature engineering, can substantially improve operational efficiency and patient flow in EDs. For example, a recent systematic review 108 highlighted that ML and NLP models can enhance triage accuracy by integrating free-text triage notes with structured data, outperforming traditional triage scales. Similarly, a retrospective multicenter study using datasets from 11 EDs across hospitals in Australia, the United States, and the Netherlands 109 showed that feature engineering in ML-based forecasting significantly improved the prediction of patient arrivals, supporting more efficient staffing and resource allocation. These findings underscore the growing implementation of ML not only in clinical prediction but in operational optimization within the ED.
One of the most significant findings of this review is that ML models often identified key predictors of patient outcomes that align with clinical intuition. For example, studies on mortality prediction showed that while early mortality is often driven by acute physiological deterioration (such as cytokine storms in sepsis), long-term mortality is more influenced by immune dysfunction and underlying health conditions. Additionally, some studies demonstrated that simplified models with fewer variables could achieve predictive performance comparable to complex models, suggesting that streamlined, interpretable models may be sufficient for clinical decision support. 79
ML-based disposition models successfully integrated structured patient data, including demographics, comorbidities,15,22,39,50,64,75–77 as well as chief complaints,15,31,95 to refine risk assessments for hospital and ICU admission. However, reliance on structured clinical data may limit model performance, as free-text triage notes and clinician assessments often contain critical information not captured in standardized datasets. 110 Despite promising results, the variability in model performance across studies suggests that external validation and site-specific calibration are required before broad clinical adoption. Models trained on single-center datasets may not generalize well to different patient populations, particularly in settings with varying healthcare resources and admission practices. Additionally, while some studies integrated mode of arrival as a predictor of ICU admission,15,31,44 this variable is highly context-dependent and may not be a reliable feature across different hospitals or healthcare systems. Beyond predictive accuracy, further research should explore the impact of ML-driven disposition prediction on ED efficiency, patient outcomes, and healthcare costs to fully realize its potential in optimizing care.
Similarly, ML-based treatment decision models have shown promise in risk stratification for conditions such as heart failure, cardiovascular disease, sepsis, and pneumonia.21,27,28,34,51,68,78,80 However, their effectiveness in clinical practice depends on workflow integration. 27 Models that provide risk assessments without actionable recommendations may have limited impact on physician decision-making. This emphasizes the need for these models to be integrated into clinical workflows in a way that complements, rather than replaces, clinician judgment, a point also raised in previous work 111 highlighting the importance of clinician-ML collaboration in improving patient outcomes. For ML to be a meaningful addition to clinical practice, it must enhance efficiency while preserving the critical role of human expertise in patient care.
ML models predicting ED and hospital LOS demonstrated that incorporating real-time operational variables, such as department occupancy and historical patient flow data, improved predictive accuracy compared to models relying solely on patient characteristics. 95 This suggests that LOS prediction should not be static but instead dynamically adjust based on ED conditions. However, the ability of ML-driven predictions to improve operational efficiency depends on real-time implementation; if hospitals do not adjust staffing and resource allocation based on model outputs, predictive gains may not translate into clinical improvements. 112 ML-based wait time forecasting faces similar challenges, as most studies have focused on retrospective predictions rather than real-time applications.71,89 Future research should evaluate how ML-driven LOS predictions influence clinical workflows and patient outcomes when actively used for decision-making. Moreover, discrepancies in hospital admission policies and discharge protocols limit the applicability of ML models across different healthcare settings. To maximize clinical impact, future research should focus on developing adaptive models that continuously learn from hospital-specific data while ensuring multi-center validation to enhance generalizability.
Cost prediction remains the least explored area, with studies suggesting that reducing ED LOS can improve efficiency but may increase diagnostic ordering costs. 40 Further research should evaluate how ML models can optimize both cost-effectiveness and patient outcomes.
To maximize clinical impact, future ML applications in EDs should prioritize multi-center validation, real-time implementation, and integration into existing clinical workflows to ensure that predictive models translate into tangible improvements in patient care and ED operations. 112 Furthermore, data collection efforts should extend across multiple centers with diverse patient demographics and treatment approaches, supported by standardized data frameworks to reduce variability and promote consistency. Beyond data collection, traditional approaches to model development and validation require re-evaluation. As highlighted by Youssef et al., 107 the traditional method of external validation on secondary datasets may not always suffice. Instead, a recurring local validation approach that continuously evaluates the model's performance on the primary dataset over time is recommended. 107 This method ensures that ML models remain accurate, relevant, and responsive to changes in the specific clinical settings where they are deployed. 107 Finally, ethical considerations regarding ML applications in emergency care, including patient privacy, informed consent, and addressing biases in ML decision-making, must be systematically explored. Clear guidelines for the ethical use of ML models in EDs are essential to ensure that these technologies enhance patient care while upholding ethical principles.
Common limitations of ML models
Several recurring limitations were identified across the included studies, primarily high data dimensionality, data imbalances, and selection bias, all of which contribute to reduced generalizability. Additionally, most studies were conducted in high-income countries, particularly the United States and parts of Asia, highlighting a lack of representation from low- and middle-income countries, which may limit the global applicability of findings.
Proper data curation was identified as essential for reducing bias in ML models. 95 One study highlighted the necessity of large, well-curated datasets, meaning systematically cleaned, validated, and representative of real-world patient populations, to improve fairness and predictive accuracy. 95 Without proper curation, biases present in raw clinical data, such as differences in how certain conditions are diagnosed or documented across hospitals, can be reinforced by the model, leading to inaccurate or inequitable predictions. 114
Strengths and limitations
Our review has several strengths, including a comprehensive and detailed search strategy that imposed no restrictions on time or language. Furthermore, to minimize bias and enhance the reliability of our findings, we involved two independent reviewers at both the first and second levels of screening, as well as during the data extraction phase. However, our review is not without limitations. The heterogeneity among the studies regarding their methodologies, outcomes, and applications of ML precluded the possibility of performing a meta-analysis, thus limiting our capacity to provide a quantitative synthesis of the data. Moreover, with over 90% of the studies focusing on adult populations, the scope of our review is limited with respect to pediatric emergency settings, identifying a significant gap in the literature and highlighting an urgent need for further research in this area. Finally, while our review included studies implementing ML models in ED workflows, with a particular focus on clinical and operational impacts, we acknowledge that studies limited to model development without clinical or operational evaluation, or those restricted to disease-specific prediction tasks without evaluation in ED settings, were excluded. These represent important and evolving areas of ED machine learning research that warrant dedicated future systematic reviews. In addition, although no eligible studies employing large language models (LLMs) were identified during our search, this likely reflects the early stage of their adoption in clinical practice. As LLM-based applications become more prevalent in emergency care, future reviews should evaluate their implementation, impact on clinical workflows, and integration with existing decision-support systems.
Conclusion
ML models have been applied in EDs for predicting mortality, patient disposition, length of stay, treatment decisions, wait times, and costs, with gradient boosting and neural networks being the most commonly used. While some models demonstrated improvements over traditional methods, challenges in data quality, generalizability, and clinical integration remain key barriers to real-world implementation. Addressing these issues through larger, more diverse datasets, ongoing validation, and ethical oversight is critical to determining ML's clinical utility in emergency settings. Large language models offer new opportunities to enhance ED decision-making as they can process free-text inputs from health records and clinician notes, potentially improving context-aware predictions. This could enhance real-time adaptability in ED workflows, but their accuracy, interpretability, and impact on patient outcomes require further study. Future research should focus on evaluating their integration into clinical practice.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076251411209 - Supplemental material for Implementation of machine learning in emergency departments: A systematic review
Supplemental material, sj-docx-1-dhj-10.1177_20552076251411209 for Implementation of machine learning in emergency departments: A systematic review by Banafshe Hosseini, Atushi Patel, Megan Landes, Samuel Vaillancourt, Muhammad Mamdani, Kevin Maruthananth, Neha Matharu, Zuha Pathan, Krishihan Sivapragasam, Onlak Ruangsomboon, Becky Skidmore and Andrew D Pinto in DIGITAL HEALTH
sj-docx-2-dhj-10.1177_20552076251411209 - Supplemental material for Implementation of machine learning in emergency departments: A systematic review
sj-docx-3-dhj-10.1177_20552076251411209 - Supplemental material for Implementation of machine learning in emergency departments: A systematic review
sj-docx-4-dhj-10.1177_20552076251411209 - Supplemental material for Implementation of machine learning in emergency departments: A systematic review
sj-docx-5-dhj-10.1177_20552076251411209 - Supplemental material for Implementation of machine learning in emergency departments: A systematic review
Acknowledgments
The authors thank Lesley Anne Pablo, Disha Patel (DP), Navreet Singh (NS), and Ellah San Antonio (ESA) for assisting in conducting the systematic review. The authors also thank Kaitryn Campbell, MLIS, MSc, for the peer review of the MEDLINE search strategy. Moreover, this work was supported by the Ontario Ministry of Health and Ministry of Long-Term Care—Research Planning and Management Unit; Strategic Policy, Planning and French Language Services Division (Grant ID#: 693A). We were unable to update the corresponding PROSPERO registration because the record was created by a former staff member, and the associated login credentials are no longer available to our team. Consequently, the author list and title in the PROSPERO entry do not reflect the final version presented in this manuscript.
Ethical considerations
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contributions
Banafshe Hosseini (BH) and Andrew D. Pinto (ADP) conceived the study and secured funding. Atushi Patel (AP), Kevin Maruthananth (KM), Neha Matharu (NM), Zuha Pathan (ZP), and Krishihan Sivapragasam (KS) screened the studies and performed data extraction. AP drafted the initial manuscript. Becky Skidmore (BS) designed and executed the search strategy. BH and ADP supervised all stages of the review, from inception to data extraction and manuscript preparation. BH, Megan Landes (ML), Samuel Vaillancourt (SV), and Muhammad Mamdani (MM) revised the manuscript and prepared the final version. All authors contributed to critical revisions, approved the final manuscript for publication, and agreed to be accountable for all aspects of the work. BH is the guarantor.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ontario Ministry of Health and Ministry of Long-Term Care—Research Planning and Management Unit; Strategic Policy, Planning and French Language Services Division (Grant ID#: 693A).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.
Supplemental material
Supplemental material for this article is available online.
References
