Abstract
Keywords
Background and introduction
Cardiovascular disease (CVD) is the leading cause of death worldwide.1–3 Acute coronary syndrome (ACS) encompasses unstable angina, non-Q-wave myocardial infarction (MI), and Q-wave MI.4 Prompt recognition of a patient with ACS is important because appropriate therapy can markedly improve the prognosis. To assist in improving prognosis, many CVD risk prediction models have been developed from prognostic factors using both regression-based methods and machine learning–based approaches. Examples of regression-based methods, which aim to support clinical diagnosis by converting prognostic factors into risk indices, are the Framingham risk score (FRS),5–8 QRISK,9,10 and GRACE11–14 models. On the other hand, there are machine learning–based prediction models of CVD occurrence such as random forests (RFs),15 neural networks (NNs),16 and support vector machines (SVMs).17 Machine learning–based approaches are known to address the limitations of traditional regression-based CVD prediction models. Their basic objectives are to find associations between different diseases, achieve high prediction accuracy, and handle missing and outlier data well. They can also analyze small and incomplete training data sets with dependent variables, which is a disadvantage of regression-based models (logistic regression and Cox proportional hazards regression).18
The critical and challenging issues of previous CVD prediction models for ACS patients can be summarized as follows. First, most previous regression-based CVD prediction models do not achieve high accuracy in the prognosis and diagnosis of CVD occurrence in patients at moderate risk. For example, approximately half of MIs and strokes occur in people who are not predicted to be at risk of CVD.19 Even with guidelines for CVD risk diagnosis and prediction, doctors often administer unnecessary treatment to patients at moderate risk. Second, regression-based CVD prediction models implicitly assume that each prognostic factor is independently associated with the occurrence of major adverse cardiovascular events (MACEs), a composite of death, MI, or repeat coronary revascularization of the target lesion; the non-linear, interactive relations among prognostic factors are therefore oversimplified.18 Third, conventional regression-based CVD prediction tools include major prognostic factors such as age, blood pressure (BP), heart rate, diabetes, cholesterol, smoking, and history of heart disease, whereas machine learning–based CVD prediction models involve different prognostic factors,19 as shown in Table 6. Fourth, machine learning–based CVD prediction models have already been used in various medical areas but mainly focus on analyzing medical images using convolutional neural networks (CNNs).20 In particular, there is little research on machine learning–based mortality prediction for clinical patients with ACS, which motivates the mortality analysis and prediction in this research.
Given the unavailability of medical facilities, the growing cost of health care, and staff shortages in emergency situations, it is essential to find a solution that redresses the aforementioned problems, predicts the degree of risk from patients' previous medical follow-up records provided by hospital emergency departments, and identifies the factors affecting patient severity.21–24
Therefore, this article proposes a machine learning–based model that predicts mortality during the 1-year follow-up after hospital discharge in clinical ACS patients. Its aim is to assess the degree of risk in patients with CVD and to develop a clinical decision support system that accurately predicts the mortality of ACS patients during the 1-year follow-up after discharge. Our contributions can be summarized as follows. First, we used a data set of the Korea Acute Myocardial Infarction Registry (KAMIR) and preprocessed it using the one-hot encoding rule. Second, we selected 8962 subjects, excluding from the population the 5923 people who failed to follow up after hospital discharge or had missing values. From these 8962 subjects, we finally selected 8227 subjects (7832 alive and 395 dead), excluding 735 patients who died during hospital admission. Third, the selected data set was divided through random sampling into a training data set of 6606 subjects (80.297%) and a testing data set of 1621 subjects (19.703%). Fourth, we implemented our machine learning–based mortality prediction model using a gradient boosting machine (GBM),25,26 a generalized linear model (GLM),27 RF,15 and a deep neural network (DNN)16,28 for the 1-year follow-up after discharge. Then, we compared the performances of the machine learning–based mortality prediction models using the area under the receiver operating characteristic (ROC) curve (AUC), precision, recall, accuracy, and F1-score.
Method
Data collection
KAMIR was the first nationwide, multicenter online registry designed to describe the characteristics and clinical outcomes of patients with MI, reflecting the current management of patients with ACS in Korea.29 It is a data set of Asian patients with acute MI and reflects real-world medical information and treatment practice for all patients. The data were collected by expert coordinators using a standardized form and protocol approved by the ethics committees of all participating institutions. The registry includes 52 community and university hospitals capable of primary percutaneous coronary intervention (PCI). Data were collected retrospectively at each site by trained study coordinators following the standardized protocol. All enrolled subjects were emergency patients who were diagnosed with ACS and had chest pain within 24 h. In this article, we used 14,885 ACS subjects enrolled in KAMIR from 1 November 2005 to 31 January 2008. The experimental data set consisted of 14,885 records with 22 continuous variables (e.g. age, body mass index (BMI), waist-to-hip ratio (WHR), symptom-to-balloon time, and arrival-to-balloon time) and 43 categorical variables (gender, pain, dyspnea, previous angina before MI symptom, etc.), as described in Table 1. The main outcome of this article is cardiac and sudden death during the 1-year clinical follow-up after hospital discharge; predicting MACEs will help determine the risk of patient mortality. Death after hospital discharge includes cardiac and non-cardiac death. There were also four discrete variables, such as Killip class and lesion type, which indicate the severity of the patient's condition.
Applied variables for the mortality prediction model.
BMI: body mass index; WHR: waist-to-hip ratio; SBP: systolic blood pressure; DBP: diastolic blood pressure; LV: left ventricular; CK-MB: creatine kinase-muscle brain; DOA: dead on arrival; ECG: electrocardiogram; PCI: percutaneous coronary intervention; CABG: coronary artery bypass grafting; HDL: high-density lipoprotein; LDL: low-density lipoprotein; NT-proBNP: N-terminal of the prohormone brain natriuretic peptide; hsCRP: high-sensitivity C-reactive protein; STEMI: ST-segment elevation myocardial infarction; NSTEMI: non-ST-segment elevation myocardial infarction.
Data preprocessing
During data preprocessing, all outliers in numeric data (e.g. special characters, out-of-range numeric values such as 999, and invalid datetimes) are converted into null values. In the data source, every attribute that can be subdivided is split into independent classes, and each class generates a new attribute. In accordance with the one-hot encoding rule, a representation of categorical variables as binary vectors, the new attribute is encoded as 1 if the attribute value is true for the new class and 0 otherwise. This first requires that each categorical value be mapped to an integer value. Each integer value is then represented as a binary vector that is all zeros except at the index of the integer, which is marked with a 1.
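As an illustration, the one-hot step described above can be sketched in Python. This is a minimal sketch, not the registry's actual preprocessing code; the attribute name and class values are hypothetical:

```python
def one_hot(value, classes):
    """Encode a categorical value as a binary vector over the known classes.

    A null or unknown value maps to an all-zero vector, mirroring the rule
    that null values are replaced with 0 during the conversion of
    categorical variables.
    """
    vec = [0] * len(classes)
    if value in classes:
        vec[classes.index(value)] = 1
    return vec

# Hypothetical categorical attribute "pain" with two classes.
pain_classes = ["yes", "no"]
print(one_hot("yes", pain_classes))  # [1, 0]
print(one_hot(None, pain_classes))   # [0, 0] -- null becomes all zeros
```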
In this article, we omit the detailed encoding rules for numerical and categorical variables. A numeric value is converted into 1 if the attribute value is true for the new class and 0 otherwise, in accordance with the reference values of the Mayo Clinic, a global reference laboratory that supports healthcare facilities worldwide and makes its practices and specialized tests accessible to physicians.30 For example, attribute "age" is converted into six new attributes: "<36," "36–45," "46–55," "56–65," "66–75," and "⩾76." WHR is preprocessed as 1 for obesity (⩾1 in men and ⩾0.85 in women) and 0 for normal otherwise. BMI is converted into four attributes: "⩽18.5," "18.5–22.99," "23–24.99," and "⩾25."31 During the preprocessing of categorical variables, all values of each variable generate new attributes, and each attribute takes the value 1 in the column corresponding to the true category and 0 otherwise. During this conversion of categorical variables, all null values are replaced with 0.
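The age and WHR rules above can be sketched as follows. This is an illustrative reconstruction of the stated cut points, not the authors' code:

```python
AGE_BINS = ["<36", "36-45", "46-55", "56-65", "66-75", ">=76"]

def encode_age(age):
    """One-hot encode age into the six groups described in the text."""
    edges = [36, 46, 56, 66, 76]  # lower bounds of the 2nd..6th groups
    idx = sum(age >= e for e in edges)  # count of thresholds reached
    vec = [0] * len(AGE_BINS)
    vec[idx] = 1
    return vec

def encode_whr(whr, male):
    """1 for obesity (WHR >= 1.0 in men, >= 0.85 in women), else 0."""
    return 1 if whr >= (1.0 if male else 0.85) else 0

print(encode_age(45))  # [0, 1, 0, 0, 0, 0] -- falls in "36-45"
print(encode_whr(0.9, male=False))  # 1 -- obese by the female threshold
```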
Data extraction
For the experiments in this study, we used the data of 14,885 ACS patients enrolled in KAMIR from 1 November 2005 to 31 January 2008.29 We selected 8962 subjects from the original data set, excluding 5923 people who failed to follow up after hospital discharge. Table 2 lists the criteria by which subjects failed the 1-year follow-up after discharge. In Table 2, a null value means that tracking of the patient during the 1-year follow-up after hospital discharge failed. Our data set therefore excludes all subjects who failed the 1-year follow-up after discharge but retains the subjects who suffered cardiac or non-cardiac death during the follow-up period.
Criteria in patients who failed at the 1-year follow-up after hospital discharge.
After that, we finally selected 8227 subjects (7832 alive and 395 dead), excluding 735 patients who had died during hospital admission from the 8962 subjects. The overall data extraction process is shown in Figure 1. The 8227 subjects were then subdivided through random sampling into a training data set of 6606 subjects (80.297%) for model learning and a testing data set of 1621 subjects (19.703%) for evaluating the prediction model. The training and testing data sets include the deaths of 305 and 90 patients, respectively.
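A split of this kind, keeping the death rate roughly constant in both parts, can be sketched in Python. This is a minimal stratified-sampling sketch under our own assumptions, not the authors' implementation:

```python
import random

def stratified_split(subjects, is_dead, test_frac=0.2, seed=42):
    """Split subjects into train/test sets while keeping the proportion
    of deaths roughly constant in both parts (stratified sampling)."""
    rng = random.Random(seed)
    dead = [s for s in subjects if is_dead(s)]
    alive = [s for s in subjects if not is_dead(s)]
    rng.shuffle(dead)
    rng.shuffle(alive)
    n_dead_test = round(len(dead) * test_frac)
    n_alive_test = round(len(alive) * test_frac)
    test = dead[:n_dead_test] + alive[:n_alive_test]
    train = dead[n_dead_test:] + alive[n_alive_test:]
    return train, test

# Toy example: 100 subjects, the first 10 labeled as deaths.
train, test = stratified_split(list(range(100)), lambda s: s < 10)
print(len(train), len(test))  # 80 20
```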

Experimental data extraction.
Architecture of the proposed mortality prediction model
To develop a mortality prediction model for patients with ACS, we employed machine learning algorithms, namely GBM,25,26 GLM,27 RF,15 and DNN.16,28 First, a DNN is an artificial neural network (ANN) with multiple hidden layers between the input and output layers; ours comprises three hidden layers and can capture non-linear patterns in unstructured data.28,32 Second, GBM combines boosting with gradient descent. It creates a model, fits a second model to the residuals, and combines the two. If residuals remain in the combined model, another model is fitted to them, and the final prediction model is generated by repeating this process until the residuals vanish. Third, GLM is an extension of the linear regression model that can be applied even when the dependent variable does not follow a normal distribution. The GLM combines traditional statistical methods and machine learning techniques: the dependent variable is linearly related to the independent variables through a specified link function, and the combination of hyperparameter values is found through grid search. RF builds multiple decision trees and merges them to obtain a more accurate and stable prediction. It is a flexible, simple supervised ensemble machine learning algorithm that usually produces accurate results without hyperparameter tuning and can be used for both classification and regression tasks.
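The fit-residuals-and-combine loop that defines GBM can be made concrete with a toy one-dimensional example. This is a didactic sketch using depth-1 regression trees (stumps) as the weak learner, not the GBM implementation used in our experiments:

```python
def fit_stump(xs, rs):
    """Fit a depth-1 regression tree to residuals rs over inputs xs:
    pick the threshold minimizing the squared error of two leaf means."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, rs) if x < thr]
        right = [r for x, r in zip(xs, rs) if x >= thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x < thr else rm

def gbm_fit(xs, ys, rounds=20, lr=0.5):
    """Start from the mean, repeatedly fit a stump to the current
    residuals, and add the (shrunken) corrections together."""
    base = sum(ys) / len(ys)
    stumps, preds = [], [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

model = gbm_fit([1, 2, 3, 4], [0, 0, 1, 1])
```

After 20 rounds the residuals have shrunk essentially to zero, so `model(1)` is close to 0 and `model(4)` is close to 1.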
Our mortality prediction model employed four machine learning algorithms, and Figure 2 shows the overall processing architecture of the mortality prediction model for patients with ACS. Its processing phases can be summarized as follows. First, the preprocessed data are subdivided through random sampling into a training data set (80%) for learning the models and a test data set (20%) for evaluation. During the random sampling, the rates of death and survival should remain constant. Second, we selected ranges of hyperparameters within which to search for the best prediction model of each machine learning algorithm, including RF, GBM, GLM, and DNN. For each algorithm, we created a machine learning–based mortality prediction model for clinical patients with ACS, fitting the hyperparameters over those ranges through grid search on the training data and evaluating each combination by fourfold stratified cross-validation. Third, we found the best prediction model with the highest performance for each machine learning algorithm and extracted its hyperparameters. Fourth, each machine learning–based model employed the best hyperparameters and was evaluated on the test data. Finally, we compared the performances of the mortality prediction models and then selected the best mortality prediction model for the 1-year follow-up in patients with ACS.

Processing architecture of our proposed mortality prediction model during the 1-year follow-up tracking using machine learning algorithms.
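The tuning phase above, grid search scored by stratified k-fold cross-validation, can be sketched as follows. This is a generic illustration (the `evaluate` callback stands in for training and scoring one model), not the tuning code used in our experiments:

```python
import itertools
import random

def stratified_kfold(labels, k=4, seed=0):
    """Yield (train_idx, valid_idx) pairs; each fold keeps the class mix."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in set(labels):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal indices round-robin per class
    for f in range(k):
        valid = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, valid

def grid_search(grid, evaluate, labels, k=4):
    """Return the hyperparameter combination with the best mean CV score.

    `grid` maps parameter names to candidate values; `evaluate` takes
    (params, train_idx, valid_idx) and returns a score to maximize.
    """
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = [evaluate(params, tr, va)
                  for tr, va in stratified_kfold(labels, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score
```

A real run would plug in a model-fitting `evaluate`; here the structure is the point: every parameter combination is scored on the same stratified folds, and the best mean score wins.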
Statistical analysis and implementation environments
In the statistical analysis, continuous variables (e.g. age and BP) are represented as mean ± standard deviation, and categorical variables (e.g. gender, discharge medication (DM), and smoking) are reported as frequencies and rates. We use independent two-sample tests to compare the survival and death groups.
Performance measures
We apply the test data sets to evaluate the accuracy of the mortality prediction model in patients with ACS. The prediction results are reported in a table together with the AUC. The performance measures of the machine learning–based mortality prediction models and the regression-based model (GRACE) are compared in a table including AUC, precision, recall, accuracy, and F1-score.
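For reference, these measures can be computed from scratch. The sketch below assumes binary labels with 1 denoting death; the AUC uses the standard rank-statistic formulation:

```python
def confusion_metrics(y_true, y_pred):
    """Precision, recall, accuracy, and F1 from binary labels (1 = death)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, accuracy, f1

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(confusion_metrics([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5, 0.5)
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.4, 0.3]))        # 1.0
```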
Results
In this section, we evaluate the machine learning–based mortality prediction model for patients with ACS during the 1-year follow-up after hospital discharge. Before evaluating the prediction models, the baseline characteristics of the subjects were analyzed for the survival and death groups during the 1-year follow-up after hospital discharge. Then, we compared the top nine primary prognostic factors between the regression-based prediction model (GRACE) and the machine learning–based models (GBM, DNN, GLM, and RF), as well as the performances of their mortality prediction models after hospital discharge, in terms of AUC, precision, recall, accuracy, and F1-score.
Baseline characteristics
In this article, we selected 8227 experimental subjects (7832 survivors and 395 deaths after hospital discharge), excluding from the population of 14,885 ACS patients the 5923 people who failed the 1-year clinical follow-up and the 735 in-hospital deaths. The subjects were then subdivided into two groups, survival (alive) and death, and their baseline characteristics are summarized in Table 3. The average age of the subjects was 62.19 ± 12.54 years; the difference between the survival group (61.67 ± 12.40) and the death group (72.43 ± 10.77) was around 10 years and highly statistically significant.
Baseline characteristics of subjects after hospital discharge.
BMI: body mass index; WHR: waist-to-hip ratio; LV: left ventricular; CK-MB: creatine kinase-muscle brain; NT-proBNP: N-terminal of the prohormone brain natriuretic peptide; ECG: electrocardiogram; RBBB: right bundle branch block; LBBB: left bundle branch block; MI: myocardial infarction; HDL: high-density lipoprotein; LDL: low-density lipoprotein; hsCRP: high-sensitivity C-reactive protein.
Table 4 describes the medication characteristics of the subjects after hospital discharge. The survival rate was high in patients who were prescribed medicines such as aspirin, angiotensin-converting enzyme (ACE) inhibitors, clopidogrel, statins, and nitrates after hospital discharge, while the death rate was high in patients who were prescribed medicines such as diuretics, digoxin, amiodarone, and spironolactone.
Discharge medication characteristics of all participants.
ACE: angiotensin-converting enzyme.
The angiographic characteristics of the subjects after hospital discharge are described in Table 5. In the coronary angiographic findings, the attribute "coronary angiography was not performed" was 2.9% in the survival group and 21.0% in the death group, the latter about seven times as high as the former. The proportion of patients with one-vessel disease was about twice as high in the survival group as in the death group, a statistically significant difference. For the attribute "LV ejection fraction," when it was under 35%, the rate in the death group was about three times higher, while for values over 50%, the rate in the survival group was about three times higher. For the attribute "PCI stent types with Taxus and Cypher," the value in the survival group was significantly higher.
Angiographic characteristics of the subjects after hospital discharge.
LV: left ventricular; PCI: percutaneous coronary intervention; BMS: bare metal stent; DESs: drug-eluting stents.
Variable significance in mortality prediction model after hospital discharge
The significance of each variable in the prediction models was calculated as a percentage. The degree of significance ranged from 0 to 1, where 1 denotes the most significant (100%) and 0 the least significant (0%). Table 6 describes the top nine primary prognostic factors that each prediction model uses to predict mortality during the 1-year clinical follow-up in ACS patients. The primary prognostic factors differed considerably depending on the applied model (DNN, GBM, GLM, RF, or GRACE). For example, the variable "age >76" played an important role in the RF, GBM, and GLM mortality prediction models, and older age had a large impact on the death rate. The variable "age ranging from 66 to 75" also had a large impact on the mortality prediction models; accordingly, we divided age into six groups. In addition, the variables "coronary angiogram was not performed in angiographic findings," "diuretics," "LV ejection fraction," "aspirin discharge medication," "creatinine," and "Killip class III" had an important impact on the RF and GBM mortality prediction models, and among them, "coronary angiogram was not performed in angiographic findings," "diuretics," and "Killip class III" carried greater weight in the death group than in the survival group (Tables 3 to 5). The higher the creatinine level, the higher the death rate, while the lower the "LV ejection fraction," the higher the death rate. In previous works, the variables age, creatinine, and Killip class were significantly important in machine learning algorithms. Note, however, that the variable importances in DNN were very different from those in the other machine learning models (RF, GBM, and GLM), as shown in Table 6.
Descending ranks of the top nine primary prognostic factors during the 1-year clinical follow-up after hospital discharge.
RF: random forest; GBM: gradient boosting machine; GLM: generalized linear model; DNN: deep neural network; PCI: percutaneous coronary intervention; BMI: body mass index; LDL: low-density lipoprotein; HR: heart rate; DM: discharge medication; TH: thrombolysis; MT: medical therapy.
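The ranking in Table 6 amounts to normalizing each model's raw importance scores to [0, 1] and taking the nine largest. A minimal sketch, with hypothetical variable names and scores:

```python
def top_factors(importances, k=9):
    """Scale raw importance scores so the largest becomes 1 (100%) and
    return the k highest-ranked variables in descending order."""
    peak = max(importances.values())
    scaled = {var: score / peak for var, score in importances.items()}
    return sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical raw scores, not values from Table 6.
raw = {"age >76": 8.0, "creatinine": 4.0, "diuretics": 2.0}
print(top_factors(raw, k=2))  # [('age >76', 1.0), ('creatinine', 0.5)]
```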
Discussion
In this article, we employed four machine learning algorithms (RF, GBM, GLM, and DNN) in the mortality prediction model for the 1-year clinical follow-up in patients with ACS, and their performances were compared with the GRACE Risk Score 2.0.11–14 Machine learning models are normally evaluated on different performance measures such as AUC, precision, recall, accuracy, and F1-score.
Comparison of the performance in mortality prediction models during the 1-year clinical follow-up tracking after hospital discharge.
AUC: area under the receiver operating characteristic curve; GBM: gradient boosting machine; GLM: generalized linear model; DNN: deep neural network.
Figure 3 shows the ROC curves of the mortality prediction models in patients with ACS during the 1-year clinical follow-up after hospital discharge. The AUC values were in the decreasing order of DNN, GBM, RF, GLM, and GRACE. Overall, GBM was superior to the other approaches in AUC, recall, accuracy, and F1-score.

The ROC curves in mortality prediction models during the 1-year clinical follow-up tracking after hospital discharge.
Conclusion
This article proposed a mortality prediction model using machine learning algorithms, including DNN, GBM, GLM, and RF, for the 1-year clinical follow-up after hospital discharge in Korean patients with ACS. Our main contributions can be summarized as follows. First, this article developed a machine learning–based 1-year mortality prediction model for clinical patients with ACS. Second, this model can forecast mortality during the 1-year clinical follow-up after hospital discharge in Korean patients with ACS because the underlying data well reflect Korean demographic characteristics. Third, the performances of the machine learning–based mortality prediction models were shown to be superior to GRACE. Finally, these results are expected to contribute to the development of a future tool for diagnosing and forecasting the occurrence of MACEs in clinical ACS patients.
Finally, there were some potential limitations to our research. First, we used only 8227 experimental subjects; this data set may be insufficient because machine learning algorithms need a large-scale data set. Second, our proposed model is limited to diagnosing and forecasting mortality in Korean patients with ACS. Third, it is difficult to explain the prediction results of the machine learning–based approaches because GBM, RF, and DNN are non-linear models, whereas in regression-based prediction models it is easy to explain how the major prognostic factors are associated with mortality in patients with ACS because they are based on statistical analysis. Finally, the mortality prediction could only be verified over the short clinical follow-up period of 1 year after hospital discharge in our experimental data set.
