Abstract
Introduction
Cerebral palsy (CP) is the term used for a group of nonprogressive disorders of motor control and posture that occur secondary to damage to the brain during the early stages of its development. 1 Predictive models using logistic regression (LR) are widely used in healthcare, but they have only recently started to be used to predict health-related outcomes in subjects with CP.2–4 This article discusses the potential of an LR-based predictive model (hereafter named PredictMed) for predicting health outcomes and presents examples of its application. PredictMed is an LR-based algorithm implemented in the R programming language.2–4 It has been developed 1 and validated 2 to predict scoliosis development and to identify factors associated with autistic features, 4 intellectual disabilities, and impaired adaptive functioning in subjects with CP. 3
The purpose of this article was to explain the overall procedures used and to highlight the versatility of the PredictMed model, using gastrostomy as an additional example.
Methods
Ethical statements
All procedures were performed in accordance with the ethical standards of the institutional research committee and with the Declaration of Helsinki 1975, revised Hong Kong 1989 and its later amendments. All participants and/or parents consented to participate. The data were anonymized and analyzed according to the requirements of French Reference Method 003. Ethics committee approval and informed consent were registered under number “2017728 v 0-MR003 (Reference Method 003).”
Design
Linear regression and LR are among the first machine learning (ML) algorithms 5 used to implement predictive models in healthcare. The number of parameters (hereafter named features or independent variables) should not be too large but should contain enough information to predict the output. We address the Feature Reduction problem in two steps: first, by using Fisher’s 6 exact test to identify parameters (i.e. the LR independent variables) associated with the binary dependent variable (e.g. need/no need for gastrostomy), and second, by testing and ranking the predictive performance of every possible subset (hereafter named tuple) of the parameters found in the first step. The PredictMed algorithm takes as input a data matrix with patients as rows and parameters (variables) as columns. The first column contains the values of the binary dependent variable, while the other columns contain those of the independent variables (Table 1).
Binary dependent variable “gastrostomy placement” versus independent variables.
NS: neuromuscular scoliosis; TT: truncal tone; ET: etiology; SP: spasticity; DY: dystonia; EP: epilepsy; SE: gender; MACS: Manual Ability Classification System; GMFCS: Gross Motor Function Classification System; EDACS: Eating and Drinking Ability Classification System.
Gastrostomy placement: 0 = no gastrostomy; 1 = presence of gastrostomy. Neuromuscular scoliosis: 0 = no; 1 = presence of scoliosis. Truncal tone: 0 = normal; 1 = hypertonic; 2 = hypotonic. Etiology: 1 = antenatal; 2 = perinatal; 3 = postnatal. Spasticity: 0 = no; 1 = hemiplegia; 2 = diplegia; 3 = tri/quadriplegia. Dystonia: 0 = no; 1 = presence of dystonia. Epilepsy: 0 = no; 1 = presence of epilepsy; 2 = intractable epilepsy. Gender: 1 = male; 0 = female. GMFCS 5–1: 5 = heavy motor impairments; 1 = no impairments. MACS 5–1: 5 = heavy manual impairments; 1 = no impairments. EDACS 5–1: 5 = heavy eating disability; 1 = no disability.
If we have N independent variables, we can form $2^N - 1$ different tuples of independent variables. For example, if the independent variables were A, B, and C, we would have $2^3 - 1 = 7$ tuples: ABC, AB, AC, BC, A, B, C. After splitting the data into a training set and a test set, PredictMed provides the following outputs for each of the $2^N - 1$ tuples:
1. A classification using LR, predicting the outcome of the dependent variable based on the probability of this outcome (if the probability of a positive outcome is above a threshold, we predict the outcome as positive; we try different thresholds in a range from 0.1 to 0.9).
2. Accuracy, sensitivity, and specificity of the prediction (calculated on the test set).
3. A ranking of the odds ratio and p-value significance of each independent variable of the tuple in influencing the outcome of the dependent variable.
Based on the first two outputs, we can identify the tuple with the best performance in predicting the outcome of the dependent variable. Since this tuple is a subset of the initial set of independent variables (features), selecting it amounts to performing a Feature Reduction.
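The tuple enumeration described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R implementation; the function name `all_tuples` is hypothetical.

```python
from itertools import combinations

def all_tuples(features):
    """Return every non-empty subset of `features`, largest subsets first."""
    subsets = []
    for size in range(len(features), 0, -1):
        subsets.extend(combinations(features, size))
    return subsets

tuples = all_tuples(["A", "B", "C"])
labels = ["".join(t) for t in tuples]
print(labels)       # ['ABC', 'AB', 'AC', 'BC', 'A', 'B', 'C']
print(len(tuples))  # 7, i.e. 2**3 - 1
```

With 10 features the same function returns 1023 tuples, which is the search space examined later in the article.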
Data sources and study population
Data were collected between June 2005 and June 2015 for a multinational, double-blinded, cross-sectional descriptive study on the need for gastrostomy placement in children with CP. Narrative notes were coded and entered into the electronic database PredictMed.2–4 Assessments and data collection for the implementation of the model were conducted in the last 6 months of 2014, and data analysis began in December 2015 and lasted 12 months. The fields included patient demographics, the results of diagnostic procedures, and functional and neurological assessments following the “European Society for Pediatric Gastroenterology, Hepatology and Nutrition Guidelines for the Evaluation and Treatment of Gastrointestinal and Nutritional Complications in Children with Neurological Impairment.” 7 The study followed the guidelines of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. 8
Nine hundred twenty-six potentially eligible patients with severe developmental disabilities and CP were screened (Figure 1). Patients were included in the analysis if they were between 12 and 18 years old; had CP9–11 diagnosed by a pediatric neurologist; had spastic, dystonic, mixed spastic/dystonic, or hypotonic CP evaluated and classified using the “Surveillance of Cerebral Palsy in Europe” system; 11 and were followed for at least 3 years after initial diagnosis. Patients with progressive encephalopathy or spinal neuropathology, well-developed gross motor functional capacities (defined by Gross Motor Function Measure-88 > 70), 12 and good manual abilities (Manual Ability Classification System (MACS) < 2) 12 were not included. Our study included a total of 130 adolescents (72 male, 58 female). The mean age was 16 years 6 months (standard deviation (SD) 1.8). Mean follow-up time was 5 years (range: 3–12 years) (Table 2).

Flow diagram of study participants for analysis.
Clinical presentation according to the presence and absence of self-feeding skills.
SD: standard deviation.
No self-feeding skills include oral feeding by others and gastrostomy.
Patients’ nutritional and respiratory status
Patients’ nutritional status was assessed by pediatricians based on weight, height, body mass index (BMI in kg/m²), and skinfold thickness. Patients were classified as undernourished if they had a falling weight centile, weight loss, a discrepancy between weight and measured or calculated height centile, BMI under the 3rd centile, or BMI < 18 kg/m². Dysphagia was defined as difficulty in swallowing manifested as coughing, spluttering, or choking during eating/drinking; aversion to eating; or excessive time to eat a meal (longer than 30 min). 9
Gastroesophageal reflux (GER) was defined as the passage of gastric contents into the esophagus with or without regurgitation and vomiting. Gastroesophageal reflux disease (GERD) was defined as a complication of GER, like failure to thrive, Sandifer syndrome, or frequent nighttime waking. Respiratory disease was defined as two or more episodes of respiratory infection requiring antibiotic therapy and/or admission to hospital for treatment of pneumonia or recurrent wheezing requiring bronchodilators. 9
Feeding abilities
Assessment of eating and drinking performance was made using the Eating and Drinking Ability Classification System for Individuals with Cerebral Palsy (EDACS) 13 (Table 3).
List of assessment tools, skills evaluated, and function levels of ratings.
Feeding autonomy was classified by a pediatrician into children who had “some self-feeding skills” (mild or no feeding disorders) and those who were “fed by others” or who needed gastrostomy/laparoscopic antireflux surgery (LARS) (severe feeding disorders). The indications for gastrostomy placement included malnutrition or failure to thrive, as well as oropharyngeal dysphagia. 14 Types of gastrostomy included surgical gastrostomy (SG) using the Stamm technique, percutaneous endoscopic gastrostomy (PEG), 15 and laparoscopic-assisted PEG. LARS, such as Nissen fundoplication, was the treatment option for children with GERD when medical therapy failed and GER was confirmed by esophageal pH monitoring. 16
Etiology
Developmental disability (DD) and CP etiology were classified as prenatal (genetic, cerebral malformation, infection, or vascular), perinatal (anoxic-ischemic or infectious), or postnatal (cranial trauma, infectious, epilepsy, or postnatal anoxic/ischemic injury).
Motor status
All patients were assessed for gross motor function using the Gross Motor Function Classification System (GMFCS) 17 and for manual abilities using the MACS 18 (Table 3), following the recommendation of the “Surveillance of Cerebral Palsy in Europe” proposal. 19 Both are 5-level classification systems (from I to V), in which higher levels indicate lower motor functioning.
Epilepsy
Presence of epilepsy was determined by a pediatric neurologist and identified as “well controlled” or “intractable.” We followed the definition of the International League against Epilepsy, which applies the term intractable epilepsy to patients who follow treatment with at least 2 appropriate antiepileptic drugs and are treated at a comprehensive epilepsy center during long-term follow-up. 20
Neurological status
Neurological status was classified according to the anatomy of the spastic disorder (hemiplegia, diplegia, and tri/quadriplegia) and the presence of dystonia. Spasticity was quantified using the Bohannon and Smith modified Ashworth Scale and/or the Modified Tardieu Scale. 21 Trunk muscle tone was also assessed as hypotonic, spastic, or normal. Scoliosis was considered present if a spinal radiograph showed a Cobb angle >10°, and it was labeled as “severe” if the Cobb angle was >40°.22,23
Prediction model
The algorithm consists of a 5-step process (Figure 2). Steps 1 to 4 involve Feature Selection based on Fisher’s test and a brute-force algorithm using LR. Step 5 involves the calculation of the outcome probability for new patients using the selected features as independent variables.

Logistic regression algorithm flowchart.
First step: finding potential predictors
The first statistical analysis was performed with “OpenEpi,” a web-based epidemiological calculator, 24 by creating contingency tables and using Fisher’s exact test to identify factors associated with feeding disorders causing the need for a gastrostomy. Confidence intervals and distribution frequencies were calculated for the binary dependent variable “need for gastrostomy (yes/no)” versus each independent variable. Statistical significance was assessed at the 95% confidence level. A total of 10 independent variables (potential predictors, hereafter named features) were associated with feeding disorders: etiology (ET), spasticity (SP), dystonia (D), epilepsy (EP), gender (SE), neuromuscular scoliosis (NS), truncal tone disorders (TT), GMFCS score (GMFCS), MACS score (M), and EDACS score (ED).
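As an illustration of the screening test used in this step, the following Python sketch computes a two-sided Fisher exact p-value for a 2×2 contingency table. It is not the OpenEpi procedure itself; it uses the common convention of summing the probabilities of all tables that are no more likely than the observed one, which is an assumption about the two-sided definition. The function name and the example counts are hypothetical and are not study data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance to absorb floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: rows = feature present/absent, columns = outcome yes/no.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 0.035
```

A feature would be kept as a potential predictor when its p-value falls below the chosen significance level.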
Second step: generating all possible subsets (tuples) of the potential predictors
All the possible subsets (hereafter named tuples) of the 10 parameters (features) found in step 1 were studied. For a set of n elements, the number of possible subsets of m elements (with m ≤ n) is the binomial coefficient: 24
$$C(m,n) = \binom{n}{m} = \frac{n!}{m!\,(n-m)!}$$
The total number of tuples that can be generated with n elements and every value of m between 1 and n is
$$C(n) = \sum_{m=1}^{n} C(m,n) = 2^n - 1$$
For n = 10 parameters, we have a total of $C(10) = 2^{10} - 1 = 1023$ tuples (Table 4). Each parameter in a tuple is an independent variable for the LR to predict the outcome of the dependent binary variable (need/no need for gastrostomy).
Values generated using tuple size m (⩽10) and corresponding number C(m,10) of m-size subsets.
Since we have n = 10 and m = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, the total of possible combinations (tuples) is 1023.
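The per-size counts summarized in Table 4 can be reproduced with the standard binomial coefficient. The snippet below is a small Python check, assuming only n = 10 from the text.

```python
from math import comb

n = 10
# Number of m-element tuples for each size m = 1..10.
counts = {m: comb(n, m) for m in range(1, n + 1)}
total = sum(counts.values())

for m, count in counts.items():
    print(m, count)
print("total:", total)  # total: 1023
```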
Third step: performing an LR on each tuple and assessing its predictive performance
For each of the 1023 tuples, an LR was run to predict the outcome (need/no need for gastrostomy). The patients were randomly split into a “training set” to train the LR model and a “test set” to test the performance of the model. Model performance on each tuple was evaluated based on the accuracy, sensitivity, and specificity 25 of the predictions. The open-source software R,2–4 through its generalized linear model function glm(), was used to predict the probability that each subject would have the outcome (need for gastrostomy). If the probability was above a threshold, the subject was classified as “will need gastrostomy placement.”
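The brute-force idea of fitting one LR per tuple and ranking the tuples can be sketched as follows. This is a deliberately simplified Python illustration, not the PredictMed code: it replaces R's glm() with a hand-written gradient-descent logistic regression, scores each tuple on the same toy data it was fitted on (the study uses separate training and test sets), and uses made-up feature names x1, x2, x3 with synthetic values.

```python
from itertools import combinations
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(rows, outcomes, steps=500, learning_rate=0.5):
    """Fit intercept and weights by plain gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    intercept = 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, outcomes):
            error = sigmoid(intercept + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        intercept -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights

def accuracy_for_tuple(data, outcomes, columns, threshold=0.5):
    """Fit an LR on the chosen columns and return its accuracy on `data`."""
    rows = [[record[c] for c in columns] for record in data]
    intercept, weights = fit_logistic(rows, outcomes)
    predicted = [
        1 if sigmoid(intercept + sum(w * v for w, v in zip(weights, x))) > threshold else 0
        for x in rows
    ]
    return sum(p == y for p, y in zip(predicted, outcomes)) / len(outcomes)

# Synthetic toy data: x1 determines the outcome, x2 and x3 are uninformative.
x1 = [1, 1, 1, 1, 0, 0, 0, 0]
x2 = [1, 0, 1, 0, 1, 0, 1, 0]
x3 = [0, 0, 1, 1, 0, 0, 1, 1]
outcomes = list(x1)
data = [{"x1": a, "x2": b, "x3": c} for a, b, c in zip(x1, x2, x3)]

features = ["x1", "x2", "x3"]
results = {}
for size in range(1, len(features) + 1):
    for columns in combinations(features, size):
        results[columns] = accuracy_for_tuple(data, outcomes, columns)

# Rank tuples from best to worst accuracy.
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
for columns, accuracy in ranked:
    print(columns, accuracy)
```

In a real analysis the scoring would be done on held-out test sets and would report sensitivity and specificity alongside accuracy.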
Sensitivity, specificity, and accuracy were described in terms of true positives (TPs), true negatives (TNs), false negatives (FNs), and false positives (FPs). 25
Sensitivity was defined as the proportion of actual positives correctly identified as such (TP/(TP + FN)). Specificity was defined as the proportion of actual negatives correctly identified as such (TN/(TN + FP)). Accuracy was defined as the proportion of TPs and TNs among all assessments ((TP + TN)/(TP + TN + FP + FN)).
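These three definitions translate directly into code. The following Python sketch uses hypothetical function names and a made-up set of eight predictions purely to show the arithmetic.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels coded as 0/1."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def performance(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy as defined in the text."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Made-up example: 3 actual positives, 5 actual negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
scores = performance(*confusion_counts(actual, predicted))
print(scores)  # sensitivity 2/3, specificity 0.8, accuracy 0.75
```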
To minimize dependency on the composition of the training and test sets, we used cross validation, a technique for assessing how the results of a statistical analysis will generalize to an independent data set. By splitting the original data set, we randomly generated 20 different pairs of training and test sets. For each pair, we calculated the accuracy, sensitivity, and specificity of the predictions at different thresholds; finally, we averaged over the 20 rounds to obtain, for each threshold, the average accuracy, sensitivity, and specificity.
Fourth step: finding which tuple best predicts the outcome
The 1023 tuples were thus tested to find the one with the best predictive performance in terms of accuracy, sensitivity, and specificity. When predicting the need for gastrostomy, the best tuple (with a probability threshold of 0.5) was composed of 7 parameters: NS, TT, SP, D, EP, GMFCS, and ED.
Table 5 shows that the probability threshold of 0.5 was the best to be used in conjunction with this 7-factor tuple to achieve the highest accuracy, sensitivity, and specificity (each one calculated as the average of the results on 20 different test sets).
Probability thresholds in conjunction with logistic regression independent variables.
NS: neuromuscular scoliosis; TT: truncal tone disorders; SP: spasticity; DY: dystonia; EP: epilepsy; ED: EDACS score; EDACS: Eating and Drinking Ability Classification System.
Table 6 shows that, among the 7-element tuples (120 of the 1023), the tuple with threshold = 0.5 showed the best overall performance in terms of accuracy, specificity, and sensitivity.
Overall performances in terms of accuracy, specificity, and sensitivity.
NS: neuromuscular scoliosis; TT: truncal tone disorders; ET: etiology; SP: spasticity; DY: dystonia; EP: epilepsy; ED: EDACS score; EDACS: Eating and Drinking Ability Classification System; SU: previous surgeries.
Moreover, we verified that this 7-element tuple performs better than all the other tuples of 1, 2, 3, ..., 10 elements among the 1023. The choice of the best tuple to be used in LR is not based on predictive performance alone. The tuple itself must also make sense from a clinical point of view, in accordance with the aim of the research.
The best tuple in predicting the outcome of our binary dependent variable (need for gastrostomy: yes/no) had the following 7 parameters: NS, TT, SP, D, EP, GMFCS, and ED (Table 7).
List of the predictive logistic regression coefficients (independent variables) for the need of gastrostomy.
GMFCS: Gross Motor Function Classification System; EDACS: Eating and Drinking Ability Classification System.
Logistic regression: increases in EP, GMFCS, and ED (positive values in the “Estimate” column) favor the need for gastrostomy. For every unit increase in EP, the log odds ln(p/(1 − p)) increase by 2.2939 (where p = probability of having a gastrostomy). In the same way, for every unit increase in SP, the log odds decrease by 0.2692. The “Pr(>|z|)” column gives the p-value of each parameter as a predictor of gastrostomy need. EP, GMFCS, and ED are statistically significant predictors of gastrostomy need (p-value < 0.05).
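Because the estimates are on the log-odds scale, exponentiating them gives the corresponding odds ratios. The short Python sketch below uses only the two estimates quoted in the note above (EP and SP); the remaining coefficients of Table 7 are not reproduced here.

```python
from math import exp

# Log-odds estimates quoted in the text for a one-unit increase.
estimates = {"EP": 2.2939, "SP": -0.2692}

# Odds ratio = exp(estimate): the factor by which the odds p/(1 - p) are multiplied.
odds_ratios = {name: exp(beta) for name, beta in estimates.items()}

for name, ratio in odds_ratios.items():
    print(name, round(ratio, 2))
# EP 9.91
# SP 0.76
```

Read this way, a one-unit increase in EP multiplies the odds of gastrostomy by roughly 10, while a one-unit increase in SP multiplies them by about 0.76, all other variables being held fixed.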
Multiple LR is performed with the open-source software R,2–4 using the generalized linear model function glm() to predict the probability that each subject would need gastrostomy. If the probability is above the threshold of 0.5 (found in step 4), we classify the subject as needing gastrostomy placement.
Fifth step: using the best tuple to predict the binary outcome for new incoming patients
We again use LR to predict the outcome of our binary dependent variable (need for gastrostomy: yes/no), with the 7 parameters found in step 4 as independent variables: NS, TT, SP, D, EP, GMFCS, and ED.
Model highlights
Feature reduction
PredictMed performs a Feature Reduction, using a brute-force algorithm that ranks the predictive performance of all the possible tuples of features. This allows us to find the features (i.e. independent variables) that most influence the outcome of the dependent variable. In the end, this Feature Reduction identifies the best predictors of the outcome of the dependent variable.
Cross validation
As explained above, for each tuple, PredictMed cross-validates the regression on multiple training and test sets and then averages accuracy, sensitivity, and specificity.
This means that for each tuple, we randomly re-shuffle the composition of the training and test sets. Of a total of 130 patients, 80% (104) are in the training set and 20% (26) in the test set. The composition of the training and test sets was randomly determined through the R function createDataPartition(). At each re-shuffle, we perform regression, prediction, and computation of accuracy, sensitivity, and specificity. Finally, we compute the average of accuracy, sensitivity, and specificity over all re-shuffles (about 20). The average values obtained are more reliable because they do not depend on one particular composition of the training and test sets.
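The re-shuffling and averaging procedure can be sketched as below. This Python sketch is illustrative only: it uses a plain random shuffle, whereas the R function createDataPartition() used in the study performs its own sampling; the function names are hypothetical, and `dummy_evaluate` is a placeholder for "fit the LR on the training set and score it on the test set".

```python
import random

def split_train_test(n_patients, train_fraction, rng):
    """Randomly partition patient indices into a training and a test set."""
    indices = list(range(n_patients))
    rng.shuffle(indices)
    n_train = round(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]

def average_over_reshuffles(n_patients, evaluate, rounds=20, train_fraction=0.8, seed=0):
    """Average the (accuracy, sensitivity, specificity) triple returned by
    `evaluate(train_idx, test_idx)` over `rounds` random re-shuffles."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        train_idx, test_idx = split_train_test(n_patients, train_fraction, rng)
        for k, value in enumerate(evaluate(train_idx, test_idx)):
            totals[k] += value
    return tuple(total / rounds for total in totals)

def dummy_evaluate(train_idx, test_idx):
    # Placeholder scores; a real implementation would fit and test a model here.
    return (0.9, 0.8, 0.95)

train_idx, test_idx = split_train_test(130, 0.8, random.Random(0))
print(len(train_idx), len(test_idx))  # 104 26
averages = average_over_reshuffles(130, dummy_evaluate)
print(averages)
```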
Threshold
PredictMed applies a probability threshold. After predicting the probability of the outcome of the dependent variable, we classify the outcome as positive if this probability is above the threshold and as negative otherwise. Our algorithm tries different threshold values, finding those that lead to the best predictive performance in terms of accuracy, sensitivity, and specificity. Researchers can then decide which threshold to choose to maximize accuracy, specificity, or sensitivity, depending on the disorder to be predicted (see Discussion section).
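A threshold sweep of this kind can be sketched as follows. The Python code is an illustration under stated assumptions: the predicted probabilities and outcomes are made-up values, not study data, and the function name `sweep_thresholds` is hypothetical. Thresholds range from 0.1 to 0.9 as in the Design section.

```python
def sweep_thresholds(actual, probabilities, thresholds):
    """Return accuracy, sensitivity, and specificity for each threshold."""
    table = {}
    for threshold in thresholds:
        # Positive when the predicted probability is strictly above the threshold.
        predicted = [1 if p > threshold else 0 for p in probabilities]
        tp = sum(1 for a, q in zip(actual, predicted) if a == 1 and q == 1)
        tn = sum(1 for a, q in zip(actual, predicted) if a == 0 and q == 0)
        fp = sum(1 for a, q in zip(actual, predicted) if a == 0 and q == 1)
        fn = sum(1 for a, q in zip(actual, predicted) if a == 1 and q == 0)
        table[threshold] = {
            "accuracy": (tp + tn) / len(actual),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return table

# Made-up outcomes and predicted probabilities for ten patients.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.95, 0.80, 0.60, 0.35, 0.55, 0.30, 0.20, 0.15, 0.10, 0.05]
thresholds = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9

table = sweep_thresholds(actual, probabilities, thresholds)
for threshold, scores in table.items():
    print(threshold, scores)
```

Raising the threshold generally trades sensitivity for specificity, which is why the choice depends on the clinical aim.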
Results
A breakdown of the clinical presentation according to the presence or absence of feeding disorders and gastrostomy placement is shown in Table 2. Factors associated with gastrostomy placement in multivariate analysis were epilepsy (p = .03), poor motor function (p = .04), and male gender (p = .04). The average of accuracy, sensitivity, and specificity was 90% (Table 7).
Model development path and its adaptability to multiform neurological diseases in CP
The model was developed and validated in previous studies,1–4 which were the first in the literature to report the following results:
In 2017, PredictMed highlighted the presence of hip surgery (p = .002), intractable epilepsy (p = .04), and female gender (p = .07) as risk factors for scoliosis.1,2 The average of accuracy, sensitivity, and specificity of the predictive model was 74%.
In 2018, PredictMed identified that factors associated with intellectual disabilities and impaired adaptive functioning 3 were poor manual abilities (p ⩽ .001), gross motor function (p ⩽ .001), and type of epilepsy (intractable: p = .04; well controlled: p = .01). The best threshold yielded an accuracy of 74%, a sensitivity of 77%, and a specificity of 70%, for an average of 74%.
In 2019, PredictMed identified type of spasticity (hemiplegia > diplegia > tri/quadriplegia; p = .04), communication disorders (p < .001), intellectual disability (p = .05), feeding abilities (p = .002), and motor function (p = .01) as factors associated with autism spectrum disorders. 4 The best prediction model yielded an accuracy of 73%, a sensitivity of 79%, and a specificity of 72%, for an average of 75%.
Discussion
Should we choose a tuple based on the higher sensitivity or specificity?
The choice depends on the aim of the research and its consequences. 26 For studies that aim to screen for frequent diseases (e.g. breast cancer or HIV infection), higher sensitivity than specificity is required to limit FNs and detect the largest proportion of potentially sick patients. Conversely, higher specificity would be more useful if we were to carry out a “preventive” treatment without precisely knowing whether the subject actually had the comorbidity or the disease (e.g. GERD), or if we had to perform invasive or expensive investigations to ascertain the disease, for example, esophageal pH monitoring.
For this study, we opted for the highest specificity (98%) and accuracy (95%) together with a good sensitivity (77%) (Table 5) to avoid overestimating the likelihood of reflux.
Knowing that gastrostomy will likely be needed can inform patient management and family preparation for the procedure. Knowing about the likely need for the surgery allows for the setup of early dietary and food follow-up to determine the ideal moment for the gastrostomy to take place. The evaluation period of enteral nutrition tolerance with a nasogastric tube may be shortened; therefore, iatrogenic complications could be limited.
The previously cited studies1–4 on comorbidities in patients with CP aimed to identify subjects at higher risk of presenting or developing a specific comorbidity (scoliosis, cognitive impairments, and autistic features). For these patients, we should inform the family and use specific non-invasive diagnostic tools to confirm whether the comorbidity is present. Therefore, the choice of a specificity–sensitivity–accuracy balance (74%/75%) seems appropriate to limit both overestimation and underestimation of the risks.
Future developments
LR-based predictive models are widely used in the healthcare field, but they have only recently started to be used to predict health conditions such as comorbidities in children with CP. These models can provide faster and more objective predictions, potentially improving care of patients with CP and reducing healthcare costs and suffering. Our research now focuses on improving our prediction algorithm. At present, the LR algorithm can deal only with binary dependent variables.
Our next step is to extend it to multi-class classification, so that it can handle dependent variables with more than two values, such as the multiform types of epilepsy (in development). We also plan to investigate different methods for feature reduction.27–31 With a higher number of independent variables, the computation time grows rapidly (since we have to compute an LR on every possible tuple of independent variables). Given this, we are investigating other ML techniques used in various fields of healthcare. 32 A new method of feature reduction could use, for example, information theory.33–35 Another topic for future study is whether and how to replace the LR algorithm with other types of supervised classification algorithms, such as support-vector machines 36 or neural networks. 37
Limitations
A limitation of our prediction model is the small number of patients studied (hundreds), which could lead to model overfitting. Thus, we first need to study and fine-tune the predictive model on a larger number of patients (at least thousands) while maintaining the same number of independent variables (<15). Second, we need further external validation of the model on a larger data set of new incoming patients (especially for step 5).
Computing time
Usually, we deal with data matrices having 100–300 rows (i.e. patients) and 5–15 columns (1 dependent variable + N independent variables). As previously discussed, the number of tuples, and thus of LRs we have to perform, is $2^N - 1$. With, for example, 100 patients and N = 14, we have $2^{14} - 1 = 16383$ tuples, that is, 16383 LRs to perform. Using a notebook with an Intel i7 processor, the computing time was around 20 hours, and we can expect it to double for every unit increase of N (or if the number of patients doubles). If the number N of independent variables or the number of patients increases, a Feature Reduction algorithm to reduce N prior to running the LRs will be needed. The outcome used to create and evaluate the validity of PredictMed was gastrostomy placement independently of the time interval; in future studies, we will control for the time of assessment of the predictors and the time of gastrostomy placement.
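The growth in workload can be made concrete with a short calculation. The Python sketch below simply extrapolates from the two figures given in the text (about 20 hours at N = 14, doubling with each additional variable); the function names are hypothetical, and the projected hours are a rough estimate under that doubling assumption, not measured timings.

```python
def number_of_regressions(n_variables):
    """Number of tuples, hence LRs, for N independent variables: 2**N - 1."""
    return 2 ** n_variables - 1

def estimated_hours(n_variables, reference_n=14, reference_hours=20.0):
    """Extrapolate run time assuming it doubles for each extra variable."""
    return reference_hours * 2 ** (n_variables - reference_n)

for n in (10, 14, 15, 16):
    print(n, number_of_regressions(n), estimated_hours(n))
# 10 1023 1.25
# 14 16383 20.0
# 15 32767 40.0
# 16 65535 80.0
```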
Conclusion
We developed an LR-based predictive model (PredictMed), and we tested it on 5 different data sets of children with CP, each data set with a different outcome (a binary dependent variable): (1) feeding disorders needing gastrostomy, yes/no (present study); (2) likelihood of scoliosis,1,2 yes/no; (3) presence of profound intellectual disability, 3 yes/no; and (4) presence of autistic features, 4 yes/no. In each case, we evaluated accuracy, sensitivity, and specificity in predicting the outcome of the dependent variable. Although PredictMed needs to be fine-tuned and validated on larger data sets, on the presently available data sets it showed a predictive performance, in terms of accuracy, sensitivity, and specificity, between 74% and 90%. In our vision, PredictMed should significantly help doctors’ decision-making regarding patient prognosis, improving care of patients with CP and reducing healthcare costs and suffering.
