Abstract
Introduction
International guidelines recommend communicating prognosis to facilitate shared decision making that aligns care with patients’ prognosis and personal preferences.1–3 Identifying patients at high risk for early death would help limit invasive treatments that are unlikely to provide a benefit near the end-of-life and facilitate the provision of timely kidney supportive care.4–6 Conversely, identifying those with high probability of survival would facilitate access to treatments and procedures aimed at longer-term survival. Despite these guidelines and evidence that patients wish to receive prognostic information,7,8 sharing this information is not standard clinical practice.7,9,10
There are numerous barriers to communicating prognosis, including highly variable illness trajectories and insufficient prognostic tools, particularly for prevalent dialysis patients.11,12 Most prognostic tools predict survival for people initiating hemodialysis (HD), with very few studies including people undergoing peritoneal dialysis (PD).13 Model performance is poorly understood. Most studies have attempted to modify an existing tool or develop their own but did not provide evidence for validity. Studies that attempted to validate their model did so by two-way cross validation, using half their study sample to develop the model and then applying the model to the remaining sample. This may lead to imprecise estimates of predictive accuracy and overfitting.14 Only a single study examined model performance specifically for people undergoing PD.15 Most models have not been tested in external cohorts. They are often in mathematical forms that cannot be applied easily in routine care and have not been evaluated in routine clinical practice.13,16 As a result, the clinical utility of current tools is limited, hindering meaningful translation to the bedside to inform practice.
To align care with patients’ prognosis and preferences and to deliver timely and effective care, both kidney supportive care and treatments aimed at prolonging survival, we need simple and accurate prognostic tools that can be integrated into routine clinical practice.1 This has been identified as a high priority in nephrology and has been incorporated into the International Society of Nephrology's (ISN's) Strategic Plan for Integrated Care of Patients with Kidney Failure.17 To help address these deficiencies, this study sought to evaluate a programmatic approach to identifying people undergoing PD with shortened survival. Specifically, we provide evidence for external validity, in a modern prevalent PD cohort, of the Cohen prognostic model, which was designed to predict survival for dialysis patients and integrates actuarial factors with clinician predictions using the surprise question (SQ) to assess 6-, 12-, and 18-month mortality risk.18
Methods
This prospective cohort study was conducted in prevalent PD patients. The University of Alberta Research Ethics Board approved the study (Pro00003255). We used the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) recommendations for ideal reporting of this study.
Source of data
In 2010, Alberta Kidney Care-North, a Canadian university-based program, introduced a supportive care assessment for patients receiving HD to prospectively identify those with high care needs who may benefit from kidney supportive care.19,20 This was extended to the PD Program in 2015. The supportive care assessment includes the Cohen prognostic tool, the Edmonton Symptom Assessment System-revised:Renal (ESAS-r:Renal),21,22 the Australia-modified Karnofsky Performance Status Scale (AKPS),23 and activities of daily living. These assessments are done every 3 to 6 months when patients come to clinic. All assessments are recorded in the electronic medical record, which also provides laboratory data, demographic characteristics, comorbidity, and dialysis-related information (modality, cause of kidney failure (KF), vascular access, and dialysis start and end dates). The first assessment following the PD Program's introduction of the supportive care assessment that had available data on all required variables was used to calculate the predictive risk index for all patients in our validation cohort. For patients who did not have laboratory results on the exact visit date, the last available result was imputed.
Participants
All prevalent patients who had been undergoing PD (both continuous ambulatory and automated) for ≥3 months between February 2015 and April 2019 were eligible for inclusion.
Outcome and predictors
Predicted survival was calculated for each patient at 6, 12, and 18 months using the Cohen model. The Cohen model involves five predictors: SQ—“Would you be surprised if this patient were to die in the next 12 months?”, age, serum albumin, dementia, and peripheral vascular disease (formula provided in Supplemental Figure S1). The SQ was based upon the opinion of the primary PD nurse. The other variables were based on comorbidities listed in the patient's electronic medical record (EMR). Severity of dementia was not considered. The serum albumin was typically within 1 month of the assessment, based on monthly blood work results.
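As a reading aid, the general structure of a Cox-type prognostic index built from these five predictors can be sketched as follows. This is the generic form only: the actual coefficients and baseline survival are those published for the Cohen model (Supplemental Figure S1) and are not reproduced here, and the symbols $\beta$, $x$, and $S_0(t)$ are notational assumptions rather than the published formula.

```latex
\begin{aligned}
\mathrm{PI}_i &= \beta_{\mathrm{SQ}}\,x_{i,\mathrm{SQ}}
              + \beta_{\mathrm{age}}\,x_{i,\mathrm{age}}
              + \beta_{\mathrm{alb}}\,x_{i,\mathrm{alb}}
              + \beta_{\mathrm{dem}}\,x_{i,\mathrm{dem}}
              + \beta_{\mathrm{PVD}}\,x_{i,\mathrm{PVD}},\\
\hat S_i(t)   &= S_0(t)^{\exp(\mathrm{PI}_i)}, \qquad t \in \{6,\,12,\,18\}\ \text{months}.
\end{aligned}
```

Here $\mathrm{PI}_i$ is the predicted risk index for patient $i$ and $\hat S_i(t)$ is the corresponding predicted survival probability; a higher index gives a lower predicted survival at every time point.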
Statistical analysis
Statistical analysis was done using R version 4.1.1. Patient characteristics are summarized as counts (%) for categorical variables and medians (interquartile range, IQR) or means (standard deviation, SD) for continuous variables. T-tests and Chi-square tests were used to test the differences between our external validation cohort and the derivation data used in the Cohen model. The observed survival of the validation cohort was plotted as Kaplan–Meier (KM) curves, with follow-up censored administratively at 24 months. Follow-up start was defined as the date of the first recorded assessment that occurred when the patient had at least 3 months on PD. Patients who underwent a transplant or who recovered were censored at time of transplant or recovery. Mean predicted survival for the cohort is presented in bar plots and is compared to the KM estimates at the three time points.
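The censoring rules above can be illustrated with a minimal Kaplan–Meier sketch. The analysis itself was run in R; the Python below is only an illustration, the function names (`kaplan_meier`, `survival_at`) are hypothetical, and the follow-up data are invented rather than taken from the study.

```python
from collections import Counter

def kaplan_meier(times, events, horizon=24.0):
    """Kaplan-Meier survival estimates at each distinct death time.

    times   : follow-up in months from the first eligible assessment
    events  : 1 = death, 0 = censored (e.g. transplant or recovery)
    horizon : administrative censoring time in months
    """
    # Administrative censoring: follow-up beyond the horizon is truncated
    # and treated as censored at the horizon.
    data = [(horizon, 0) if t > horizon else (t, e)
            for t, e in zip(times, events)]
    deaths = Counter(t for t, e in data if e == 1)
    leaving = Counter(t for t, _ in data)  # deaths plus censorings per time
    at_risk = len(data)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve

def survival_at(curve, month):
    """Step-function lookup: last estimate at or before `month`."""
    s = 1.0
    for t, value in curve:
        if t > month:
            break
        s = value
    return s

# Hypothetical follow-up data (months, event flag); not study data.
times = [2, 5, 5, 8, 12, 20, 30, 30]
events = [1, 0, 1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for month in (6, 12, 18):
    print(month, round(survival_at(curve, month), 3))
```

In this toy example the two patients followed for 30 months are censored at 24 months, so the death recorded at month 30 does not contribute to the estimate.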
In keeping with the Cohen internal validation, our validation cohort was split into risk groups by predicted risk index quintile and compared via KM curves. Additional risk groups were calculated and compared using Cox-based cutoffs (16th, 50th, and 84th centiles) recommended by Royston and Altman.24,25 Hazard ratios for both methods were estimated with 95% confidence intervals (CI) to assess the model's ability to differentiate between patients in different risk groups. To account for potential differences in outcomes by length of time on PD at the start of follow-up, a sensitivity analysis was conducted after stratifying patients at the median time on PD (below vs above).
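The two grouping schemes differ only in the centiles used as cutpoints. A minimal sketch, assuming linear-interpolation centiles and assigning a value equal to a cutpoint to the higher group (both conventions are assumptions, as are the function names and the toy scores):

```python
from bisect import bisect_right

def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0-100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def risk_groups(risk_index, centiles):
    """Return (cutpoints, group number per patient); group 1 is lowest risk."""
    ordered = sorted(risk_index)
    cuts = [percentile(ordered, p) for p in centiles]
    groups = [bisect_right(cuts, x) + 1 for x in risk_index]
    return cuts, groups

QUINTILE_CENTILES = (20, 40, 60, 80)   # five equal-sized groups
COX_BASED_CENTILES = (16, 50, 84)      # four groups, larger middle groups

# Hypothetical risk index values; not study data.
scores = [float(i) for i in range(101)]
quintile_cuts, quintile_groups = risk_groups(scores, QUINTILE_CENTILES)
cox_cuts, cox_groups = risk_groups(scores, COX_BASED_CENTILES)
print(quintile_cuts, cox_cuts)
```

The Cox-based cutoffs place fewer patients in the extreme groups than quintiles do, which concentrates the highest and lowest predicted risks in those groups.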
Cumulative areas under the curve (AUCs) and Somers’ Dxy were calculated to assess model discrimination at each time point, as well as overall. A calibration slope was estimated using the calculated predicted risk indexes. Model miscalibration was assessed through a likelihood ratio test of the slope and a joint test on all predictor variables with an offset term for the predicted risk index. Using both semiparametric (Cox PH) and parametric (Weibull) models, baseline survival in the external validation cohort at 6, 12, and 18 months was re-estimated and applied to the original predicted risk index to create updated survival estimates. Baseline survival estimates were calculated by setting predicted risk indexes to the mean.24 These estimates were compared alongside the observed and Cohen predicted survival using a bar plot. Competing risks models were used to check the assumption that bias was not being introduced by censoring patients who had been transplanted or who had recovered.
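In outline, and assuming the standard formulation for validating a Cox-type model (the symbols below are illustrative and not taken from the original model publication), the calibration slope and the baseline update can be written as:

```latex
\begin{aligned}
h_i(t) &= h_0(t)\,\exp\!\left(\gamma\,\mathrm{PI}_i\right),
  && \gamma = \text{calibration slope } (\gamma = 1 \text{ if perfectly calibrated}),\\
\hat S_i^{\,\mathrm{updated}}(t) &= \tilde S_0(t)^{\exp\left(\mathrm{PI}_i - \overline{\mathrm{PI}}\right)},
  && \tilde S_0(t) = \text{baseline survival re-estimated at the mean index } \overline{\mathrm{PI}}.
\end{aligned}
```

Here $\mathrm{PI}_i$ is the predicted risk index for patient $i$, fitted as the only covariate in the validation cohort. A slope $\gamma$ below 1 indicates that the index separates patients less strongly in the validation cohort than in the derivation data. In the baseline update, the coefficient on the index is held at 1 and only $\tilde S_0(t)$ is re-estimated.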
Results
Patient characteristics
Four hundred and sixty-four patients were included. Of these, 62 (13.4%) died within the 24-month follow-up period. Patient characteristics are described in Table 1. A number of characteristics were significantly different than those in the original derivation cohort (Table 2). Our external validation cohort had a higher proportion of males (64.9% vs 56.7%) and patients with peripheral vascular disease (8.4% vs 3.3%), a lower proportion of Hispanic or Latin American patients (1.7% vs 14.3%), fewer patients with dementia (0.6% vs 19.6%), and patients had lower albumin levels (mean g/dL 3.5 vs 3.8; 40.3% vs 14.9% with results below 3.5 g/dL). Additionally, our external validation set had a much lower proportion of practitioners who responded that they would not be surprised if the patient were to die (7.3% vs 15.8%). It should be noted that the SQ was in reference to the patient's survival within 12 months in our external validation cohort in keeping with the general approach in the literature,18,26 and was within 6 months in the Cohen derivation cohort.
Table 1. Patient characteristics.
AKPS: Australia-modified Karnofsky Performance Status; eGFR: estimated glomerular filtration rate; IQR: interquartile range; PD: peritoneal dialysis.
Table 2. Differences in patient characteristics between Cohen derivation and external validation cohorts.
*Derivation uses 6 months, validation uses 12 months.
CI: confidence interval; HD: hemodialysis; KM: Kaplan–Meier; SQ: surprise question.
Model discrimination and calibration
Survival in the external validation cohort remained high through the 24-month follow-up. Predictions from the Cohen model followed observed survival closely, although the model underestimated survival to a greater degree at the longer follow-up points; the difference between mean observed and predicted survival was 3.1%, 5.5%, and 11.0% at 6, 12, and 18 months, respectively (Figure 1). The PD cohort in this study had significantly different baseline characteristics and a higher observed survival over the 24-month follow-up time than a previously investigated HD cohort (p = 0.02, Supplemental Figure S2 and Table S1);27 the Cohen model predictions were able to accurately reflect this (Supplemental Figure S3).

Figure 1. Kaplan–Meier curve for validation cohort with observed versus predicted mean survival at 6, 12, and 18 months.
Model discrimination was moderate (Somers’ Dxy 0.46) and remained constant over all three time points (cumulative AUC: 0.731, 0.729, and 0.739 at 6, 12, and 18 months, respectively, Figure 2). An AUC and Somers’ Dxy of 1 suggest perfect concordance between pairs of predicted and observed values, whereas an AUC of 0.5 or Somers’ Dxy of 0 suggests the model is no better at predicting outcomes than chance. There was evidence of model miscalibration, suggesting worse discrimination compared to derivation data (calibration slope = 0.717, p = 0.007). A joint test of the predictors suggests that the predictive accuracy of the model was most impacted by the difference between the validation and derivation cohorts with respect to the SQ (p = 0.002). Supplemental Figure S4 illustrates the substantial impact the SQ has on predicted survival in the “not surprised” group. The model greatly underestimated survival for these people; the difference between mean observed and predicted survival was 19%, 35%, and 47% at 6, 12, and 18 months, respectively. In the “surprised” group, the difference between mean observed and predicted survival was only 2%, 3%, and 8% at 6, 12, and 18 months, respectively.
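The two discrimination measures are linked by Dxy = 2 × (C − 0.5), where C is the concordance (AUC), so the reported Dxy of 0.46 corresponds to a concordance of about 0.73. The sketch below illustrates this with a simplified pairwise concordance for a binary outcome at a fixed time point; it ignores censoring, which the time-dependent AUCs in the analysis do account for, and the function names and toy data are hypothetical.

```python
def concordance(risk, died):
    """Share of (death, survivor) pairs in which the patient who died had
    the higher predicted risk; tied predictions count as one half."""
    deaths = [r for r, d in zip(risk, died) if d]
    survivors = [r for r, d in zip(risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for rd in deaths:
        for rs in survivors:
            if rd > rs:
                score += 1.0
            elif rd == rs:
                score += 0.5
    return score / (len(deaths) * len(survivors))

def somers_dxy(c):
    """Rescale concordance from the 0.5-1 range to the 0-1 range."""
    return 2.0 * (c - 0.5)

# Hypothetical predicted risks and outcomes; not study data.
c = concordance([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(c, somers_dxy(c))
print(round(somers_dxy(0.73), 2))
```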

Figure 2. ROC curves with cumulative AUC at 6, 12, and 18 months. AUC: area under the curve; ROC: receiver operating characteristic.
Comparisons of predictive index stratified survival curves suggested the model had the ability to differentiate between high- and low-risk patients (Figure 3). Cutpoints for the quintile approach were at −3.48, −2.93, −2.45, and −1.89, and showed evidence of discrimination for quintiles 4 and 5 compared to quintile 1 (quintile HRs [95% CI]: Q4 vs Q1 9.1 [2.1–39.4], Q5 vs Q1 18.9 [4.5–79.1], Figure 3(a)). The Cox-based cutoffs showed evidence of discrimination for groups 3 and 4 when compared to group 1 (Cox HRs [95% CI]: C3 vs C1 13.7 [1.9–101.2], C4 vs C1 32.0 [4.3–236.5], Figure 3(b)). The distribution of predicted risk index scores can be found in Supplemental Figure S5. When compared to mean observed survival, the predicted 6-, 12-, and 18-month survival by quintile appeared very similar for the first two (low-risk) quintiles, with the Cohen model underestimating survival both as the predicted risk index increased and at longer time periods (Figure 4). In the highest patient risk group (Q5), 12- and 18-month predicted survival was 15% and 28% lower than observed mean survival. This is in contrast to the lowest risk group (Q1), where the differences between the same time periods are only 1% and 4%, respectively. At-risk tables for the cohort as a whole and stratified by risk group are available in Supplemental Table S3.
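The width of these intervals is easier to judge on the log scale. The sketch below assumes the intervals are Wald-type, that is, symmetric around the log hazard ratio; the text does not state how they were computed, so this is an assumption, and the function name is hypothetical. Under that assumption the geometric mean of the bounds recovers the point estimate, and the implied standard error of the log hazard ratio shows how imprecise the comparisons against the small-event reference group are.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Back-calculate the point estimate and the standard error of the
    log hazard ratio from a 95% CI, assuming a Wald interval."""
    midpoint = math.sqrt(lower * upper)  # exp of the mean of the log bounds
    se_log_hr = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return midpoint, se_log_hr

# Intervals as reported in the text for the quintile comparisons.
q4_mid, q4_se = log_scale_summary(2.1, 39.4)   # reported HR 9.1
q5_mid, q5_se = log_scale_summary(4.5, 79.1)   # reported HR 18.9
print(round(q4_mid, 1), round(q4_se, 2))
print(round(q5_mid, 1), round(q5_se, 2))
```

Both back-calculated midpoints agree with the reported hazard ratios to within rounding, which is consistent with the Wald assumption without confirming it.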

Figure 3. Kaplan–Meier curves with hazard ratios and 95% confidence interval (CI) for (a) quintile groups and (b) Cox-based risk groups.

Figure 4. Observed versus predicted mean survival at 6, 12, and 18 months by quintile.
Sensitivity analyses
Re-estimating baseline survival at 6, 12, and 18 months did not result in better predictive accuracy, and in fact both the Cox PH and Weibull models overestimated survival consistently for each time point (Supplemental Figure S6). This further suggests that the differences between the cohort characteristics are impacting model accuracy.
The median time on PD at the start of follow-up for the cohort was 7.5 months. There were no significant differences in predictor variable levels between patients who had been on PD for more or less than the median time (Table S4), nor were there differences in survival (p = 0.34, Supplemental Figure S7). The Cohen model performed slightly better in patients who had a shorter time on PD, although in both cases model performance dropped when looking further out (Supplemental Figures S8 and S9).
Discussion
Shared decision making that aligns care with patients’ values and preferences, and the provision of both kidney supportive care aimed at promoting quality of life and more invasive care aimed at prolonging survival, require accurate prognostication using validated tools that can be integrated easily into routine clinical care. Prognostic tools to date have focused on survival predictions for patients starting dialysis, often with the goal of helping patients choose a kidney replacement therapy. They do not identify prevalent patients who have become high risk for early death and would benefit from enhanced kidney supportive care and potential reprioritization of goals for care. Data are particularly sparse for people undergoing PD.
The Cohen prognostic model is unusual in that it has been converted into an easy-to-use tool that is available for free online or on mobile applications (Calculate by QxMD). It integrates routinely collected clinic data with the SQ, and survival predictions can be generated at the bedside without consuming physician time. In a recent external validation study, we showed that the Cohen prognostic tool could be successfully integrated into routine HD care at a program level.27 It was able to predict mortality with a c-statistic of 0.71 to 0.72 for 6-, 12-, and 18-month predictions and showed good discrimination between high- and low-risk patients.
In this external validation study, we showed that the Cohen model, when routinely assessed in people on PD, was able to predict survival of prevalent PD patients in Alberta as effectively as for people undergoing HD in Alberta, with moderate accuracy in discrimination and calibration (c-statistic of 0.73–0.74 for all 3 time points), despite different baseline characteristics. Discrimination performance was not as high as in Cohen's derivation and internal validation cohorts, where AUCs for 6-month survival predictions were 0.87 (95% CI 0.82–0.92) and 0.80 (95% CI 0.73–0.88), respectively.28 Cohen et al. did not evaluate performance at 12 and 18 months. Our c-statistics are consistent with those described by previous studies of prediction tools for mortality risk in incident dialysis patients.13,16 We also showed good discrimination between low- and high-risk patients. As with the external validation in the HD cohort, the tool overpredicted mortality risk and performed best with short-term survival and lower risk patients. The difference between observed and mean predicted survival was only 3.1% at 6 months but increased to 11.0% by 18 months. This is consistent with the 3.2% and 12.9% differences between observed and predicted survival at 6 and 18 months, respectively, for people undergoing HD.27
The SQ is a screening tool that has been used to identify prevalent patients nearing the end of life, but its accuracy varies across clinical disciplines and clinical settings. In a meta-analysis, the pooled accuracy of the SQ was 74.8% (95% CI 68.6–80.5); the c-statistic of the SQ ranged from 0.512 to 0.822, and from 0.63 to 0.78 when limited to kidney failure populations.29 Comparatively, the c-statistic for the SQ in the PD cohort used in this validation was 0.518 at 12 months. The lower accuracy may be driven by differences in the cohort, that is, limited to people on PD compared to those on hemodialysis in the meta-analysis. In the meta-analysis, the SQ seemed more accurate in an oncology setting and doctors appeared to be more accurate than nurses at recognizing people in the last year of life (c-statistic = 0.735 vs 0.688). The SQ has also been shown to identify a subgroup of people on PD who had a 3.59-fold excess risk of death (95% CI: 1.41–9.15; p = 0.007) within 12 months.30 Recently, the SQ has been applied by the primary PD nurse to successfully identify a subgroup of prevalent PD patients at higher risk for early mortality but, similarly to this study, the predictive accuracy increased greatly when integrated with other objective assessments: a one-page modified Palliative Care Screening Tool and a clinical risk model that combined sex, PD vintage, coronary artery disease, malignancy, normalized protein nitrogen appearance, white blood cell count, and serum sodium level.15 Unlike the Cohen model, this model has yet to be validated externally and has not been translated into scores that can be easily calculated and applied at the bedside to inform clinical decision making.
There were a number of important differences between the PD cohort described here and the Cohen derivation HD cohort, most notably greater survival over the 24 months of follow-up and more “surprised” responses to the SQ, reflecting a healthier population able to perform home-based dialysis. However, the model was able to accurately predict survival in these lower risk patients. In this validation study, the predictive accuracy of the model was most impacted by the SQ, with miscalibration being the highest for people in the “not surprised” group. A test of the difference in 24-month survival between the SQ strata in our external validation cohort did not find a difference (log-rank p-value = 0.23). This is consistent with the two studies conducted in people on PD, which showed that the negative predictive value was high (93.4% and 95.6%), making this a good screening tool to identify low-risk patients, but the positive predictive value was low (24.8% and 52.9%), with most patients still being alive at 12 months in the “not surprised” group.15,30 Supplemental Table S2 shows a breakdown of patient characteristics for our PD cohort and the Cohen derivation cohort, stratified by SQ response. In both cohorts, patients with a “not surprised” response were older and had lower serum albumin levels. Low serum albumin is a strong predictor of mortality in this model and serum albumin was significantly lower in this PD subgroup compared to the Cohen derivation cohort (3.1 g/dL vs 3.5 g/dL, p < 0.001). Since the predictive index for the model is a weighted linear combination of all prognostic variables, the influence of the “not surprised” response on predicting poor survival is magnified by the lower serum albumin. The literature suggests low serum albumin levels may not be associated with mortality risk in people undergoing PD to the same extent as in people on HD; instead, changes in serum albumin may be more predictive.31 This would account for the high miscalibration, where the “not surprised” response was compounded by a falsely high weight of low serum albumin. The model only slightly overpredicted mortality risk for patients who were in the “yes, I’d be surprised” subgroup. Once again, this slight miscalibration may be due to the lower serum albumin levels when compared to the derivation cohort (3.6 vs 3.9 g/dL, p ≤ 0.001).
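The combination of a high negative predictive value and a low positive predictive value discussed above follows directly from how the two are defined when deaths are uncommon. A minimal sketch, treating a “not surprised” response as a positive screen and death within 12 months as the outcome; the counts are invented for illustration and are not data from this study or from the cited studies, and the function name is hypothetical.

```python
def predictive_values(tp, fp, fn, tn):
    """Screening metrics from a 2x2 table.

    tp : "not surprised" and died        fp : "not surprised" and survived
    fn : "surprised" and died            tn : "surprised" and survived
    """
    return {
        "ppv": tp / (tp + fp),          # deaths among "not surprised"
        "npv": tn / (tn + fn),          # survivors among "surprised"
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for 200 patients with 15 deaths; not study data.
metrics = predictive_values(tp=10, fp=30, fn=5, tn=155)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With these illustrative counts, three quarters of patients in the “not surprised” group are still alive at 12 months even though the screen captures most deaths, which is the pattern described for the SQ in people on PD.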
There are some limitations to consider. External validation of a prognostic time-to-event model can require a large number of events per predicted timepoint to be able to accurately measure model calibration and discrimination.32,33 A minimum of 100 events has been suggested to be able to precisely evaluate prognostic performance,34 whereas our cohort had 62 deaths over the 24-month follow-up period. We also recognize that the value of the SQ is likely to vary with clinical discipline and experience. Registered nurses completed the assessment in this study within the context of a primary nurse-led model of care with rotating nephrologists. These were experienced nurses who followed their patients closely. The use of data collected as part of usual clinical care has both positive and negative implications. Our goal was to validate a model that relied on data that are collected routinely for people undergoing dialysis to promote greater generalizability and ease of integration into varied clinical settings. The ability of this prediction tool to reasonably predict mortality across both PD and HD patients using the same predictors, despite quite different observed mortality risk, highlights the relevance of these characteristics to mortality risk among people undergoing dialysis. However, changes in laboratory variables such as serum albumin need to be considered for future refinement of this model. Furthermore, measures such as functional status and acute care utilization, which have been shown to improve mortality predictions in community-dwelling older adults, are likely important in predicting decline in patients undergoing dialysis.35–37 Integrating these additional factors may help improve predictive accuracy.
Conclusions
In summary, we have provided evidence for the external validity and feasibility of the Cohen prognostic model, routinely collected at a program level for people undergoing PD, to identify those with shortened survival. The Cohen prognostic model was accurate at predicting survival of low-risk groups. Identifying those who are at low risk of early mortality is reassuring and may prevent withholding of potentially beneficial procedures aimed at treating comorbidities and promoting longer-term survival. The model was also very informative at identifying high-risk patients. However, given the extent to which it overestimates mortality risk for the highest risk patients, care must be taken not to use the predictions to withhold or ration treatment but rather as a risk stratification tool that signals potential decline and the need to re-evaluate the patient. This will help provide opportunities to identify reversible factors that may improve patient outcomes, including survival and quality of life, through enhanced kidney supportive care. It will also facilitate discussions regarding possible adjustments to care. The miscalibration for the highest risk patients, predominantly due to poor weighting of serum albumin and overly pessimistic responses to the SQ, provides an imperative to further refine the tool for people on PD, which may require the inclusion of a slightly different set of predictors.
Supplemental Material
Supplemental material, sj-docx-1-ptd-10.1177_08968608251364097 for External validation of a prognostic model in routine practice for short- and long-term survival in peritoneal dialysis by Sara N Davison and Sarah Rathwell in Peritoneal Dialysis International
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by the Canadian Institutes of Health Research (grant numbers 201209MOP – 286366 – CIC – CBAA – 117151, 201209MOP – 286394 – PLC – CBAA – 117151).
Ethical approval
The University of Alberta Research Ethics Board approved the study (Pro00003255).
Informed consent to participate
The study was approved with a waiver for written informed consent.
Author contributions
SD wrote the first draft of the manuscript. SR completed data analysis. All authors reviewed and edited the manuscript and approved the final version of the manuscript.
Supplemental material
Supplemental material for this article is available online.
References
