Abstract
Introduction
The treatment of primary progressive multiple sclerosis (PPMS) remains an unmet challenge. To date, only one treatment for PPMS is available, and this treatment only has a modest effect, predominantly in people with remaining focal inflammatory disease activity. 1 Meaningful progress in the development of new and impactful treatments for PPMS will undoubtedly require many more clinical trials investigating new interventions. However, clinical trials in PPMS are difficult and expensive to conduct, in part due to the use of the expanded disability status scale (EDSS) 2 as the primary outcome measure in trials in all forms of MS. 3
The use of the EDSS in trials has become the standard, with a time-to-event outcome of confirmed disability progression (CDP). Trials using this outcome are powered based on the number of progression events, which occur in 30% to 40% of participants over the course of 2 years. This approach translates into large sample sizes. If alternative outcome measures have higher event rates, it might be possible to lower the sample size and shorten the duration of trials.
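The link between event rate and sample size can be illustrated with Schoenfeld's approximation for a two-arm time-to-event comparison with equal allocation. This is a rough sketch, not the power calculation of any specific trial: the hazard ratio, power and significance level are hypothetical values chosen for illustration, and the function names are ours. Only the 30% to 40% event proportions come from the text above.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation of the number of events needed
    for a two-sided log-rank test with 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


def required_participants(hazard_ratio, event_proportion, alpha=0.05, power=0.80):
    """Total participants needed so that the expected number of events
    reaches the target. Crude assumption: a single overall event
    proportion, with no allowance for dropout."""
    events = required_events(hazard_ratio, alpha, power)
    return math.ceil(events / event_proportion)


if __name__ == "__main__":
    # Hypothetical hazard ratio of 0.75; event proportions of 30%, 40%, 50%.
    for proportion in (0.30, 0.40, 0.50):
        print(proportion, required_participants(0.75, proportion))
```

Under these assumptions the required number of events is fixed by the effect size, so the number of participants falls in inverse proportion to the share of participants who experience an event.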
We recently investigated clinical outcome measures in two secondary progressive MS (SPMS) trial data sets and found that the timed 25-foot walk (T25FW) 4 may be a more useful primary outcome measure than the EDSS. For this study, we gained access to the data set of the INFORMS trial, 5 a large phase III randomized controlled trial in PPMS, to investigate the reliability or the ‘noise’ inherent in the EDSS, T25FW and nine-hole peg test (NHPT) 6 and their combinations.
In addition to the increasing physical disability, PPMS is also characterized by progressive cognitive decline, 7 and it would be useful to have a reliable clinical outcome measure reflecting this aspect of the disease. The paced auditory serial addition test (PASAT) was previously the most widely used cognitive outcome and is still often used in cross-sectional studies. It is part of the multi-dimensional multiple sclerosis functional composite, 8 but its value as a longitudinal measure is debatable because of a large practice effect that can limit its use as a repeated measure. 9 Despite its popularity, we know relatively little about changes on the PASAT over time and about its usefulness as a trial outcome. In this study, we additionally investigated the value of the PASAT as a measure of disease progression in PPMS.
To aid the selection of appropriate eligibility criteria for PPMS trials, we also investigated the association of baseline factors with the risk of disability worsening over the course of 3 years of follow-up. Our analyses inform the selection of the most appropriate outcome measures and eligibility criteria for clinical trials in PPMS.
Methods
Trial data set and ethics
The INFORMS data set was obtained from Novartis (Novartis, Basel, Switzerland), the pharmaceutical company which conducted and oversaw the INFORMS trial. The ethical approval for INFORMS is described in the original publication. 5 The University of Calgary Conjoint Health Research Ethics Board granted ethical approval for this analysis. INFORMS was a randomized, double-blind, placebo-controlled trial conducted at 148 centres in 18 countries. Key inclusion criteria were age 25–65 years, a clinical diagnosis of PPMS, disease progression for 1 year or more, a disease duration of 2–10 years and objective evidence of disability worsening in the 2 years before inclusion. In INFORMS, participants were initially randomly assigned to receive either fingolimod 1.25 mg per day or placebo, but during the trial, the decision was made to discontinue the development of fingolimod 1.25 mg and to continue with fingolimod 0.5 mg instead. Participants who had been assigned to 1.25 mg were switched to 0.5 mg, resulting in variable exposure to 1.25 mg. We present the group of patients originally assigned to fingolimod 1.25 mg as a separate group in Table 1. For certain analyses, we combined both fingolimod groups into a single group. For the presentation of disease duration from onset and from the time of diagnosis, we imputed a missing day of the month as the 15th of the month, and a missing month as July of the year.
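The imputation rule for partial dates described above can be sketched as follows. The function names and the use of `None` for a missing component are our assumptions, as is the conversion to years with a 365.25-day year; the original analysis was performed in R and its code is not shown here.

```python
from datetime import date


def impute_partial_date(year, month=None, day=None):
    """Fill in a partial date: a missing month becomes July,
    a missing day becomes the 15th of the month."""
    if month is None:
        month = 7
    if day is None:
        day = 15
    return date(year, month, day)


def duration_years(start, reference):
    """Elapsed time between two dates in years (assumes 365.25 days per year)."""
    return (reference - start).days / 365.25


if __name__ == "__main__":
    onset = impute_partial_date(2005)        # only the year is known
    diagnosis = impute_partial_date(2008, 3)  # year and month are known
    print(onset, diagnosis, round(duration_years(onset, diagnosis), 2))
```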
Baseline characteristics.
SD: standard deviation; EDSS: expanded disability status scale; NHPT: nine-hole peg test; PASAT: paced auditory serial addition test; T25FW: timed 25-foot walk; IQR: interquartile range.
Progression rates
We determined the proportion of individuals with disability worsening and improvement by comparing baseline and follow-up disability measures. Patients missing the disability measure at baseline, the time point of interest or the corresponding confirmation assessment (at either 3 or 6 months subsequently) were excluded from these analyses. Disability worsening and improvement were defined as a 20% or more worsening/improvement from baseline in the time for the T25FW 4 and the NHPT. 6 According to the definition of historical trials in SPMS 10–13 and PPMS, 1 we defined worsening/improvement on the EDSS as an increase/decrease of one whole point on the EDSS if the baseline EDSS was 5.5 or lower, and of one half point if the baseline EDSS was 6.0 or 6.5. Since no agreed-upon definitions of significant worsening exist for the PASAT, we calculated mean PASAT-3 scores throughout follow-up. We also investigated worsening and improvement on the PASAT-3 from baseline (1) by any degree, (2) by at least four points and (3) by at least 20%.
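The worsening and improvement thresholds defined above can be written as a small classifier. This is a minimal sketch: the function names are hypothetical, and we assume that the EDSS thresholds are applied as "at least" one point or one half point.

```python
def classify_timed_test(baseline_seconds, followup_seconds, threshold=0.20):
    """Classify a T25FW or NHPT result; a longer time means worse function."""
    change = (followup_seconds - baseline_seconds) / baseline_seconds
    if change >= threshold:
        return "worsened"
    if change <= -threshold:
        return "improved"
    return "stable"


def edss_step(baseline_edss):
    """Required EDSS change: 1.0 point if baseline is 5.5 or lower,
    0.5 point if baseline is 6.0 or 6.5."""
    return 1.0 if baseline_edss <= 5.5 else 0.5


def classify_edss(baseline_edss, followup_edss):
    """Classify an EDSS change (assumption: thresholds mean 'at least')."""
    delta = followup_edss - baseline_edss
    step = edss_step(baseline_edss)
    if delta >= step:
        return "worsened"
    if delta <= -step:
        return "improved"
    return "stable"


if __name__ == "__main__":
    print(classify_timed_test(5.0, 6.5))  # 30% slower walk
    print(classify_edss(4.0, 4.5))        # half point at a low baseline
    print(classify_edss(6.0, 6.5))        # half point at a high baseline
```

Using the same margin in both directions is what allows worsening to be compared with "similarly defined" improvement.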
Investigations of ‘noise’
We investigated the reliability or ‘noise’ inherent in the EDSS, T25FW and NHPT in three ways.
Unconfirmed versus confirmed disability worsening
First, we compared unconfirmed and ‘confirmed’ disability worsening. We labelled a worsening event ‘confirmed’ if (1) a disability measure showed significant worsening compared to baseline and (2) it remained worsened at a confirmation measurement 3 or 6 months later. An ideal robust clinical outcome in PPMS should have only a small difference between unconfirmed and confirmed disability worsening.
Confirmed versus sustained disability worsening
We also compared ‘confirmed’ with ‘sustained’ disability worsening. We labelled a worsening event ‘sustained’ if (1) a disability measure showed significant worsening compared to baseline, (2) it remained significantly worsened at a confirmation measurement 3 or 6 months later and (3) it remained significantly worsened at the last (36 month) trial visit. An ideal outcome of irreversible disability worsening in PPMS should have only a small difference between confirmed and sustained disability worsening. However, one disadvantage of this approach is that the length of the ‘sustained’ period varies depending on when the index worsening first occurs.
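The three labels (unconfirmed, confirmed and sustained) can be sketched as a single function applied to one participant and one outcome measure. The data layout (a dictionary mapping visit month to whether the worsening threshold versus baseline is met) and the function name are our assumptions. Returning `None` for `sustained` when the last visit is missing is also an assumption; the text above only specifies exclusion for a missing index or confirmation visit.

```python
def assess_worsening(worsened_by_month, index_month, confirm_after=3, last_month=36):
    """Label worsening at index_month for one participant.

    worsened_by_month maps a visit month to True/False (worsening
    threshold versus baseline met or not); absent months are missing visits.
    Returns None if the index or confirmation visit is missing (excluded).
    """
    index = worsened_by_month.get(index_month)
    confirmation = worsened_by_month.get(index_month + confirm_after)
    final = worsened_by_month.get(last_month)

    if index is None or confirmation is None:
        return None

    confirmed = index and confirmation
    # Assumption: sustained status cannot be assessed without the last visit.
    sustained = None if final is None else (confirmed and final)
    return {"unconfirmed": index, "confirmed": confirmed, "sustained": sustained}


if __name__ == "__main__":
    visits = {12: True, 15: True, 36: False}
    print(assess_worsening(visits, index_month=12))
```

In this example the 12-month worsening is confirmed at 15 months but not sustained to the last visit, which is the kind of event that widens the gap between confirmed and sustained worsening.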
Disability worsening versus similarly defined improvement
To investigate whether the measure truly is a reflection of the ongoing worsening of disability, we compared disability worsening with similarly defined improvement. An outcome measuring the chronically progressive disease process of PPMS should have only a small proportion of patients ‘improving’ on an outcome measure, and a large proportion of patients worsening. The proportion of patients with disability worsening should increase over time, reflecting chronic progression in PPMS, while the proportion of patients with improvement should decline or remain unchanged over time.
Baseline factors associated with disability worsening
To aid in the selection of inclusion criteria for clinical trials, we investigated the association of baseline characteristics with disease progression at 12, 24 and 36 months of follow-up using logistic regression models. We used worsening on the EDSS or T25FW (unconfirmed and 3 month confirmed) at 12, 24 and 36 months as the dependent (outcome) variables and age, disease duration, sex, treatment (placebo or fingolimod), EDSS score at baseline, T25FW at baseline and contrast enhancing lesions on the screening MRI (present or absent) as the independent (predictor) variables. Statistical significance was taken to be at the two-tailed 0.05 level. All statistical analyses were performed with the R statistical software package for Windows, version 4.0.2. 14
Data availability
The data used in this study are available upon request from Novartis. Individual participant data collected during the trial will be shared after anonymization and on approval of a research proposal and data sharing agreement. Research proposals can be submitted online (https://www.clinicalstudydatarequest.com).
Results
INFORMS data set
The INFORMS data set contained individual patient level data of 970 participants. Table 1 shows their baseline characteristics.
Progression rates
Table 2 shows the proportion of trial participants with unconfirmed and confirmed disability worsening over the course of the trial. The T25FW had the highest number of worsening events over time, followed by the EDSS. The NHPT showed the lowest progression rates over follow-up. To explore whether the NHPT may be a more useful outcome in participants with advanced disability, we investigated NHPT worsening in a subgroup of patients with a baseline EDSS of 6.0 or higher (
Percentage of trial participants with unconfirmed and confirmed disability worsening throughout follow-up.
EDSS: expanded disability status scale; NHPT: nine-hole peg test; UDP: unconfirmed disability progression, CDP: confirmed disability progression, 3M: 3 months, 6M: 6 months; T25FW: timed 25-foot walk.
Investigations of ‘noise’
Unconfirmed versus confirmed disability worsening
Table 2 shows the difference between unconfirmed and confirmed disability worsening on single and combined outcome measures. The EDSS showed the lowest difference between unconfirmed and confirmed disability worsening, with a large majority of unconfirmed worsening events confirmed at 3 or 6 months (e.g. 82.2% of those with unconfirmed 12-month worsening events were confirmed at 3 months). For the T25FW this difference was slightly larger (70% of unconfirmed 12-month worsening events confirmed at 3 months), and it was largest for the NHPT (with only 56.3% of unconfirmed 12-month worsening events confirmed at 3 months, Table 2). There was little difference between 3 and 6 month confirmation, although the difference between unconfirmed and confirmed disability worsening was slightly lower with 6 month confirmation.
Confirmed versus sustained disability worsening
Table 3 shows the difference between confirmed and sustained worsening events for single and combined outcomes. The EDSS showed the lowest difference between confirmed and sustained disability worsening, with, for example, 65.4% of confirmed 12-month worsening events sustained until the end of the trial, followed by the T25FW (48.9%). The NHPT had the largest discrepancy between confirmed and sustained worsening events, with only 37.9% of confirmed 12-month worsening events sustained until the end of the trial (Table 3).
Comparison of confirmed versus sustained disability worsening.
EDSS: expanded disability status scale; NHPT: nine-hole peg test; CDP: confirmed disability progression, SDP: sustained disability progression, 3M: 3 months, 6M: 6 months; T25FW: timed 25-foot walk.
Disability worsening versus similarly defined improvement
Table 4 and Figure 1 show a comparison of worsening events with similarly defined improvement on single and combined outcome measures. Overall, improvement events were much rarer than worsening events, and their frequency remained stable throughout the course of the trial. The EDSS had the highest number of improvement events, with around 10% of patients experiencing (unconfirmed) improvement on the EDSS, followed by the T25FW (around 7%) and the NHPT (around 4%; Table 4).
Percentages of trial participants with disability worsening versus similarly defined improvement throughout follow-up.
EDSS: expanded disability status scale; NHPT: nine-hole peg test; UDP: unconfirmed disability progression, CDP: confirmed disability progression, 3M: 3 months, 6M: 6 months; T25FW: timed 25-foot walk.

Proportion of patients with disability worsening versus similarly defined (unconfirmed) improvement on the (a) EDSS, (b) T25FW and (c) NHPT.
Baseline factors associated with disability worsening
EDSS and T25FW at baseline were consistently associated with EDSS and T25FW worsening at 12, 24 and 36 (or 33) months. Male sex and disease duration were associated with worsening on the EDSS in some but not all regression models (Table 5). Age, the presence of contrast enhancing lesions and fingolimod treatment were not associated with the risk of EDSS and T25FW disability worsening in any of the regression models. We performed these analyses with the treatment variable dichotomized into placebo or fingolimod; repeating the analyses with separate 0.5 and 1.25 mg fingolimod arms did not change the results.
Results of the logistic regression models.
EDSS: expanded disability status scale; AIC: Akaike information criterion; UDP: unconfirmed disability progression, CDP: confirmed disability progression; T25FW: timed 25-foot walk.
The models included sex, age, disease duration, EDSS at baseline, T25FW at baseline, contrast enhancing lesions on the screening MRI (present or absent) and the treatment arm (placebo or fingolimod) as independent (predictor) variables.
Discussion
An ideal clinical outcome measure in PPMS should show a steadily growing number of worsening events over time in a disease that has no disease modifying treatments. These worsening events should reflect irreversible disability, so that the difference between raw and confirmed worsening, on one hand, and confirmed and sustained worsening, on the other hand, should be as low as possible. So far, there has been little motivation to compare outcome measures in PPMS and other disease courses, because the agreed-upon standard for outcome measurement in all forms of MS has been the EDSS.
Our investigation of progression rates shows that, similar to our previous investigations in SPMS 15 and PPMS, 16 the T25FW had the highest proportion of patients with disability worsening over time, followed by the EDSS. The NHPT showed the lowest worsening rates, with only 12.1% of patients experiencing (3 month confirmed) disability worsening at 33 months. Limiting our investigation to a subgroup of patients with significant baseline disability (EDSS of 6.0 or greater,
Change in the PASAT throughout the trial.
PASAT: paced auditory serial addition test; SD: standard deviation.

(a) Mean PASAT scores (error bars represent the standard deviation) throughout follow-up. The PASAT does not show worsening over time, but a slight increase in mean scores up to about 12 months, and little change afterwards. This slight increase in PASAT scores may be due to a practice effect. (b) Improvement of the PASAT compared to baseline is more likely than worsening throughout the trial.
The primary goal of treatment in PPMS is the prevention or delay of irreversible disability. An outcome measure that reflects this irreversibility should therefore only show a small difference between unconfirmed and confirmed or sustained disability worsening. In the INFORMS data set, the EDSS showed the highest consistency between unconfirmed and confirmed and between confirmed and sustained disability worsening, followed by the T25FW. Around two-thirds of 12-month (3 month confirmed) worsening events on the EDSS were sustained until 3 years of follow-up. This is in contrast to a seminal study in relapsing–remitting MS (RRMS), where only about half of all (3 months) confirmed worsening events at 1 year were sustained until 2 years of follow-up. 18 This difference between the earlier study in RRMS and our study may be because the RRMS study included participants with EDSS scores in the lower portion of the scale, where the EDSS is known to have poorer test–retest reliability. 19,20
Our investigation of worsening versus similarly defined improvement is based on the idea that (in a trial that has not demonstrated a treatment effect) a useful outcome measure in PPMS should reflect the ongoing clinical worsening. While plateaus and occasional improvements are possible in PPMS, the clinical picture is dominated by a slow and steady decline across all functional systems. Based on this reasoning, an ideal outcome measure in PPMS should show steady worsening of disability, while improvement by the same margin on the same outcome would then either be due to measurement error or a very rare ‘true’ improvement event; either way, worsening events should over time vastly outnumber improvement events in PPMS. Ebers et al. 21 showed that the EDSS improved about as often as it worsened in RRMS trial cohorts followed for shorter periods of time. Fortunately, this effect was much less noticeable in the INFORMS data set, with improvement rates largely below 10% for the EDSS and even lower proportions for the other single outcome measures. Combining the EDSS and T25FW resulted in a larger proportion of individuals with disability worsening, without substantially increasing the differences between unconfirmed and confirmed or between confirmed and sustained disability worsening. Including the NHPT in a combined outcome added little to progression rates.
The EDSS is currently the standard primary outcome measure in all forms of MS. Our findings suggest that the T25FW would also be a good choice of primary outcome measure in PPMS. It may seem limiting to use a pure ambulation measure as the primary outcome, but it should be kept in mind that the EDSS is almost exclusively an ambulation measure at values of 4.0 and higher, which applies to the vast majority of the participants in INFORMS. The T25FW could be used in isolation or in combination with the EDSS. However, we base this recommendation on the investigation of a single trial data set. While INFORMS was a well-conducted, representative trial, the precise impact of using the T25FW as primary outcome measure on statistical power, sample size calculations and trial duration should also be investigated in other PPMS trial data sets.
One reason for the relatively high reliability of worsening events in INFORMS may lie in the lower inflammatory disease activity in PPMS compared to RRMS and SPMS. INFORMS included participants with notably low markers of inflammatory disease activity; for instance, only 13% of participants had contrast enhancing lesions at baseline, which is about half the proportion in, for example, the ORATORIO trial of ocrelizumab in PPMS (with 27% of patients having contrast enhancing lesions at baseline) 1 or comparable trials in SPMS: ASCEND (22%) 22 and IMPACT (34%). 23 This suggests that in the INFORMS cohort disability worsening is likely not driven by overt focal inflammatory disease activity.
We performed regression analyses on the association of baseline characteristics and the risk of disability worsening. Our models showed that age, disease duration, sex and having contrast enhancing lesions at baseline were not consistently associated with the risk of disability worsening in PPMS, as compared to the frequent association in RRMS. This may be due to homogeneity of these risk factors within the study population, or it may suggest that it is not necessary to adjust for these factors by formulating specific eligibility criteria. In contrast to these findings, it appeared that younger patients with contrast enhancing lesions at baseline were more likely to benefit from immunomodulatory treatment in the ORATORIO trial, the only positive phase III trial in PPMS to date. 1 Through its eligibility criteria, INFORMS may have selected for a group of patients with meaningfully less focal inflammatory disease activity: the INFORMS cohort is, on average, 5 years older than the ORATORIO cohort, and included participants aged between 25 and 65 years, while ORATORIO included participants aged between 18 and 55 years. Focal inflammatory disease activity is in part an inverse function of age: two studies, which mostly included patients with RRMS and SPMS, showed that the proportion of patients with contrast enhancing lesions declines almost linearly as a function of age. 24,25 It would be worthwhile to explore the association of MRI characteristics, age and disability worsening in other natural history and clinical trial cohorts in PPMS.
