Abstract
1 Introduction
Mediation analysis is used to statistically explore the possible mechanisms through which a treatment or exposure affects an outcome. To achieve this, the analysis attempts to decompose the total effect of the exposure on the outcome into an indirect effect and a direct effect. The indirect effect is the part of the total effect that is realised by the exposure acting on the mediator and that mediator then acting on the outcome. The direct effect captures the portion of the total effect that does not act via the mediator. Through this decomposition we hope to gain insight about the process through which an effect occurs. We refer readers to VanderWeele 1 for a thorough overview of mediation analysis.
Most methods for mediation analysis have focused on a single mediator and a continuous or binary outcome, though extensions to more complex settings are emerging. In this paper, we are interested in the setting where the outcome is time to an event and the mediator is a time-dependent variable for which repeated measurements are available. The repeatedly-measured nature of the mediator allows us to focus on a process that evolves over time. For example, consider a situation where the exposure is the onset of a condition that causes a progressive elevation in a biomarker, and high levels of this biomarker lead to an increased risk of death. One challenge that arises is that the mediator measurement from one time point may confound the association between the mediator measured at a later time point and the outcome. This is an example of exposure-induced confounding of the mediator-outcome association and, for identification, many methods assume that this type of confounding does not exist. Time-to-event outcomes pose another difficulty: survival to a given time is a post-exposure confounder of the association between a mediator measured at a later time and subsequent survival (i.e. survival is required for a mediator measurement to be taken). The exposure therefore affects the mediator both directly and indirectly via survival time.
Vansteelandt et al. 2 recently proposed a method for mediation analysis in this setting by estimating the combination of path-specific effects where the exposure first acts upon the repeatedly-measured mediator. In other words, this method estimates how the effect of the exposure on the outcome is mediated via the entire longitudinal mediator process. As their approach also accommodates time-varying mediator-outcome confounders, it is suitable for situations where multiple mediators may exist. A second mediation method for this setting was proposed by Aalen et al. 3 and is based on dynamic path analysis using the additive hazards model. As this method requires that control for confounding be made using only covariate measurements taken prior to the exposure, it is restricted to settings without time-varying confounders.
In Buse et al. 4 the method proposed by Vansteelandt et al. 2 was applied to data from a randomised controlled trial (the LEADER trial) to identify possible mediators of the effect of treatment on the risk of cardiovascular events. In another application, this method was used to quantify the indirect effect of treatment on the risk of a composite kidney disease outcome via several candidate mediators. 5 In analyses such as these based on data from randomised controlled trials, control for confounding of the exposure-outcome and exposure-mediator relationships is rendered unnecessary via the randomisation and there is a clearly defined starting time for each individual at the time of randomisation. To the best of our knowledge, to date the methods of Vansteelandt et al. 2 and Aalen et al. 3 have only been applied to data from randomised controlled trials. Our focus is on the use of these methods to address mediation questions using observational data. When working with observational data, such as a registry dataset, control for confounding can only be accomplished via adjustment with measured covariates and in cases where the exposure is onset of a condition, there is no natural time zero for comparison of the exposed and unexposed. Further, measurements of time-updated variables are taken on a schedule designed for long-term data collection as opposed to targeting a specific research question. We discuss these issues in our motivating example, which is based on data from the UK Cystic Fibrosis Registry dataset.
Our primary aim is to more thoroughly evaluate two methods available for mediation analysis in the setting of a repeatedly-measured mediator and a time-to-event outcome: the method of Vansteelandt et al. 2 and the method of Aalen et al. 3 Although Vansteelandt et al. 2 provide a basic simulation illustrating their approach for cases with and without direct and indirect effects in an appendix, we are not aware of any extensive simulation study of their method or any simulation study of the method of Aalen. Here, we use simulation to look both at scenarios where we expect good performance as well as at scenarios that may challenge the methods. Further, we apply these methods to analyse mediation in cystic fibrosis-related diabetes. This is the first application of causal mediation methods to the UK Cystic Fibrosis Registry dataset and may motivate further research into mechanisms of disease progression. The paper is organised as follows. Section 2 introduces the motivating application: cystic fibrosis-related diabetes. In Section 3, a brief introduction to causal mediation analysis is provided as well as descriptions of the two mediation analysis methods studied. We present a simulation study assessing the performance of the two methods with a focus on bias in Section 4. Section 5 contains an analysis of the UK Cystic Fibrosis Registry dataset to investigate mediation of the effect of cystic fibrosis-related diabetes on survival. We conclude with a discussion in Section 6.
2 Motivating application: Cystic fibrosis-related diabetes
This study is motivated by the setting of cystic fibrosis (CF) and the desire to better understand the mechanisms associated with mortality. CF is a genetic, life-shortening disease that affects more than 10,500 people in the UK 6 and approximately 100,000 people worldwide. 7 It is characterised by a progressive loss of lung function and most people with CF die from respiratory failure. 8 Although there is no cure for CF, improved care and early diagnosis have led to substantial improvements in life expectancy, with a median predicted survival age for babies born today in the UK of 50.6 years. 6 The increasing lifespan of people with CF is concomitant with an increased risk of co-morbidities, the most common being CF-related diabetes (CFRD). CFRD has been shown to be associated with an increased risk of mortality 9–12 but the mechanisms for this effect are not well understood. One hypothesis is based on the association of CFRD with worse pulmonary function. 13–15 Using causal mediation analysis, we aim to quantify how much of the effect of CFRD on survival is mediated through lung function.
We use data from the UK CF Registry, which holds data on more than 12,000 people with CF representing over 99% of the CF population in the UK. This registry dataset contains demographic information, genotype, and time-updated measures of pulmonary function, bacterial infections and other health indicators. 16 Data are systematically collected at approximately annual routine monitoring visits conducted at specialist CF centres.
3 Causal mediation
3.1 Background
We begin by outlining some concepts for the setting of a single mediator and a non-time-to-event outcome. Many mediation analyses based on counterfactuals will aim to estimate the natural indirect effect (NIE) and natural direct effect (NDE). 17,18 In a setting with a binary exposure
The NIE captures the change in outcome that would result from fixing the exposure but changing the mediator from the level it would have taken if exposed to the level it would have taken if unexposed. The NDE captures the effect of the exposure on the outcome if the mediator had taken the level it would have taken without exposure. The total causal effect of the exposure on the outcome is the sum of the NIE and NDE. These natural effects can be estimated under the assumptions that there is no unmeasured exposure-outcome, mediator-outcome or exposure-mediator confounding and no exposure-induced mediator-outcome confounding. These are strong assumptions that even a randomised controlled trial may not satisfy. In particular, the assumption of no exposure-induced mediator-outcome confounding is problematic because it requires that no such confounder exists. In other words, even if such a confounder is measured, we cannot obtain unbiased estimates of the NIE and NDE. Although this assumption will not hold in many settings, it may be reasonable if the mediator is measured a very short time after the exposure. 19
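For concreteness, the natural effects described above can be written on the mean-difference scale as follows. The notation is an assumption introduced here for illustration only: $A$ is the binary exposure, $M(a)$ is the mediator level that would be seen under exposure level $a$, and $Y(a, m)$ is the outcome that would be seen under exposure level $a$ and mediator level $m$.

```latex
\begin{align*}
\mathrm{NDE} &= E\{Y(1, M(0))\} - E\{Y(0, M(0))\}, \\
\mathrm{NIE} &= E\{Y(1, M(1))\} - E\{Y(1, M(0))\}, \\
\mathrm{TE}  &= E\{Y(1, M(1))\} - E\{Y(0, M(0))\} = \mathrm{NIE} + \mathrm{NDE}.
\end{align*}
```

The nested term $E\{Y(1, M(0))\}$ enters the two natural effects with opposite signs, which is why they sum to the total effect.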
In the setting studied here involving a repeatedly-measured mediator, survival outcome, and possible time-varying confounding, natural indirect and direct effects cannot be identified. As mentioned in the Introduction, there are problems with exposure-induced mediator-outcome confounding and with survival acting as a post-exposure confounder. In the next sections, we outline the methods of Vansteelandt et al. 2 and Aalen et al. 3 which have been developed to overcome these limitations.
3.2 Method of Vansteelandt et al. (2019)
Vansteelandt et al. 2 proposed a mediation analysis method suitable for settings with a time-to-event outcome, time-updated mediator and time-varying confounding of the mediator-outcome association using counterfactual scenarios based on a hypothetical intervention on the mediator. Their proposal infers the effect of the exposure on the outcome through combinations of path-specific effects via the time-updated mediator measurements. A key contribution of this method is that it allows for time-varying mediator-outcome confounders, which could themselves be affected by the exposure.
Consider a situation in which mediators and other time-dependent covariates are observed at regular visit times. The data-generating mechanism and causal ordering for the case of two post-exposure visits is shown in Figure 1. Let

Data-generating mechanism assumed in the method of Vansteelandt for the case of two post-exposure visits. The time-varying confounder
The indirect effect of the exposure on the outcome via the mediator is taken to be the combination of paths where one or more measurements of the mediator are directly influenced by the exposure and the mediator subsequently affects the outcome. These pathways that make up the indirect effect are shown in black in Figure 2 – upper panel. Conversely, the effect of the exposure on the outcome not via the mediator, referred to as the direct effect, is defined as the combination of paths where the exposure does not directly influence the mediator. These pathways include those in which the exposure first affects

Highlighted path-specific effects. In the top directed acyclic graph (DAG), the path-specific indirect effect of
Under the method of Vansteelandt, we estimate the survival probabilities
3.3 Method of Aalen et al. (2020)
Aalen et al. 3 proposed a mediation analysis method for the special case of a time-to-event outcome and time-updated mediator where control for confounding of the exposure-mediator and exposure-outcome relationships can be achieved using only the set of baseline confounders. When using this method, it is assumed that there are no time-varying confounders of the mediator-outcome association and only a single mediator. A key idea in the method of Aalen is exposure separation. 22,23 This assumes that the exposure can be separated into two components: one that acts on the mediator process and one that affects survival either directly or through pathways not involving the mediator. Biologically, this means the exposure must be able to be split into separate physiological mechanisms that we could, in theory, manipulate independently of one another.
We use the same notation as described in the method of Vansteelandt and introduce

Data-generating mechanism assumed in the method of Aalen for the case of two post-exposure visits. The effect of the exposure
The estimand is a survival probability,
An additive hazards model for the hazard of the event at time
The assumptions made in equations (6) and (7) result in the special form of the mediational g-formula in (8). This allows for simple expressions for the IE and DE based on the probabilities
In the estimation procedure, event times are modelled as a counting process. At each event time, equation (7) is used to regress the change in the counting process onto the mediator and exposure and equation (6) is used to regress the mediator onto the exposure. The integrals in equations (9) and (10) are estimated as cumulative sums of the estimates of the model coefficients, with the integral in equation (10) being the standard cumulative coefficient reported from an additive hazards model. We refer readers to Aalen et al. 3 and Strohmaier et al. 24 for a detailed description of their dynamic path analysis approach to estimating the above quantities.
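The regression-at-each-event-time idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Aalen et al. or Strohmaier et al.: there is a single binary exposure, a single mediator, no baseline confounders, and the indirect contribution at each event time is taken as the product of the exposure coefficient in the mediator model and the mediator coefficient in the hazard model. The function names, the `min_at_risk` guard and the toy data are hypothetical, and the output is left on the cumulative-coefficient scale because the exact expressions in equations (9) and (10) are not reproduced here.

```python
import random

def centered_cross_sum(x, y):
    """Sum of (x - mean(x)) * (y - mean(y))."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

def dynamic_path_sketch(time, event, exposure, mediator, visit_times, min_at_risk=50):
    """Cumulative direct, indirect and total effects on the additive-hazard scale.

    mediator[i][k] is subject i's mediator value at visit_times[k]; the first
    visit time must not be later than the first event time.
    """
    cum_direct = cum_indirect = cum_total = 0.0
    rows = []
    for t in sorted({ti for ti, di in zip(time, event) if di == 1}):
        risk = [i for i in range(len(time)) if time[i] >= t]
        if len(risk) < min_at_risk:
            break  # stop when the risk set is too small for stable regressions
        k = max(j for j, v in enumerate(visit_times) if v <= t)  # most recent visit
        a = [float(exposure[i]) for i in risk]
        m = [mediator[i][k] for i in risk]
        dn = [1.0 if (time[i] == t and event[i] == 1) else 0.0 for i in risk]
        s_aa, s_mm = centered_cross_sum(a, a), centered_cross_sum(m, m)
        s_am = centered_cross_sum(a, m)
        s_ay, s_my = centered_cross_sum(a, dn), centered_cross_sum(m, dn)
        det = s_aa * s_mm - s_am ** 2
        if det <= 0.0:
            continue  # degenerate risk set (no variation in exposure or mediator)
        beta_a = (s_mm * s_ay - s_am * s_my) / det  # hazard model: exposure
        beta_m = (s_aa * s_my - s_am * s_ay) / det  # hazard model: mediator
        theta = s_am / s_aa                         # mediator model: exposure
        cum_direct += beta_a
        cum_indirect += theta * beta_m
        cum_total += s_ay / s_aa                    # hazard model without mediator
        rows.append((t, cum_direct, cum_indirect, cum_total))
    return rows

# Toy data for illustration only; this is not the paper's data-generating mechanism.
rng = random.Random(0)
visit_times = [0.0, 1.0, 2.0]
time, event, exposure, mediator = [], [], [], []
for _ in range(400):
    a = rng.randint(0, 1)
    m = [0.5 * a * (k + 1) + rng.gauss(0.0, 1.0) for k in range(3)]
    t = rng.expovariate(max(0.05, 0.2 + 0.1 * a + 0.05 * m[0]))
    time.append(min(t, 3.0))       # administrative censoring at time 3
    event.append(1 if t < 3.0 else 0)
    exposure.append(a)
    mediator.append(m)

rows = dynamic_path_sketch(time, event, exposure, mediator, visit_times)
```

Because each increment comes from ordinary least squares, the cumulative direct and indirect terms in this sketch add up to the cumulative total effect at every event time, which gives a simple internal check on the decomposition.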
3.4 Comparison of the method of Vansteelandt and the method of Aalen
3.4.1 Conceptual considerations
The two approaches outlined above differ fundamentally in their conceptual approach to mediation in a survival context. A key difference is the nature of the counterfactuals. In the method of Vansteelandt, we consider a hypothetical intervention on the mediator to set it to a level that would have been seen if the exposure were different. This could lead to ill-defined effects when the outcome is survival because if the individual would survive longer when exposed (
The practical value of methods based on nested counterfactuals, on which the method of Vansteelandt relies, has also been discussed more generally. 23,27–29 Such methods consider an individual whose exposure is set to one level but whose mediator is set to the value that would have been seen under a different exposure level. This is not a situation that could ever be observed in practice, which has raised conceptual concerns about the interpretation of the resulting estimands.
Although the exposure separation approach used in the method of Aalen avoids the use of nested counterfactuals, there are also conceptual hurdles involved in exposure separation. Here, physiologically, we must be able to decompose the exposure status into one component that affects survival but not the mediator and one component that affects the mediator but not survival. The independence of these two components is essential; if, for example, the proposed component affecting the mediator also affects survival, the assumptions of the analysis will not be valid. While an imagined exposure separation could correspond to a testable intervention, in practice it may be difficult or even clinically impossible to manipulate the two components separately.
3.4.2 Statistical considerations
Both approaches require that there is no unmeasured confounding of the exposure-outcome, mediator-outcome and exposure-mediator relationships in order to obtain unbiased estimates of the estimands that they target. The method of Aalen requires that this control for confounding be via confounders measured at baseline and expressly forbids the existence of an exposure-induced mediator-outcome confounder. In contrast, the method of Vansteelandt was designed for settings with time-varying mediator-outcome confounders, including those affected by the exposure, and, as long as they can be measured, identification is possible. Another difference is that the method of Vansteelandt does not rely on parametric models in equation (5) for identification. In theory, arbitrary models may be selected for estimation. In the method of Aalen, however, the simplicity of the form of the IE and DE estimands is due to assumptions that the mediator follows a linear model, that the hazard of an event follows an additive hazards model and that only the most recent value of the mediator is necessary to model the hazard.
Despite these fundamental differences between the two approaches, Vansteelandt et al. 2 describe their approach as ‘a generalisation of dynamic path analysis’. Further, they showed the equivalence of the two approaches when there are no time-varying confounders, the mediator and the hazard for the event follow additive models as in equations (6) and (7), and all individuals survive to the first mediator measurement. Under those conditions, using equation (5) of the method of Vansteelandt to calculate survival probabilities and equations (3) and (4) to calculate IE(t) and DE(t), the resulting expressions are equivalent to those obtained from the method of Aalen for IE(t) and DE(t) in equations (9) and (10). This shows a connection between the nested counterfactual approach and the exposure separation approach under certain conditions.
4 Simulation Study
4.1 Design
4.1.1 Overview and aims
To expand our understanding of the performance of these two mediation methods in more complex scenarios with a time-to-event outcome and repeatedly measured mediator, we conducted a simulation study. Both methods were evaluated using scenarios where we expected good performance as well as scenarios with data issues commonly found in observational datasets such as time-varying confounding and infrequent measurements of longitudinal variables. To assist other researchers interested in these methods, R code for generating truth data and simulated data as described in this manuscript is available from https://github.com/KamTan/MediationSimulation. Additionally, we use the results of this simulation study to aid in the interpretation of our analysis of the UK CF Registry dataset (see Section 5).
4.1.2 Data-generating mechanisms
Several different scenarios were studied and we begin by describing a reference scenario which is consistent with the assumptions of both methods. We consider a setting where there is a binary exposure

Illustration of the data-generating mechanism. The exposure
For each scenario considered, three sub-scenarios are studied: (1) both a direct and an indirect effect of the exposure on the outcome are present (“DE+IE”); (2) only an indirect effect is present, meaning there is no path from
The following steps were used to generate data according to the data generating mechanisms illustrated in Figure 4 for individuals Generate Generate Generate values for the longitudinal mediator measurement as random draws from: Generate the conditional hazard from: Generate event times Generate Calculate If At Generate an event indicator
The result is a dataset with values of
To further probe each method, we considered two additional scenario types: one with infrequent mediator measurements and one with time-varying confounding present. Table 1 provides a summary of the simulation scenarios investigated. To create the additional scenarios, some modification of the data generating procedure was needed as outlined below.
Listing of all simulation scenarios, the abbreviated name used in the Results section, the percent of simulated individuals experiencing an event prior to time
To create scenarios for investigating the impact of an infrequently measured mediator, the above described procedure was adapted to generate mediator values at time intervals of 0.25 (i.e. at times
Finally, to create scenarios with a time-varying confounder,
4.1.2 Estimands
We focus on two estimands: the indirect effect of the exposure on the outcome via the mediator and the direct effect of the exposure on the outcome not via the mediator (see equations (3) and (4) for the method of Vansteelandt and equations (9) and (10) for the method of Aalen). Total effect (TE) estimates are also reported for completeness. We do not consider proportion mediated as an estimand. Although it is intuitively appealing for quantifying mediation, it tends to have wide confidence intervals 1 and, when the total effect estimate is small, it becomes unstable as the denominator approaches zero.
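The instability of the proportion mediated can be seen with simple arithmetic. The sketch below assumes, purely for illustration, that effects are expressed on a difference scale so that the proportion mediated is IE/TE. The function names and all numbers are hypothetical and are not results from this study.

```python
def proportion_mediated(indirect_effect, total_effect):
    """Proportion mediated on a difference scale (illustrative assumption)."""
    return indirect_effect / total_effect

def pm_range(indirect_effect, total_effect, shift):
    """Proportion mediated when the total effect estimate moves by +/- shift."""
    low = proportion_mediated(indirect_effect, total_effect + shift)
    high = proportion_mediated(indirect_effect, total_effect - shift)
    return low, high

# Both hypothetical settings have a true proportion mediated of 10%, but the
# same perturbation of the total effect moves the ratio far more when the
# total effect is close to zero.
for ie, te in ((0.02, 0.20), (0.001, 0.01)):
    low, high = pm_range(ie, te, shift=0.005)
    print(f"TE={te}: proportion mediated between {low:.3f} and {high:.3f}")
```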
4.1.3 Methods
Both the method of Vansteelandt and the method of Aalen were applied to each simulation scenario for effect estimation. We implemented the method of Vansteelandt with an additive hazards model as the simulated event times were generated under this assumption. We use the notation Vansteelandt
4.1.4 Performance measures
The primary performance measure assessed is bias in the estimates of DE(
Based on the results of several simulation runs, we expect the Var(
4.1.5 Generation of the true values of the estimands
The true values of the estimands were estimated using a large (
The same equations used to generate the simulated datasets were used except that the exposure was not affected by
4.1.6 Software
R v4.0.2 33 was used for all analyses and generation of simulated data. We used the
4.2 Results
4.2.1 Reference scenario
Using the reference scenario, the estimated TE, DE and IE were approximately unbiased for both methods for all three sub-scenarios (DE+IE, NoDE, NoIE). Full results are shown in Supplemental Information Table 3. The MCSE of all bias estimates was <0.005. The empirical standard error for the method of Aalen was lower in both the DE+IE and NoDE sub-scenarios leading to a relative efficiency greater than one (1.55–2.94) over the method of Vansteelandt (Supplemental Information Table 4).
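The performance measures reported here can be computed from the per-dataset estimates along the following lines. This is a sketch under assumptions: bias is the mean estimate minus the true value, the Monte Carlo standard error of the bias is the empirical standard error divided by the square root of the number of simulated datasets, and relative efficiency is the ratio of empirical variances. The function names and example values are hypothetical, and the definitions used in this study may differ in detail.

```python
import math
import statistics

def performance(estimates, truth):
    """Bias, percent bias, empirical SE and MCSE of the bias for one estimand."""
    n_sim = len(estimates)
    bias = statistics.mean(estimates) - truth
    emp_se = statistics.stdev(estimates)  # empirical standard error
    return {
        "bias": bias,
        "percent_bias": 100.0 * bias / truth,
        "emp_se": emp_se,
        "mcse_bias": emp_se / math.sqrt(n_sim),
    }

def relative_efficiency(estimates_reference, estimates_comparator):
    """Ratio of empirical variances; values above one favour the comparator."""
    return statistics.variance(estimates_reference) / statistics.variance(estimates_comparator)

# Hypothetical estimates from a handful of simulated datasets (illustration only).
reference = [0.8, 1.0, 1.2]
comparator = [0.9, 1.0, 1.1]
summary = performance(comparator, truth=1.0)
efficiency = relative_efficiency(reference, comparator)
```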
The reference scenario was extended to study three settings with a time-varying effect of the mediator on the hazard: (R1) an effect that increases over time, (R2) an immediate effect only, and (R3) a delayed effect. Again, both methods produced approximately unbiased effect estimates (percent bias <2% and Monte Carlo standard error <0.005) at all time points for all sub-scenarios (results not shown).
4.2.2 Infrequent mediator measurements
For both methods and both scenarios (F1, F2) investigating the impact of infrequent mediator measurements, the estimated indirect effect was closer to 1.0 (no effect) than the true indirect effect in the DE+IE and NoDE scenarios. Figure 5 shows the estimated and true IE (left) and estimated and true DE (right) for scenarios F1–DE+IE and F2–DE+IE, using the method of Vansteelandt. Results for the method of Aalen were similar. In the F1–DE+IE scenario, both methods over-estimated the IE by 7% when 50% of individuals had experienced an event (time

Effect estimates using the method of Vansteelandt when the mediator is infrequently measured. On the left, the true IE (solid line) and the estimated IE (dashed line) are plotted over time. A similar plot for the true DE (solid line) and estimated DE (dashed line) is shown on the right. Estimates and true values for scenario F1 (
To better understand the source of this bias, we looked at estimates of the effect of the mediator on the hazard (

Results from 1000 simulation datasets of scenario F1–DE+IE using the method of Aalen to estimate the parameters from equation 7. On the left, the true value of
When the IE was overestimated (underestimated), the corresponding DE was underestimated (overestimated) resulting in an approximately unbiased estimated total effect. Supplemental Information Tables 5 and 6 provide complete results.
4.2.3 Time-varying confounders present
Scenarios L1, L2 and L3 include a time-varying confounder
Bias of effect estimates for scenario L6 with a time-varying covariate that is affected by the exposure. Percent bias is shown beneath the absolute bias in parentheses. Results are shown at times corresponding to the 20th, 50th (median) and 80th percentile of event occurrence for the DE+IE scenario. The Monte Carlo Standard Error was
5 Application to CF-related diabetes
5.1 Methods
Data were obtained from annual review records from the UK CF Registry between 2008 and 2017, inclusive. The study population consisted of all individuals in the UK CF Registry aged 18–60, with known genotype and at least two measurements of forced expiratory volume in 1 second (FEV1%), a key predictor of survival. From this group of 6374 individuals, we further excluded people who were pancreatic insufficient (to ensure positivity) and people who had been diagnosed with CFRD prior to the beginning of follow-up. We excluded these prevalent cases to avoid bias due to unknown duration of disease and focus only on incident cases of CFRD.35,36 The resulting cohort consisted of 3708 individuals with 18,693 annual review records.
The exposure was diagnosis of CFRD (Y/N) and the outcome was the composite of death from any cause or lung transplantation. The mediator studied was lung function, measured by FEV1%, a continuous variable. Five baseline confounders were adjusted for in all analyses: gender (M/F), genotype (F508del homozygous Y/N), calendar year, baseline FEV1% and baseline body mass index (BMI). To ensure proper temporal ordering of the data, baseline measurements were taken from the annual review prior to the one where the exposure was assessed and the first mediator measurement was taken from the annual review after exposure assessment. We also adjusted for time-updated measures of BMI (continuous) and respiratory infections (proxied by the number of days in hospital receiving IV antibiotics) when using the method of Vansteelandt. Hospital IV days were grouped into six categories (0, 1–7, 8–14, 15–21, 22–28 and >28 days) because IV antibiotics are frequently given in week-long courses.
To create the analysis dataset, we assumed that measurements of the exposure, mediator and time-varying covariates were taken at integer-valued ages. For each age, 18–50 years, an age-specific dataset was created comprising all individuals at risk at that age who were either not diagnosed with CFRD or diagnosed with CFRD within the past year. In each age-specific dataset, time was reset to zero when CFRD was or was not diagnosed and age was included as a covariate. In this structure, each individual contributed data as an unexposed person at multiple ages (each age that they were at risk but not diagnosed) but only contributed data as an exposed person at the one age they were first diagnosed, if ever diagnosed. This allows us to make the best use of the longitudinal data in our situation where there is no natural time zero for an unexposed person. More details on the construction of the analysis dataset are available in the Supplemental Information.
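The stacking of age-specific datasets described above can be sketched as follows. This is an illustrative sketch only: the field names (`entry_age`, `exit_age`, `cfrd_age`, `event`) are hypothetical, ages are treated as integers, and baseline covariates, mediator measurements and the details given in the Supplemental Information are omitted.

```python
def build_age_specific_rows(people, ages=range(18, 51)):
    """Stack one row per person per age at which they are eligible.

    A person is eligible at an age if they are at risk (entered, no event or
    censoring yet) and are either undiagnosed or first diagnosed at that age.
    """
    rows = []
    for person in people:
        for age in ages:
            if not (person["entry_age"] <= age < person["exit_age"]):
                continue  # not at risk at this age
            dx_age = person["cfrd_age"]  # None if never diagnosed
            if dx_age is not None and age > dx_age:
                continue  # diagnosed at an earlier age: no further rows
            rows.append({
                "id": person["id"],
                "age": age,  # kept as a covariate
                "exposed": int(dx_age is not None and age == dx_age),
                # time is reset to zero at this age; unexposed rows are not
                # censored if the person is diagnosed later
                "follow_up": person["exit_age"] - age,
                "event": person["event"],
            })
    return rows

# Two hypothetical people: one diagnosed at age 21, one never diagnosed.
people = [
    {"id": 1, "entry_age": 18, "exit_age": 25, "cfrd_age": 21, "event": 1},
    {"id": 2, "entry_age": 30, "exit_age": 33, "cfrd_age": None, "event": 0},
]
stacked = build_age_specific_rows(people)
```

In this toy example the first person contributes unexposed rows at ages 18 to 20 and a single exposed row at age 21, while the second contributes unexposed rows at ages 30 to 32.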
Estimates are presented for IE using the same relative survival scale described in the simulation study. Estimates were computed every 0.1 years starting at time
5.2 Results
Both mediation analysis methods estimated the indirect effect of CFRD on survival via lung function to be modest in size. Figure 7 shows the estimated IE for each mediation method, as a function of time. Using the method of Vansteelandt, the estimated indirect effect increases in magnitude over time, reaching 0.996 at time

Results from the method of Vansteelandt and the method of Aalen. The indirect effect of CFRD on time to death or lung transplant via FEV1%, the mediator, is shown. Vertical bars at visit times illustrate 95% bootstrap confidence intervals.
6 Discussion
In this study, we explored two recently proposed methods for mediation analysis in a setting with a time-to-event outcome and a time-updated mediator using a simulation study and in our motivating example of CFRD. Both methods produced approximately unbiased estimates of TE, DE and IE in simulated scenarios consistent with their stated assumptions. When the mediator was measured infrequently or confounding was not controlled for, however, bias was seen in the effect estimates for both methods.
The presence of time-varying confounding of the mediator and outcome is likely in many settings where the mediator is repeatedly-measured over time. An important feature of the method of Vansteelandt et al. 2 is the ability to identify indirect and direct effects even when time-varying confounding exists. In six simulation scenarios with a time-varying covariate that influenced the mediator process, the method of Vansteelandt returned approximately unbiased effect estimates. Using the method of Aalen, which explicitly assumes that control for confounding can be accomplished with baseline covariates, there is no mechanism to adjust for values of
Substantial bias was found in the estimates of DE and IE for scenarios with infrequent mediator measurement. For example, this could occur in observational datasets where the mediator is a continuous biomarker measured periodically at infrequent visits. In this case, the analysis incorporates snapshots of the trajectory of the continuous biomarker but survival is affected in continuous time. This is a type of measurement error and it resulted in an attenuation of the effect estimates in our simulation study. Further, the bias accumulated over time. Strohmaier et al. 24 reported similar results but found that increasing the frequency of the mediator measurements did not necessarily improve estimation of the IE. Rather, less bias was seen when mediator measurements better represented the underlying biological process. Infrequently measured confounders may also contribute to the bias and this bias could be in either direction. Interesting avenues for future research would be to explore methods for mitigating the impacts of measurement error in mediators and infrequent measurements. Correcting for measurement error would require external information about the error, and Aalen et al. 3 have suggested a calibration approach based on work in VanderWeele 1 that may be useful when such information is available. A possible solution to the second issue could involve using mixed models or joint models to impute unmeasured longitudinal values of the mediators and covariates.
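The attenuation that arises when a stale snapshot stands in for the current mediator value can be illustrated with a simple linear analogy. The sketch below is not a survival model and is not the simulation design of this study: it assumes a stationary AR(1) mediator with autocorrelation `rho`, an outcome that depends linearly on the current mediator value, and an analysis that only has access to a measurement taken `lag` steps earlier. All names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n=20000, rho=0.8, lag=4, beta=1.0, seed=1):
    """Return (slope using the current mediator, slope using the stale one)."""
    rng = random.Random(seed)
    stationary_sd = (1.0 / (1.0 - rho ** 2)) ** 0.5
    stale, current, outcome = [], [], []
    for _ in range(n):
        m = rng.gauss(0.0, stationary_sd)   # last recorded measurement
        stale.append(m)
        for _step in range(lag):            # mediator keeps evolving unobserved
            m = rho * m + rng.gauss(0.0, 1.0)
        current.append(m)
        outcome.append(beta * m + rng.gauss(0.0, 1.0))
    return ols_slope(current, outcome), ols_slope(stale, outcome)

slope_current, slope_stale = simulate_slopes()
```

Under these assumptions the slope based on the stale measurement is shrunk towards zero by a factor of roughly `rho ** lag`, which mirrors the direction of the bias seen for the indirect effect when the mediator is measured infrequently.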
In the analysis of the UK CF Registry dataset, both methods produced similar effect estimates and found only a small indirect effect of CFRD on survival via lung function. The conclusion is that the majority of the total effect acts through pathways where the exposure does not first affect the mediator process. Because the primary cause of death in CF is respiratory failure and previous studies have shown that CFRD is associated with both increased mortality and decreased lung function, we hypothesised that the evidence for mediation would be greater. From the simulation study, we learned that indirect effect estimates may be attenuated if the mediator is measured infrequently. It is possible that annual measurements of lung function were not frequent enough for this process and that the estimated indirect effect was biased. Another potential source of bias is uncertainty in the mediator measurements. We do not believe this to be a substantial problem with the lung function measurements because of the standard laboratory procedures used and pre-planned measurement times at annual well visits. A further limitation is that some unexposed people later became exposed but their change in exposure status was not incorporated into the analysis. We chose not to censor them at the time they were diagnosed with CFRD because this would violate the assumption of non-informative censoring. The changing exposure status of some individuals may have biased the effect estimates and future work could include the specification of estimands for these two methods when the exposure is time-varying.
For the method of Aalen, we must posit an exposure separation. One possible mechanism for lung disease associated with diabetes is via a build-up of collagen in lung tissue leading to reduced elasticity. 37 As this aspect is unlikely to affect survival other than via lung function, we may envision a split of this aspect away from the other effects of glucose intolerance to obtain a theoretical exposure separation. The further assumption of the method of Aalen that only the most recent measurement of lung function affects the hazard seems unlikely to hold as there is evidence that previous values of FEV1% are significant predictors of the hazard at a given time in addition to the most recent value. 38
A causal interpretation for these analyses is also reliant upon the assumption of no unmeasured confounding. We attempted to control for confounding of the exposure-outcome and exposure-mediator relationships by adjusting for five baseline covariates which are known predictors of survival in CF but, as with all causal analyses using observational data, it is impossible to verify that confounding has been completely addressed. In the method of Vansteelandt, confounding of the mediator-outcome relationship was controlled via measurements of the two time-varying confounders (BMI and IV days). In the method of Aalen, however, it was necessary to assume that sufficient control for confounding could be achieved using the baseline confounders alone. Because respiratory infections can be a time-varying common cause of both lung function and survival, this assumption is likely not valid and the results from the method of Aalen may be biased. In the method of Vansteelandt, a specific causal ordering of covariates and mediators was assumed where IV days and BMI measured at visit
Both of the methods studied here are valuable tools for mediation analysis in the setting of a survival outcome with a time-updated mediator. The method of Vansteelandt allows time-varying mediator-outcome confounding and can also be extended to multiple longitudinal mediators, a situation that will be common in clinical studies. The sample size required by the method of Vansteelandt to achieve the desired precision may be greater, particularly as the number of visit times increases, because each regression is performed on the history of all of the covariates. At later visit times, when there are more covariates in the model, there may also be fewer observations because fewer people have survived to that visit. An alternative technique for evaluating equation (5) in the method of Vansteelandt is Monte Carlo integration, but it requires jointly modelling the distribution of all variables. The method of Aalen offers fast computation times but is limited to the setting where time-varying confounding is believed not to exist. For some settings, it may be preferable to assume that the treatment or exposure can be split into separate biological components, as in the method of Aalen, rather than to contemplate a hypothetical intervention on the mediator. Further research into methods for assessing the fit of these models would also be helpful.
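To illustrate the general idea of Monte Carlo integration mentioned above, the following is a minimal sketch in a deliberately simplified toy setting. It is not equation (5) of the method of Vansteelandt and does not use the models fitted in this paper. It assumes a single mediator measurement that is normally distributed given exposure and a constant additive hazard that is linear in exposure and mediator. All function names and parameter values (`lam0`, `beta_a`, `beta_m`, `mu0`, `shift`, `sigma`) are hypothetical and chosen only so that the Monte Carlo average can be checked against a closed-form answer.

```python
import numpy as np

# Hypothetical toy parameters (assumptions for illustration only).
LAM0, BETA_A, BETA_M = 0.08, 0.02, -0.0005   # additive hazard coefficients
MU0, SHIFT, SIGMA = 70.0, -5.0, 10.0         # mediator mean, exposure shift, sd


def survival_given_mediator(t, a, m):
    """Toy survival S(t | a, m) = exp(-(LAM0 + BETA_A*a + BETA_M*m) * t)."""
    return np.exp(-(LAM0 + BETA_A * a + BETA_M * m) * t)


def mc_survival(t, a, a_star, n=200_000, seed=0):
    """Monte Carlo estimate of E[S(t | a, M)] with M drawn under exposure a_star."""
    rng = np.random.default_rng(seed)
    # Draw the mediator from its assumed distribution under exposure a_star.
    m = rng.normal(MU0 + SHIFT * a_star, SIGMA, size=n)
    # Average the conditional survival over the simulated mediator values.
    return survival_given_mediator(t, a, m).mean()


def exact_survival(t, a, a_star):
    """Closed form of the same expectation, using the normal moment generating function."""
    mu = MU0 + SHIFT * a_star
    return np.exp(-(LAM0 + BETA_A * a) * t - BETA_M * t * mu
                  + 0.5 * (BETA_M * t * SIGMA) ** 2)


def decompose(t):
    """Direct, indirect and total effects on the survival-probability scale."""
    s11 = mc_survival(t, a=1, a_star=1)
    s10 = mc_survival(t, a=1, a_star=0)
    s00 = mc_survival(t, a=0, a_star=0)
    return {"direct": s10 - s00, "indirect": s11 - s10, "total": s11 - s00}


if __name__ == "__main__":
    print(decompose(t=5.0))
```

In this toy setting the integral over the mediator is available analytically, which is what makes the check possible. With repeated mediator measurements and time-varying confounders, each simulated individual would instead require sequential draws from a joint model for all variables, which is the modelling burden noted above.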
We have shown that both the method of Vansteelandt and the method of Aalen produce approximately unbiased results in a reference scenario consistent with both of their assumptions. Further, the method of Vansteelandt returned approximately unbiased effect estimates in a variety of scenarios where time-varying confounding was introduced. Both techniques rely on a number of assumptions to make causal statements and care should be taken with the interpretation of any analysis. The importance of discussions with experts on the clinical aspects of the data cannot be overstated.
Supplemental Material
Supplemental material, sj-pdf-1-smm-10.1177_09622802221107104 for Methods of analysis for survival outcomes with time-updated mediators, with application to longitudinal disease registry data by Kamaryn T Tanner, Linda D Sharples, Rhian M Daniel and Ruth H Keogh in Statistical Methods in Medical Research
