1 Introduction
Multiple imputation (MI) has proven to be an extremely versatile and popular tool for handling missing data in statistical analyses. For a recent review, see Murray.1 Its popularity is due to a number of factors. The imputation and analysis stages are distinct, meaning it is possible for one person to perform the imputation and another the analysis. It is flexible, being able to accommodate various constraints and restrictions that the imputer or analyst may want to impose. Auxiliary variables can be used in the imputation process to reduce uncertainty about missing values or make the missing at random (MAR) assumption more plausible, yet need not be included in the analyst’s model.
In MI, the analysis model of interest is fitted to each imputed dataset. Estimates and standard errors from each of these fits are pooled using ‘Rubin’s rules’.2 These give a point estimate as the simple average of the imputed data estimates. Rubin’s variance estimator combines the average within-imputation variance with the between-imputation variance in estimates. This requires an estimator of the complete data variance, which for most estimators is available analytically.
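As a concrete illustration, Rubin's pooling rules can be sketched in a few lines of Python; the estimates and variances below are made-up numbers purely for illustration.

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool M imputed-data point estimates and their complete data
    variance estimates using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    point = estimates.mean()              # average of the imputed-data estimates
    w_bar = variances.mean()              # average within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b   # Rubin's total variance
    return point, total_var

# Illustrative estimates/variances from M = 5 imputed datasets
point, total_var = rubins_rules([1.2, 1.5, 1.1, 1.4, 1.3],
                                [0.04, 0.05, 0.04, 0.05, 0.04])
```

Here the pooled point estimate is 1.3, the average within-imputation variance is 0.044, the between-imputation variance is 0.025, and the total variance is 0.044 + 1.2 × 0.025 = 0.074.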
In Rubin’s original exposition, the estimand was a characteristic of a fixed finite population of which some units are randomly sampled and data are obtained.2 Rubin defined conditions for an imputation procedure to be so-called ‘proper’ for a given complete data analysis. If in addition the complete data analysis gives frequentist valid inferences, MI using Rubin’s rules yields valid frequentist inferences.1–3 Subsequently, Rubin’s rules were criticised by some (e.g. Fay4) because in certain situations Rubin’s variance estimator could be biased relative to the repeated sampling variance of the MI estimator. In response, Meng defined the concept of congeniality between an imputation procedure and an analyst’s complete (and incomplete) analysis procedure.5 If an imputation and analysis procedure are congenial, this implies the imputation is proper for the analysis procedure.6 Meng showed that for certain types of uncongeniality, Rubin’s variance estimator is conservative, ensuring the intervals have at least the advertised coverage level.5 In other settings, however, it can be biased downwards, leading to under-coverage of confidence intervals.7
Rubin’s rules have proved fantastically useful since MI’s inception, in particular because they facilitate the separation of imputation and analysis into two distinct parts and because they are so simple. Nevertheless, in settings where Rubin’s variance estimator is asymptotically biased, if feasible, the analyst may desire sharp frequentist valid inferences. Robins and Wang proposed a variance estimator which is valid without requiring congeniality or correct model specification.7 Their estimator requires calculation of various quantities depending on the estimating equations corresponding to the particular choice of imputation and analysis models. As such, it is arguably harder to apply their approach when the imputer and analyst are separate entities. As far as we are aware, its use has been extremely limited thus far in practice due to these requirements.
Combining bootstrapping with MI was first suggested over 20 years ago,8 and recently a number of papers have investigated a wider variety of approaches to combining them. Schomaker and Heumann investigated four variants which combined bootstrapping with MI.9 Their motivation for exploring the use of the bootstrap with MI was situations where an analytical complete data variance estimator is not available, or where one is concerned that the MI estimator is not normally distributed. On the basis of theoretical and empirical investigation, they recommended three of the four variants for use. However, they did not explicitly investigate performance under uncongeniality or model misspecification. von Hippel and Bartlett proposed an alternative combination of bootstrapping with MI in the context of proposing frequentist type (improper) MI algorithms and noted that it would be expected to be valid under uncongeniality.10 Lastly, Brand et al. investigated six different combinations of MI with bootstrapping in the context of handling skewed data and recommended using percentile bootstrap confidence intervals with single (stochastic) imputation.11
In this paper, we investigate the properties of the different combinations of MI and bootstrap which have been recommended by these previous papers, giving particular emphasis to their validity under uncongeniality or model misspecification. In Section 2, we review MI, Rubin’s combination rules and congeniality. In Section 3, we describe the various combinations of bootstrapping and MI that have been recently recommended and consider their validity under uncongeniality or model misspecification. Section 4 presents two sets of simulation studies, empirically demonstrating the impacts of uncongeniality and model misspecification on the frequentist performance of the different variants. We conclude in Section 5 with a discussion.
2 MI using Rubin’s rules and congeniality
2.1 Rubin’s rules
In this section, we review MI and Rubin’s combination rules, following Meng5 and Xie and Meng.12
The imputer creates $M$ imputed datasets by drawing the missing values from an imputation model, typically the posterior predictive distribution of the missing data given the observed data. The analyst chooses a complete data estimation procedure which, given complete data, produces a point estimate $\hat{\theta}$ of the parameter of interest $\theta$ and a variance estimate $\hat{V}$. Applying this procedure to the $m$th imputed dataset yields $\hat{\theta}_m$ and $\hat{V}_m$. The MI estimate of $\theta$ is then $\hat{\theta}_{MI} = M^{-1} \sum_{m=1}^{M} \hat{\theta}_m$, and Rubin's variance estimator is
$$T = \bar{W} + \left(1 + \frac{1}{M}\right) B, \qquad \bar{W} = \frac{1}{M}\sum_{m=1}^{M} \hat{V}_m, \qquad B = \frac{1}{M-1}\sum_{m=1}^{M} \left(\hat{\theta}_m - \hat{\theta}_{MI}\right)^2.$$
2.2 Congeniality
We now define congeniality between the imputation model and the analyst’s complete data procedure and show the implications of congeniality for inference using Rubin’s rules.5,12 The imputation model and the analyst’s complete data procedure are said to be congenial if there exists a unifying Bayesian model (referred to by Meng as the embedding model) such that:

1. For all values of the observed data $Y_{obs}$, the posterior mean and variance of $\theta$ under the embedding model given $Y_{obs}$ asymptotically match the analyst’s observed data point estimate and variance estimate, and the predictive distribution of the missing data $Y_{mis}$ given $Y_{obs}$ under the embedding model coincides with that used by the imputation procedure.
2. For all values of the complete data $(Y_{obs}, Y_{mis})$, the posterior mean and variance of $\theta$ under the embedding model given the complete data asymptotically match the analyst’s complete data point estimate $\hat{\theta}$ and variance estimate $\hat{V}$.
Under congeniality, the posterior mean of $\theta$ given the observed data can be approximated by averaging the complete data posterior means across imputations, and so asymptotically equals the MI point estimate $\hat{\theta}_{MI}$ as $M \to \infty$.

Next, under congeniality the posterior variance of $\theta$ given the observed data decomposes, by the law of total variance, into the expected complete data posterior variance plus the variance of the complete data posterior mean; these components are estimated by the within-imputation variance $\bar{W}$ and the between-imputation variance $B$ respectively, so that Rubin’s variance estimator approximates the observed data posterior variance.

Assuming the embedding Bayesian model is correctly specified, the observed data posterior mean, which under congeniality is equal to the (infinite $M$) MI point estimate, provides asymptotically valid frequentist point estimates, and the posterior variance consistently estimates the estimator’s repeated sampling variance, so that Rubin’s rules yield asymptotically valid frequentist inferences.
Of course, in practice the number of imputations $M$ is finite, and so the MI point estimator differs from its infinite imputation limit $\hat{\theta}_{MI}^{\infty}$ by a Monte-Carlo error. Its repeated sampling variance can then be expressed as the repeated sampling variance of $\hat{\theta}_{MI}^{\infty}$ plus an additional term of order $1/M$ due to the finite number of imputations; the factor $(1+1/M)$ multiplying $B$ in Rubin’s variance estimator accounts for this extra Monte-Carlo variability.
When the imputation and analysis models are not congenial, or they are but the embedding Bayesian model is misspecified, depending on the specific situation Rubin’s variance estimator can be biased upwards or downwards.5,7,14 We explore a range of examples in which uncongeniality or misspecification can arise in simulation studies described in Section 4.
Robins and Wang proposed a variance estimator for MI when each dataset is imputed using the maximum likelihood estimate of a parametric imputation model and the imputations are analysed using a non-, semi- or fully parametric model.7 Their variance estimator is consistent without requiring the imputation and analysis models to be congenial or even correctly specified. Hughes et al. compared Robins and Wang’s proposal to Rubin’s rules through a series of simulation studies where the imputation and analysis models were misspecified and/or uncongenial with each other.14 They demonstrated that Rubin’s rules inference could be conservative or anti-conservative, whereas, at least for moderate or large sample sizes, inferences based on Robins and Wang’s proposal were valid across their simulation scenarios. Hughes et al. noted however that a major practical obstacle to the widespread use of Robins and Wang’s method is that its implementation is specific to the particular imputation and analysis models, and no software currently implements it.
3 Combining bootstrapping and MI
In this section, we review the combinations of bootstrapping and MI which have been recommended for use in the recent literature and consider their validity under uncongeniality and misspecification.
3.1 Imputation followed by bootstrapping
The first collection of methods we consider are where MI is first applied, and then bootstrapping is applied to each imputed dataset.
3.1.1 MI boot Rubin
The first combination considered (and recommended) by Schomaker and Heumann9 is standard MI using Rubin’s rules, but using bootstrapping to estimate the within-imputation complete data variance:
1. Impute the missing values in the observed data $M$ times, giving imputed datasets $m = 1, \ldots, M$.
2. For each imputed dataset $m$, draw $B$ bootstrap samples.
3. For the $b$th bootstrap sample of imputed dataset $m$, calculate the analysis model estimate $\hat{\theta}_{m,b}$.
4. For imputation $m$, estimate the complete data variance by the sample variance of the bootstrap estimates, $\hat{V}_m = (B-1)^{-1} \sum_{b=1}^{B} (\hat{\theta}_{m,b} - \bar{\theta}_m)^2$, where $\bar{\theta}_m = B^{-1} \sum_{b=1}^{B} \hat{\theta}_{m,b}$.
5. Rubin’s rules are then applied with $\hat{V}_m$ as the complete data variance estimate for imputation $m$.
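The steps above can be sketched as follows in Python, using a deliberately simple toy problem (estimating the mean of one partially observed variable) and a naive imputation step (normal draws fitted to the observed values) as a stand-in for a proper imputation model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: one variable with roughly 30% of values missing
y = rng.normal(0.0, 1.0, size=200)
missing = rng.random(200) < 0.3
obs = y[~missing]

M, B = 10, 200
estimates, boot_vars = [], []
for _ in range(M):
    # Step 1: impute (naive stand-in for a posterior predictive draw)
    yi = y.copy()
    yi[missing] = rng.normal(obs.mean(), obs.std(ddof=1), missing.sum())
    estimates.append(yi.mean())           # complete data point estimate
    # Steps 2-4: bootstrap the imputed dataset to estimate the
    # within-imputation (complete data) variance
    boot = [rng.choice(yi, size=len(yi), replace=True).mean()
            for _ in range(B)]
    boot_vars.append(np.var(boot, ddof=1))

# Step 5: Rubin's rules with the bootstrap complete data variances
w_bar = np.mean(boot_vars)
b = np.var(estimates, ddof=1)
theta_mi = np.mean(estimates)
total_var = w_bar + (1 + 1 / M) * b
```

In a real application the analysis step would fit the model of interest to each bootstrap sample rather than simply taking a mean.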
This approach has often been used when no analytical estimator for the complete data variance is available, or when one is concerned about whether the analysis model is correctly specified. In the latter case, a sandwich variance estimator has sometimes been used to attempt to provide robustness to misspecification.14
Since this approach is application of Rubin’s rules with an alternative complete data variance estimator, we expect valid inferences when the imputation and analysis models are congenial and the embedding Bayesian model is correctly specified. This is supported by the setting 1 simulation results of Schomaker and Heumann.9 Here, bivariate normal data were simulated, with the analysis model consisting of normal linear regression. The covariate of the analysis model was made MAR, and a bivariate normal imputation model was used. The imputation and analysis models were congenial, and the embedding bivariate normal model was correctly specified.
Under uncongeniality or misspecification, we should not expect valid inferences in general. This hypothesis is supported by Schomaker and Heumann’s setting 2 (high missingness) simulation results, where we believe the imputation and analysis models were congenial but the embedding model was misspecified, and where coverage for one parameter was 91%. The analysis model here was again a normal linear regression, and the imputation model was a multivariate normal model for all variables; these are clearly congenial. However, the embedding multivariate normal model was misspecified since some of the variables were binary. Despite this misspecification, Schomaker and Heumann stated that the point estimates were approximately unbiased, indicating that the poor coverage was not due to bias in the point estimator.
3.1.2 MI boot pooled percentile
The second approach considered and recommended by Schomaker and Heumann9 is the same as MI boot Rubin, except that Rubin’s rules are not (directly at least) used:
1. Impute the missing values in the observed data $M$ times.
2. For each imputed dataset $m$, draw $B$ bootstrap samples.
3. For the $b$th bootstrap sample of imputed dataset $m$, calculate the analysis model estimate $\hat{\theta}_{m,b}$.
4. For point estimation of $\theta$, use the pooled sample of $M \times B$ bootstrap estimates (e.g. its mean or median). A 95% confidence interval is given by the 2.5th and 97.5th percentiles of the pooled sample.
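A Python sketch of this pooled percentile scheme on the same toy setup (mean of one partially observed variable, with naive normal imputation standing in for a proper imputation model):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: one variable with roughly 30% of values missing
y = rng.normal(0.0, 1.0, size=200)
missing = rng.random(200) < 0.3
obs = y[~missing]

M, B = 10, 200
pooled = []
for _ in range(M):
    yi = y.copy()                         # impute dataset m
    yi[missing] = rng.normal(obs.mean(), obs.std(ddof=1), missing.sum())
    for _ in range(B):                    # bootstrap imputed dataset m
        yb = rng.choice(yi, size=len(yi), replace=True)
        pooled.append(yb.mean())          # estimate from bootstrap b of imputation m

# Point estimate and percentile interval from the pooled M*B estimates
point = float(np.median(pooled))
lo, hi = np.percentile(pooled, [2.5, 97.5])
```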
Under congeniality, this approach can be viewed as a route to obtaining a posterior credible interval, and if the embedding Bayesian model is correctly specified, we expect it to give valid inferences. This is because first draws are taken from the posterior of the missing data given observed, and second, conditional on these, bootstrapping and estimating the parameters by their maximum likelihood estimate is in large samples equivalent to taking a draw from the posterior given the imputed missing data and the observed data.15
Note that here there is no complete data variance estimator being used, and so the congeniality requirement for the complete data procedure is only that its point estimator asymptotically matches the posterior mean under the embedding Bayesian model; no condition involving a complete data variance estimator is needed.
To explore this approach further, under congeniality we can express the estimate from the $b$th bootstrap sample of the $m$th imputed dataset as
$$\hat{\theta}_{m,b} = \hat{\theta}_m + \epsilon_{m,b},$$
where $\epsilon_{m,b}$ is the bootstrap perturbation, with mean zero and variance approximately equal to the complete data variance of $\hat{\theta}_m$. The sample variance of the pooled sample of $M \times B$ estimates is then approximately $\bar{W} + B$, the sum of the average within-imputation variance and the between-imputation variance. Hence, if $M$ is large, this is close to Rubin’s total variance $\bar{W} + (1+1/M)B$ and the pooled percentile interval will be similar to one based on Rubin’s rules; for small $M$ it omits the $(1+1/M)$ inflation and so will tend to be somewhat too narrow.
Under uncongeniality or misspecification, there is no reason to expect this approach to result in valid inferences. Schomaker and Heumann’s setting 2 high missingness simulation results (where, as described previously, we believe the imputation and analysis models were congenial but the embedding model was misspecified) support this, with coverages between 89% and 92%.
3.2 Bootstrap followed by MI
We now consider methods which first bootstrap sample the observed data and then apply MI to each bootstrap sample. This general approach to combining bootstrap with MI was proposed by Shao and Sitter8 and Little and Rubin.15
3.2.1 Boot MI percentile
Both Schomaker and Heumann9 and Brand et al.11 recommended applying the bootstrap percentile interval to the MI estimator:

1. Draw $B$ bootstrap samples of the observed data.
2. For each bootstrap sample $b$, impute the missing values $M$ times and calculate the MI point estimate $\hat{\theta}_b$ as the average of the $M$ imputed-data estimates.
3. For point estimation of $\theta$, use the MI estimate from the observed data. A 95% confidence interval is given by the 2.5th and 97.5th percentiles of $\hat{\theta}_1, \ldots, \hat{\theta}_B$.
This approach is direct application of the standard percentile-based bootstrap confidence interval to the MI estimator. As such, provided the MI point estimator is consistent, we expect it to yield asymptotically valid confidence intervals even when the imputation and analysis models are uncongenial or misspecified.
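A Python sketch of the Boot MI percentile scheme, again on the toy problem of estimating the mean of one partially observed variable (the normal imputation step is a stand-in for a proper imputation model):

```python
import numpy as np

rng = np.random.default_rng(3)

y = rng.normal(0.0, 1.0, size=200)
missing = rng.random(200) < 0.3
n = len(y)

B, M = 200, 2
boot_estimates = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)      # bootstrap the observed data
    yb, mb = y[idx], missing[idx]
    obs = yb[~mb]
    ests = []
    for _ in range(M):                    # impute the bootstrap sample M times
        yi = yb.copy()
        yi[mb] = rng.normal(obs.mean(), obs.std(ddof=1), mb.sum())
        ests.append(yi.mean())
    boot_estimates.append(np.mean(ests))  # MI point estimate for this bootstrap

# Percentile confidence interval across the bootstrap MI estimates
lo, hi = np.percentile(boot_estimates, [2.5, 97.5])
```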
Brand et al. also found that the Boot MI percentile approach worked well in simulations.11 They investigated it using either single ($M=1$) or multiple imputations of each bootstrap sample.
3.2.2 Boot MI von Hippel
Of the various combinations of bootstrapping and imputation described, assuming the MI point estimator is consistent, only Boot MI percentile is expected to give confidence intervals that attain nominal coverage (asymptotically) under uncongeniality or model misspecification. A practical issue however is that the computational burden is high. For standard applications of MI, it is not uncommon now for large numbers of imputations to be recommended, and combining this with several hundred bootstrap samples requires fitting the imputation and analysis models many thousands of times. To reduce this burden, von Hippel and Bartlett proposed imputing each bootstrap sample only a small number of times (e.g. $M=2$) and fitting a one-way variance components model to the resulting estimates $\hat{\theta}_{b,m}$.10
Given this variance components model, we have that the point estimate is the mean of all $B \times M$ estimates, $\bar{\theta} = (BM)^{-1} \sum_{b=1}^{B} \sum_{m=1}^{M} \hat{\theta}_{b,m}$, and its repeated sampling variance can be estimated by
$$\hat{V} = \left(1 + \frac{1}{B}\right)\hat{\sigma}^2_b + \frac{\hat{\sigma}^2_w}{BM},$$
where $\hat{\sigma}^2_b$ and $\hat{\sigma}^2_w$ are method-of-moments estimates of the between- and within-bootstrap variance components.

This shows that provided $B$ is reasonably large, the additional Monte-Carlo variance incurred by using a small number of imputations per bootstrap is modest, so that $M$ can be as small as two.

If confidence intervals are required, von Hippel and Bartlett propose a t-interval with degrees of freedom estimated from the variance components.
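A method-of-moments sketch of the variance components calculation in Python, on the same toy problem; the variance formula below follows the decomposition described above (sampling variance plus Monte-Carlo contributions from finite $B$ and $M$) and should be read as an illustration rather than a definitive implementation of von Hippel and Bartlett's estimator:

```python
import numpy as np

rng = np.random.default_rng(4)

y = rng.normal(0.0, 1.0, size=200)
missing = rng.random(200) < 0.3
n = len(y)

B, M = 200, 2
est = np.empty((B, M))                    # estimate from imputation m of bootstrap b
for b in range(B):
    idx = rng.integers(0, n, size=n)      # bootstrap the observed data
    yb, mb = y[idx], missing[idx]
    obs = yb[~mb]
    for m in range(M):
        yi = yb.copy()
        yi[mb] = rng.normal(obs.mean(), obs.std(ddof=1), mb.sum())
        est[b, m] = yi.mean()

point = est.mean()                        # mean of all B*M estimates
# One-way ANOVA method-of-moments variance components
boot_means = est.mean(axis=1)
msb = M * boot_means.var(ddof=1)          # between-bootstrap mean square
msw = ((est - boot_means[:, None]) ** 2).sum() / (B * (M - 1))
sigma2_b = max((msb - msw) / M, 0.0)      # between-bootstrap component
# Estimated sampling variance: sigma2_b plus Monte-Carlo terms
var_hat = (1 + 1 / B) * sigma2_b + msw / (B * M)
```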
4 Simulations
In this section, we report two simulation studies to empirically demonstrate the performance of the previously described combinations of bootstrapping and MI under uncongeniality or model misspecification.
4.1 Regression models under uncongeniality or misspecification
We first compared the previously described bootstrap and MI combination methods in four scenarios of uncongeniality or misspecification of the imputation model and complete data procedure using a simulation study based on one performed by Hughes et al.14 This simulation study was based on fitting models to a dataset of standard anthropometric measurements of 951 young adults enrolled in the Barry Caerphilly Growth study.21
Briefly, we simulated hypothetical datasets of one binary variable, sex, and four continuous variables, age, height, weight and natural log of insulin index (hereafter referred to as loginsindex). The data were generated under the following model
Median confidence interval width and coverage for the subgroup analysis (uncongenial) and heteroscedastic errors (misspecification) scenarios.
CI, confidence interval; CI cov., confidence interval coverage; MI, multiple imputation.
The analysis of interest was to estimate
For each scenario, we generated 1000 independent simulated datasets, where the sample size was 1000 observations and the probability of observing weight was 0.4, except for the subgroup analysis scenario where the probability of observing weight was 1 among women and 0.4 among men. We conducted MI Rubin using 10 imputations, and methods MI boot Rubin, MI boot pooled percentile and boot MI percentile with 10 imputations and 200 bootstraps, and von Hippel’s boot MI with two imputations and 200 bootstraps. Additionally, we applied boot MI percentile with one imputation and 200 bootstraps. Based on 1000 simulations, the Monte-Carlo standard error for a true coverage probability of 95% is $\sqrt{0.95 \times 0.05/1000} \approx 0.7\%$.
For all methods, the point estimates of the parameter of interest were approximately unbiased.
Tables 1 and 2 show the median of the confidence interval (CI) widths and CI coverage for the six methods under comparison. For the subgroup analysis scenario (Table 1), MI Rubin and both MI then bootstrapping methods resulted in confidence interval over-coverage. Narrower confidence intervals and nominal coverage were achieved with the boot MI percentile method with 10 imputations and boot MI von Hippel. Boot MI percentile with single imputation resulted in wide confidence intervals and over-coverage. This concurs with what was found in the simulations reported by Brand et al. In the Supplementary Appendix, we give a sketch argument for why the Boot MI percentile intervals with $M=1$ (single imputation) over-cover.
Median confidence interval width and coverage for the omitted interaction (uncongenial) and moderate non-normality (misspecification) scenarios.
CI, confidence interval; CI cov., confidence interval coverage; MI, multiple imputation.
For the heteroscedastic errors scenario (Table 1), MI Rubin and both MI then bootstrapping methods resulted in confidence interval under-coverage. Again, the boot MI percentile method with 10 imputations and boot MI von Hippel were the best performing methods with close to nominal coverage. The results for the omitted interaction scenario (Table 2) followed a similar pattern noted for the subgroup analysis scenario. For the moderate non-normality scenario (Table 2), MI boot pooled percentile had slight confidence interval under-coverage and boot MI percentile with single imputation over-covered. The remaining methods had close to nominal coverage with similar median CI widths.
4.2 Reference-based imputation in clinical trials
Our second simulation study setting is so-called control or reference-based MI for missing data in randomised trials. Missing data due to study dropout are common in clinical trials, and there is often concern that missing data do not satisfy the MAR assumption. Often dropout in trials coincides with patients’ treatments changing. An increasingly popular approach to imputing missing data in trials is using so-called reference or control-based MI approaches.23 These involve constructing the imputation distribution for the active treatment arm using a combination of information from the active and control arms, which results in uncongeniality between the imputation and analysis models. This uncongeniality causes intervals constructed using Rubin’s variance estimator to over-cover.24,25 Cro et al. have suggested that although Rubin’s variance estimator is biased for the repeated sampling variance of the estimator, it consistently estimates a sensible variance in the context of MAR sensitivity analyses.26 We do not enter this debate here, but merely investigate the previously described bootstrap and MI combinations in regards to their ability to produce confidence intervals with the correct repeated sampling coverage. In the setting of reference-based MI, Quan et al. applied (we believe) Boot MI to estimate standard errors of the resulting treatment effect estimates.
We simulated 10,000 datasets of size
The analysis model was normal linear regression of
Table 3 shows the median confidence interval width and coverage for each of the combinations of bootstrapping and MI previously described with
Median confidence interval width and coverage under MAR (congenial and correctly specified), jump to reference (uncongenial and correctly specified) imputation from 10,000 simulations.
Times shown indicate median execution time for each method on one dataset. MAR, missing at random; CI, confidence interval; CI cov., confidence interval coverage; MI, multiple imputation.
Both Boot MI percentile and Boot MI von Hippel gave coverage close to the nominal 95% level in both scenarios, whereas intervals based on Rubin’s variance estimator over-covered under the uncongenial jump to reference imputation.
The median times to run each method on a single simulated dataset show, unsurprisingly, that all the bootstrap methods take much longer to run than standard Rubin’s rules without bootstrapping. Among the bootstrap methods, Boot MI percentile with moderate numbers of imputations per bootstrap was the most computationally intensive, while Boot MI von Hippel, which requires only two imputations per bootstrap sample, was considerably faster.
5 Discussion
We have reviewed a number of proposals for combining MI with bootstrapping, in particular with regards to their statistical validity when imputation and analysis procedures are uncongenial or misspecified. When the imputation and analysis procedures are congenial, and the embedding model is correctly specified, Rubin’s rules (without bootstrapping), MI boot Rubin, Boot MI percentile (provided $M$ is not too small), MI boot pooled percentile and Boot MI von Hippel can all be expected to provide valid inferences.
When the imputation and analysis procedures are uncongenial and/or misspecified, only the Boot MI percentile (with moderate $M$) and Boot MI von Hippel approaches can be expected to give confidence intervals attaining nominal coverage, assuming the MI point estimator remains consistent.
As mentioned in the Introduction, Rubin originally envisaged the imputer and analyst as distinct individuals, with the imputer releasing a single set of multiply imputed datasets to different analysts. A strength of the bootstrap followed by MI approach is that this division of roles is still feasible – the imputer bootstraps and then multiply imputes the observed data, releasing a set of imputations clustered by bootstrap. These can then be analysed by different analysts, and inferences can be obtained using either the boot MI percentile or Boot MI von Hippel approaches.
Combining bootstrapping with MI has some disadvantages compared to inference using Rubin’s rules. Compared to regular MI with Rubin’s rules, it is considerably more computationally intensive (Table 3) – this is the price paid for being able (in certain situations) to obtain valid inferences under uncongeniality or misspecification. Problems with model (imputation or analysis) convergence are probably more likely to occur due to the large number of bootstraps required. The non-parametric resampling scheme used by bootstrapping relies on an assumption that the data are independent and identically distributed, and further research is warranted to explore the use of other types of bootstrap resampling schemes in conjunction with MI.
Code for the first simulation study (in R) and the second simulation study (in Stata) is available from https://github.com/jwb133/bootImputePaper.