Abstract
Keywords
1 Introduction
Mendelian randomisation analyses use genetic variants as instrumental variables (IVs) to make causal inferences about the effect of modifiable risk factors on health- and disease-related outcomes in the presence of unobserved confounding of the relationship of interest. 1 – 5 Use of Mendelian randomisation is growing rapidly. 4 – 7 However, using genetic variants as IVs poses statistical challenges.5,8– 11 In particular, there is a need for large sample sizes because of the relatively small proportion of variation in risk factors typically explained by genetic variants.5,12,13
Recent decreases in genotyping costs and increases in the number of genome-wide association studies (GWAS) have facilitated discovery of a substantial number of genetic variants associated with risk factors and disease-related outcomes, such as adiposity 14 – 16 and type 2 diabetes. 17 – 27 Consideration of multiple instruments for Mendelian randomisation applications is therefore timely, given the increasing availability of suitable variants. In this article we discuss the use of multiple genetic variants as IVs, both for increasing statistical precision and for testing underlying IV assumptions.
The structure of the article is as follows: we describe instrumental variable assumptions (Section 1.1) and introduce an illustrative Mendelian randomisation analysis and present separate IV estimates for four instruments (Section 2). We then discuss the use of multiple instruments to help address some of the genetic and statistical issues that can affect Mendelian randomisation analyses (Sections 3 and 4), including the results of simulation studies (Section 5). We return to the example and simulation to compare IV estimates using multiple instruments and allele scores (Section 6), assess the impact of missing data (Section 6.2) and discuss the implications of our findings (Section 7).
1.1 Instrumental variable assumptions
An IV (instrument)
In the context of Mendelian randomisation, these assumptions can be expressed as: genotype is associated with the modifiable risk factor of interest (assumption 1); genotype is independent of unmeasured confounding factors that could bias conventional epidemiological associations between the risk factor and the outcome (assumption 2); genotype is related to the outcome only via its association with the risk factor (assumption 3). The second assumption can be justified through Mendel’s laws when applied to independent heritable units.5,28
If we further assume that intervention on the risk factor only affects the value of the risk factor, and hence affects the outcome only through this induced change in the risk factor, then the IV assumptions imply the ‘exclusion restriction’11,29 and its weaker form known as ‘conditional mean independence’ (used in structural mean models). 30 This additional assumption allows causal inferences to be drawn from IV analyses.
2 Illustrative Mendelian randomisation analysis: single instrument estimates
Our example investigates the causal effect of fat mass on bone mineral density (BMD) using four genotypes known to be associated with adiposity from previous GWAS. A previous study found a positive effect of fat mass on BMD using SNPs associated with the
2.1 Data
Our example uses data from the Avon Longitudinal Study of Parents and Children (ALSPAC). 32 ALSPAC is a longitudinal, population-based birth cohort study that recruited 14 541 pregnant women resident in Avon, UK, with expected dates of delivery 1 April 1991 to 31 December 1992 (http://www.alspac.bris.ac.uk). 32 From these pregnancies, 13 988 live-born infants survived to at least one year of age. Children eligible for inclusion in our analysis: (1) had DNA available for genotyping; (2) attended the research clinic at age 9 and (3) had complete data on height and dual energy X-ray densitometry (DXA) scan-determined total fat mass and total BMD.
2.2 Selection of genotypes
Eleven adiposity-related SNPs identified in previous GWAS have been genotyped in ALSPAC. For these analyses we decided
The IV assumptions can be uniquely encoded in a directed acyclic graph (DAG). 11 The proposed DAG for our exemplar multiple instrument model is shown in Figure 1.
DAG for a Mendelian randomisation analysis using four genetic variants as instrumental variables for the effect of fat mass on bone mineral density.
2.3 Statistical methods
Study participant characteristics, total eligible children
HWE: Hardy–Weinberg Equilibrium.
IV estimation used the two-stage least squares (TSLS) estimator implemented in the user-written Stata command ivreg2. 35 – 37 The Hausman test of endogeneity 38 was used to compare the ordinary least squares (OLS) and TSLS estimates, using the user-written Stata command ivendog. 35 (In econometrics a risk factor affected by unmeasured confounding factors, such that the assumptions of linear regression are violated, is termed an endogenous variable.) In models including multiple instruments the Sargan test of over-identification (discussed in Section 4.1), available in the ivreg2 command, was used to test the joint validity of the instruments. 39
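To make the TSLS estimator concrete, the following is a minimal Python sketch using NumPy. It is not a reimplementation of ivreg2: the function name `tsls`, the simulated data and all coefficient values are hypothetical, and the standard error uses a small-sample degrees-of-freedom correction that may differ from software defaults.

```python
import numpy as np

def tsls(y, x, Z, W=None):
    """Two-stage least squares for a single endogenous risk factor x.

    y : outcome (n,), x : risk factor (n,), Z : instruments (n,) or (n, k),
    W : optional exogenous covariates (n, p); an intercept is always added.
    Returns the coefficient on x and its conventional standard error.
    """
    n = len(y)
    const = np.ones((n, 1))
    W = const if W is None else np.column_stack([const, W])
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    inst = np.column_stack([Z, W])   # instruments plus exogenous covariates
    X = np.column_stack([x, W])      # regressors of the structural model
    # First stage: project the regressors onto the instrument space.
    X_hat = inst @ np.linalg.lstsq(inst, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    # Residuals are computed with the observed x, not the fitted values.
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # small-sample df (assumption)
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta[0], np.sqrt(cov[0, 0])

# Hypothetical simulated data: two SNPs, an unmeasured confounder u,
# and a true causal effect of 0.5 of x on y.
rng = np.random.default_rng(1)
n = 5000
g = rng.binomial(2, 0.3, size=(n, 2))
u = rng.normal(size=n)
x = g @ np.array([0.3, 0.2]) + u + rng.normal(size=n)
y = 0.5 * x - u + rng.normal(size=n)

ols_slope = np.polyfit(x, y, 1)[0]   # confounded OLS estimate
iv_est, iv_se = tsls(y, x, g)        # TSLS estimate using both SNPs
```

In this simulated setting the OLS slope is pulled away from 0.5 by the confounder, whereas the TSLS estimate should lie close to it, with a much larger standard error.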
2.4 Results for separate instruments
Table 1 shows characteristics of the 5 509 eligible children. Of these, 5 091 (92%) had valid genotype data for
Associations of genotypes with potential confounding factors
MEA: Mother’s highest educational achievement is a binary variable derived from the groups 0 = CSE, O-level, Vocational and 1 = A-level and degree.
HHSC: Head of household social class coded as categorical variable I, II, III non-manual, III manual, IV and V.
Assuming an additive genetic model.
OLS and IV estimates of the effect of fat mass on bone mineral density (BMD) based on complete case analysis,
Analyses adjusted for height and height squared.
For a 1 unit increase in
The first stage
3 Using multiple instruments to address potential biases in Mendelian randomisation analyses
Population stratification, linkage disequilibrium and pleiotropy have been identified as factors that could bias Mendelian randomisation analyses.2,5,11,40 We briefly describe them, and the use of multiple instruments to address issues they raise.
3.1 Population stratification
Population stratification occurs when a sample is composed of a mixture of populations and so contains latent ancestral structure. If there are corresponding differences in the prevalence of the outcome of interest by this structure, then genotype-risk factor associations may result from the presence of ancestrally informative alleles rather than biological function. 41 Some genetic variants that are potential candidates for use as IVs in Mendelian randomisation studies could have been influenced by such population stratification.5,42– 45 Population stratification therefore has the potential to bias estimates of causal effects in Mendelian randomisation studies. 5
3.2 Linkage disequilibrium
Linkage disequilibrium (LD) is correlation between allelic states at different loci on a stretch of the same chromosome when assessed within a population. LD is a function of the frequency of recombination and is subject to regional genomic characteristics as well as more stochastic processes, which may be influenced by the physical distance between two loci as well as the relative age of the population in question. Extensive LD can increase the statistical power of a study to detect genotype-risk factor associations and is exploited in GWAS, where an LD-based set of tag SNPs is chosen to maximise the amount of genetic variation captured per SNP.46,47 SNPs that are associated with phenotypes in GWAS are unlikely to be functional variants, but rather to be in LD with the unknown functional variant(s).46,47 IV assumptions are not violated when tag SNPs are used as IVs, provided that they are in LD only with the functional variant(s).5,11 However, if tag SNPs are also in LD with a variant that affects the outcome of interest via a pathway that does not include the risk factor of interest, the IV assumptions will be violated. 5
3.3 Pleiotropy
Pleiotropy refers to a single gene having multiple biological functions. In the context of Mendelian randomisation analyses, SNPs in or near genes with pleiotropic effects that directly or indirectly influence the outcome other than through the risk factor of interest violate the IV assumptions. 11 In our example, if any of the adiposity variants had effects on pathways that influence BMD other than through adiposity, for example, if they influenced calcium or vitamin D metabolism, then IV assumptions would not hold.
3.4 Use of multiple instruments
Population stratification and pleiotropy can to some extent be dealt with by using ethnically homogeneous study populations, identifying and incorporating population strata in the analysis and ensuring that the function of the genetic instrument is well understood. 5 Comparison of IV estimates based on multiple genetic variants with independent effects on the risk factor of interest provides an additional way to identify bias resulting from these issues. If IV estimates from different variants are similar, it is less plausible that bias from LD or pleiotropy is present.
Comparison of IV estimates from independent genetic variants is analogous to comparing the results of RCTs of different classes of blood pressure lowering drugs, which lower blood pressure by different mechanisms. If the effect of the drug on stroke risk in each RCT is proportional to the direction and magnitude of its effect on blood pressure, this strengthens the evidence for a causal link between blood pressure and stroke risk, and against the drugs having effects on stroke risk through other mechanisms. Such consistency would also argue against the possibility that the trials were affected by methodological flaws that biased their results.
It is possible that separate IV estimates could be identical but biased to a similar extent by population stratification, because stochastic- or selection-driven non-independence that is not predicted by LD profiles could influence more than one genetic variant that affects a given risk factor. Databases such as dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) that provide the fixation index
4 Statistical issues relating to use of multiple instruments in Mendelian randomisation analyses
4.1 Over-identification
Over-identification refers to the situation when there is more than one instrument for a single risk factor of interest or, more generally, when there are more instruments than endogenous variables. In such circumstances testing the ‘over-identification restriction’ checks the joint validity of multiple instruments by testing whether they give the same estimates when used singly or in linear combination. There are two commonly used tests of over-identification: the Hansen test and the Sargan test.39,48 Rejection of an over-identification test is taken to indicate that at least one of the instruments is not valid (i.e., it does not give the same estimate as the other instruments). 49
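As an illustration of the over-identification idea, the Sargan statistic can be computed as the sample size multiplied by the R-squared from regressing the TSLS residuals on the full set of instruments. Under joint validity it is approximately chi-squared with degrees of freedom equal to the number of instruments minus the number of endogenous variables. The sketch below is a hedged Python example, not the ivreg2 implementation: the function name `sargan`, the simulated data and the size of the pleiotropic effect are all hypothetical.

```python
import numpy as np

def sargan(y, x, Z):
    """Sargan over-identification statistic for one endogenous variable x.

    Returns (statistic, degrees of freedom). An intercept is included.
    """
    n = len(y)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    inst = np.column_stack([np.ones(n), Z])
    X = np.column_stack([np.ones(n), x])
    # TSLS fit: project X on the instruments, then regress y on the projection.
    X_hat = inst @ np.linalg.lstsq(inst, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    resid = y - X @ beta
    # Auxiliary regression of the TSLS residuals on all instruments.
    fitted = inst @ np.linalg.lstsq(inst, resid, rcond=None)[0]
    r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    df = inst.shape[1] - X.shape[1]  # instruments minus endogenous variables
    return n * r2, df

# Hypothetical data: two SNPs, unmeasured confounder u, true effect 0.5.
rng = np.random.default_rng(2)
n = 5000
g = rng.binomial(2, 0.3, size=(n, 2))
u = rng.normal(size=n)
x = g @ np.array([0.5, 0.5]) + u + rng.normal(size=n)
y_valid = 0.5 * x - u + rng.normal(size=n)
# Second SNP given a direct (pleiotropic) effect on the outcome.
y_invalid = y_valid + 1.0 * g[:, 1]

stat_valid, df_valid = sargan(y_valid, x, g)
stat_invalid, df_invalid = sargan(y_invalid, x, g)
```

With one degree of freedom the 5% critical value of the chi-squared distribution is about 3.84. The pleiotropic scenario should give a statistic far above this, whereas the valid scenario exceeds it only about 5% of the time by chance.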
Verifying that the genotypes are independent of the measured confounding factors (Table 2) is an indication of the validity of the instruments. 50 However, genotypes could still be associated with unmeasured confounders.
4.2 Finite sample bias and instrument strength
IV estimators such as TSLS are asymptotically unbiased but biased in finite samples, with such bias inversely proportional to the amount of phenotypic variability explained by the instrument. 51
Two closely related measures of this are the first-stage regression
In Mendelian randomisation the first stage
As
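Instrument strength is commonly summarised by the F statistic and R-squared of the first-stage regression of the risk factor on the instruments. The following Python sketch computes both for an intercept-only first stage with no further covariates. The function name `first_stage_strength` and the simulated effect sizes are hypothetical and chosen only to contrast a non-weak with a weak instrument.

```python
import numpy as np

def first_stage_strength(x, Z):
    """F statistic and R-squared from regressing the risk factor x on Z.

    Z may hold one or several instruments; an intercept is always included.
    """
    n = len(x)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    k = Z.shape[1]
    design = np.column_stack([np.ones(n), Z])
    fitted = design @ np.linalg.lstsq(design, x, rcond=None)[0]
    rss = np.sum((x - fitted) ** 2)
    tss = np.sum((x - x.mean()) ** 2)
    r2 = 1 - rss / tss
    # Joint F test that all k instrument coefficients are zero.
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return f_stat, r2

# Hypothetical data: one non-weak and one weak SNP for the same risk factor.
rng = np.random.default_rng(3)
n = 5000
g_strong = rng.binomial(2, 0.3, size=n)
g_weak = rng.binomial(2, 0.3, size=n)
x = 0.3 * g_strong + 0.02 * g_weak + rng.normal(size=n)

f_strong, r2_strong = first_stage_strength(x, g_strong)
f_weak, r2_weak = first_stage_strength(x, g_weak)
```

For a single instrument this F statistic equals the square of the usual t statistic for the instrument's first-stage coefficient.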
4.3 Statistical power
Genotypic effects on phenotypes are typically small, so Mendelian randomisation analyses can require very large sample sizes to obtain adequate power.5,13 When multiple instruments are used in the TSLS estimator, the resulting IV estimate can be viewed as the efficient linear combination of the separate IV estimates. 61 Provided that each instrument is valid, use of multiple instruments will increase the precision of the IV estimate compared with the separate IV estimates. 61 Donald and Newey investigated the trade-off for multiple instruments, where increasing precision can also increase bias, and suggested using the instruments that minimise an approximate mean squared error (MSE) criterion. 62 Pierce et al. recently estimated the power of Mendelian randomisation studies in a range of settings, using both single and multiple genetic instruments. 13
In studies where genetic data are not obtained from GWAS (in which imputation based on LD is typically performed) there are typically some missing observations for each genetic variant, due to failure of genotyping or ambiguous genotype allocation. Missing data typically occur in different individuals for each variant. They can therefore result in a considerable cumulative reduction in the number of individuals with complete data on all genotypes, and hence reduce the power of multiple instrument Mendelian randomisation analyses. One approach to dealing with missing data is multiple imputation. 63 Whilst there has been considerable research into methods of imputation we are not aware of specific research into appropriate multiple imputation models for IV estimation.
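The cumulative loss of complete cases can be illustrated with simple arithmetic. The Python sketch below assumes, purely for illustration, that genotyping failures are independent across variants and that each of four variants is missing in 5% of individuals. Neither the rates nor the sample size are taken from the study.

```python
def complete_case_fraction(missing_rates):
    """Expected fraction of individuals with data on every variant,
    assuming missingness is independent across variants."""
    frac = 1.0
    for rate in missing_rates:
        frac *= 1.0 - rate
    return frac

# Hypothetical: four variants, each missing in 5% of individuals.
rates = [0.05, 0.05, 0.05, 0.05]
frac = complete_case_fraction(rates)   # 0.95**4, approximately 0.8145
n_complete = round(5000 * frac)        # expected complete cases out of 5000
```

Under these assumptions almost a fifth of the sample would be lost to a complete-case multiple instrument analysis, even though each variant individually is missing in only 5% of individuals.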
4.4 Use of an allele score as an instrumental variable
An allele score is a weighted or unweighted sum of the number of ‘risk’ alleles across several genotypes: weights are usually based on each genotype’s effect on the phenotype. Use of such scores is becoming more common in gene–disease association studies. 64 – 66 To justify the use of an allele score the genotypes should have an approximately additive effect on the risk factor. For an unweighted score they should also have similar per allele effects.
The use of an allele score as a single IV, compared with multiple instruments, will cause the first stage
In general, using an unweighted allele score will have lower power than the multiple instrument approach, since the latter will estimate the efficient linear combination of the genotypes. 61 Given appropriate weighting, results from IV analyses using weighted allele scores will be similar to the multiple instruments approach.
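The construction of unweighted and weighted allele scores can be sketched as follows. This is a hedged Python example in which the weights are estimated internally, as the slope from a separate linear regression of the risk factor on each genotype, and the weighted score is divided by the sum of the weights. The function name `allele_scores`, the simulated genotypes and the per-allele effects are hypothetical. The sketch assumes genotypes are coded as counts of the risk-increasing allele, so that the weights are positive.

```python
import numpy as np

def allele_scores(G, x):
    """Unweighted and internally weighted allele scores.

    G : (n, k) array of risk-allele counts (0, 1 or 2) for k genotypes,
    x : (n,) risk factor used to estimate the per-genotype weights.
    """
    G = np.asarray(G, dtype=float)
    unweighted = G.sum(axis=1)
    # Weight for each genotype: slope from regressing x on that genotype alone.
    Gc = G - G.mean(axis=0)
    weights = Gc.T @ (x - x.mean()) / np.sum(Gc ** 2, axis=0)
    # Weighted score, rescaled by the sum of the weights.
    weighted = G @ weights / weights.sum()
    return unweighted, weighted, weights

# Hypothetical data: three independent SNPs with unequal per-allele effects.
rng = np.random.default_rng(4)
n = 5000
G = rng.binomial(2, 0.3, size=(n, 3))
true_effects = np.array([0.4, 0.2, 0.1])
x = G @ true_effects + rng.normal(size=n)

unweighted, weighted, weights = allele_scores(G, x)
```

Either score can then be used as a single instrument in a TSLS analysis. When per-allele effects differ as much as in this example, the unweighted score gives too much influence to the weaker variants.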
5 Multiple instrument simulations
We investigated the use of multiple instruments through two simulations both based on our example. Specifically, we investigated bias and precision of IV estimates including: (i) additional non-weak instruments and (ii) weak instruments.
5.1 Simulation 1: non-weak instruments
Data were simulated as follows, where
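The values of the coefficients on the genotypes were chosen so that
Six models were fitted: (1) OLS estimate of the regression of the outcome on the risk factor; (2) TSLS using a single genotype; (3) TSLS using two genotypes; (4) TSLS using the three genotypes; (5) TSLS using an unweighted allele score of the genotypes; (6) TSLS using a weighted allele score of the genotypes.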
We used 10 000 replications, each with a sample size of 5 000 observations. Weighted allele scores were generated by summing each genotype multiplied by its estimated coefficient from the linear regression of the risk factor on that particular genotype, divided by the sum of weights. We derived the average bias, MSE, average SE of the IV estimates, coverage, average
5.2 Simulation 1: results
Simulation 1 (non-weak instruments): results (Monte Carlo standard error reported in brackets beside each estimate)
MSE: mean squared error, SE: standard error, TSLS: two-stage least squares, OLS: ordinary least squares.
Models 4 and 6 (multiple instruments using the three genotypes and the weighted allele score) had almost identical properties and the smallest MSE. Model 3 (multiple instruments using
Figure 2 shows that power increased as the number of instruments increased. The power using the unweighted allele score was similar to that using
Simulation 1 (non-weak instruments): power curves.
5.3 Simulation 2: non-weak and weak instruments
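Data were simulated with four IVs as follows such that
Eight models were fitted: (1) OLS estimate from regression of the outcome on the risk factor; (2) TSLS estimate using a single IV; (3) TSLS estimate using the two non-weak IVs; (4) TSLS estimate using all four IVs; (5) TSLS estimate using an unweighted allele score of the two non-weak IVs; (6) TSLS estimate using a weighted allele score of the two non-weak IVs; (7) TSLS estimate using an unweighted allele score of all four IVs; (8) TSLS estimate using a weighted allele score of all four IVs.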
We used 10 000 replications, each with a sample size of 5 000 observations. We also plotted power curves for testing
5.4 Simulation 2: results
Table 5 shows that models 3 and 6, using the two non-weak IVs as multiple instruments and just these two in a weighted allele score, had the smallest bias. However, models 4 and 8, using all four genotypes as multiple instruments and all four in the weighted allele score, had the smallest MSE and near identical properties to one another, the only difference being that the average
Simulation 2 (non-weak and weak instruments): power curves.
Simulation 2 (non-weak and weak instruments): results (Monte Carlo standard error in brackets beside each estimate)
MSE: mean squared error, SE: standard error, TSLS: two-stage least squares, OLS: ordinary least squares.
6 Example revisited: multiple instrument estimates and assessment of missing data
6.1 Multiple instrument estimates
The lower half of Table 3 presents IV estimates using two, three and four genotypes and the unweighted and weighted allele scores. The estimated ratios of geometric means were similar, between 1.63 and 1.73, except for the estimate using the unweighted allele score (1.40). Consistent with the simulation studies, the smallest SEs were for the IV estimates using four SNPs and the weighted allele score. For each multiple instrument model, the Sargan over-identification test provides little evidence against the joint validity of the instruments. The Hausman tests suggest that the IV estimates using multiple instruments differ from the OLS estimate.
The SE of the IV estimate using all four SNPs was 0.12, approximately 20% smaller than that of the IV estimate using
6.2 Assessment of missing data
IV estimates of the effect of fat mass on bone mineral density (BMD) using all available data
Analyses adjusted for height and height squared.
For a 1 unit increase in
7 Discussion and conclusion
Mendelian randomisation studies using genetic variants as instruments can control for unmeasured confounding and reverse causation, which can bias results from standard epidemiological analyses. However, population stratification, LD and pleiotropy can all affect the validity of the IV assumptions underlying Mendelian randomisation analyses. Obtaining similar IV estimates from separate independent instruments provides evidence against the presence of bias from pleiotropy and LD, though not bias from population stratification. In our example there was no evidence that the estimates for each instrument differed from each other (based on the over-identification test), providing some reassurance that bias from pleiotropy and LD is unlikely. However, we acknowledge in this example our power to detect differences between the estimates was limited.
Mendelian randomisation analyses require large sample sizes unless the instrument is strongly related to the risk factor (phenotype) of interest. Use of multiple genetic variants as IVs increases the power of such analyses and facilitates tests of the IV assumptions that are not possible in single instrument analyses (such as the test of over-identification). However, inclusion of instruments that explain only a small proportion of the variability in the phenotype can increase finite sample bias of IV estimates. We have limited our consideration to the linear IV model. Non-linear models that naturally arise for discrete outcomes require different treatment. 11
Our illustrative Mendelian randomisation analysis confirmed a positive causal effect of adiposity (fat mass) on BMD, in line with previous research. 31 Based on the Hausman endogeneity test, it also suggested that this effect was larger than that estimated by ordinary least squares, which ignores unmeasured confounding. The SE of the IV estimate decreased by around 20% using all four genotypes, compared with the SE of the IV estimate using only the genotype with the strongest effect on the risk factor. Such a reduction in SE corresponds to a 56% increase in sample size.
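The correspondence between a 20% reduction in SE and a 56% increase in sample size can be checked with a short calculation. It relies on the usual large-sample approximation that the SE of an estimator is inversely proportional to the square root of the sample size, which is an assumption of this sketch. The symbols n, n_equiv, SE_single and SE_multi are introduced here for illustration.

```latex
\mathrm{SE} \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{n_{\text{equiv}}}{n}
  = \left(\frac{\mathrm{SE}_{\text{single}}}{\mathrm{SE}_{\text{multi}}}\right)^{2}
  = \frac{1}{0.8^{2}}
  = 1.5625
```

A single instrument analysis would therefore need roughly 1.56 times as many participants, an increase of about 56%, to match the precision obtained with all four genotypes.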
With increasing availability of multiple genetic variants associated with the same risk factor or disease outcome, it is becoming common for genetic association studies to report associations with allele scores.64,65 Before an allele score is used as an IV the joint validity of the SNPs should be assessed using an over-identification test. The weights used in weighted allele scores may be internal or external to the study: when internally estimated the single degree of freedom used in the
Another consequence of the large number of genetic variants that are being identified in GWAS in relation to particular phenotypes is that it is possible to generate many independent combinations of such variants and, from these, many independent IV estimates of the causal effect of a risk factor on a disease outcome. These independent estimates will not plausibly be influenced by any common pleiotropy or LD-induced confounding. If they are consistent, they would therefore provide strong evidence against the notion that reintroduced confounding is generating the effect.67,68
There are typically missing data on each genetic variant, due to failure of genotyping or ambiguous genotype allocation. Thus in multiple instrument analyses, missing genotype data can offset improvements in power compared with single instrument analyses. It may be reasonable to assume that the mechanism causing genetic data to be missing is independent of a particular analysis of interest, so this may not be a cause of bias. There is scope for methodological research into multiple imputation strategies for IV estimators. It might also be possible to impute missing data for single SNPs by exploiting the LD structure between SNPs in LD with them, as is common in GWAS. 69 In the ALSPAC study, maternal genotypes are available, which could also be used to impute missing offspring genotypes.
In conclusion, the use of multiple genetic instruments increases the statistical power of Mendelian randomisation analyses and provides opportunities to test IV assumptions.
