Abstract

Motivated by empirical arguments that are well known in the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single nucleotide polymorphism (SNP) in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique, which trades accuracy for speed. Second, we investigate how mixed models can correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding, population stratification and environmental confounding factors, and study how different methods that are commonly used in practice trade off these two sources differently.
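To make the first question concrete, the following is a minimal sketch of what "including a candidate SNP in the kinship matrix" means computationally. It is an illustration only, not the analysis carried out in the article. It assumes the commonly used kinship definition K = (1/p) X Xᵀ for a genotype matrix X with p SNP columns; the toy genotype values, the function name `kinship`, and the variable names are all hypothetical.

```python
def kinship(genotypes, snps):
    """Kinship matrix K = (1/p) * sum_j x_j x_j^T over the given SNP columns.

    Assumed definition for this sketch; `genotypes` is a list of rows
    (individuals), `snps` is an iterable of column indices to include.
    """
    snps = list(snps)
    n = len(genotypes)
    p = len(snps)
    return [
        [sum(genotypes[a][j] * genotypes[b][j] for j in snps) / p
         for b in range(n)]
        for a in range(n)
    ]


# Toy genotype values: 4 individuals x 5 SNPs (made-up numbers).
X = [
    [1.0, -0.5, 0.0, 1.5, -1.0],
    [-1.0, 0.5, 1.0, -0.5, 0.0],
    [0.0, 1.0, -1.0, 0.5, 1.0],
    [0.0, -1.0, 0.0, -1.5, 0.0],
]
n = len(X)
p = len(X[0])
candidate = 3  # index of the SNP being tested (hypothetical choice)

# Fast variant: one kinship matrix built from all SNPs, candidate included.
K_full = kinship(X, range(p))

# Exact variant: rebuild the kinship matrix without the candidate SNP.
K_excl = kinship(X, [j for j in range(p) if j != candidate])

# The two differ by a rank-one term, so the exact matrix can also be
# obtained from the full one: K_excl = (p * K_full - x x^T) / (p - 1).
x = [row[candidate] for row in X]
K_update = [
    [(p * K_full[a][b] - x[a] * x[b]) / (p - 1) for b in range(n)]
    for a in range(n)
]

# Largest entrywise gap between the fast and the exact kinship matrix.
max_gap = max(
    abs(K_full[a][b] - K_excl[a][b]) for a in range(n) for b in range(n)
)
print("largest entrywise difference:", max_gap)
```

Because a single SNP contributes only one of the p terms in the sum, the entrywise gap between the two matrices shrinks roughly like 1/p as more SNPs are used, which is the intuition behind reusing one kinship matrix for every candidate.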

Trade-offs of Linear Mixed Models in Genome-Wide Association Studies
