Abstract
The rapid growth of genome-wide association studies (GWASs) has provided a more comprehensive understanding of the genetic determinants of disease susceptibility,1,2 shedding light on better prevention and treatment of diseases. The results from GWASs suggest that complex diseases are characterized by “polygenicity,” which means that a complex disease is often affected by many variants with small effects. Because of polygenicity, a single GWAS with a limited sample size often has relatively low statistical power to identify associations and poor predictive ability.
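To make the link between polygenicity and low power concrete, the following minimal sketch approximates the power of a single-SNP association test at the conventional genome-wide significance threshold of 5e-8. It is an illustration under stated assumptions, not part of LEP: the test statistic is treated as normally distributed with mean sqrt(n * q), a large-sample approximation, and the variance explained q = 0.0005 and the sample sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def single_snp_power(n_samples, variance_explained, alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a two-sided z-test for one SNP.

    Assumes the test statistic is N(sqrt(n * q), 1), where q is the fraction
    of trait variance explained by the SNP (a large-sample approximation).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(n_samples * variance_explained)
    # Probability that the statistic falls in either rejection tail.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


q = 0.0005  # hypothetical: one variant explaining 0.05% of trait variance
power_by_n = {n: single_snp_power(n, q) for n in (2_000, 20_000, 100_000)}
for n, power in power_by_n.items():
    print(f"n = {n:>7,}: power = {power:.4f}")
```

Under these assumptions, power is negligible with a few thousand samples and becomes substantial only at much larger sample sizes, which is the motivation for borrowing information from other data sets.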
To address this problem, many methods have been proposed to improve statistical efficiency by combining multiple data sets.3,4 These methods may take different types of data as input; integrating different sources of data is often feasible by leveraging pleiotropy.5,6 Recently, we proposed a statistical method, LEP,7 to integrate individual-level genotype data and summary statistics in GWASs. LEP and other statistical methods that integrate individual-level data and summary-level data are becoming increasingly important, because we often have limited individual-level data (usually a few thousand samples at hand) but can access summary-level data through many public gateways. Working with a limited number of samples with individual-level data may lead to great uncertainty in the estimation of genetic effects on a complex trait. Fortunately, genome-wide summary-level data bring additional information about genetic effects on the trait. LEP exploits this kind of information in the joint analysis of individual-level data and summary-level data.
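As a rough illustration of the two input types discussed above, the sketch below represents individual-level data as a small genotype matrix and summary-level data as per-SNP statistics, and restricts both to the SNPs they share before any joint analysis. This is not the LEP implementation: the SNP identifiers, genotype values, the use of p-values as the summary statistic, and the helper `align_on_shared_snps` are all hypothetical and introduced only for illustration.

```python
# Hypothetical toy data: SNP IDs and values are made up for illustration.
genotype_snps = ["rs1", "rs2", "rs3", "rs4"]  # columns of the genotype matrix
genotypes = [  # rows = individuals, entries = allele counts (0/1/2)
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
# Assumed form of summary-level data: one p-value per SNP from another study.
summary_pvalues = {"rs2": 0.03, "rs4": 1e-6, "rs9": 0.5}


def align_on_shared_snps(snp_ids, geno_rows, pvalues):
    """Keep only SNPs present in both data sets, in genotype column order."""
    keep = [j for j, snp in enumerate(snp_ids) if snp in pvalues]
    shared = [snp_ids[j] for j in keep]
    geno_sub = [[row[j] for j in keep] for row in geno_rows]
    p_sub = [pvalues[snp] for snp in shared]
    return shared, geno_sub, p_sub


shared, geno_sub, p_sub = align_on_shared_snps(
    genotype_snps, genotypes, summary_pvalues
)
print(shared)
```

Keeping the genotype column order means the subsetted matrix and the summary statistics stay index-aligned, so downstream code can treat position j in both as the same SNP.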
Originally, LEP was designed to integrate multiple traits of the same population by exploring pleiotropy among them. More specifically, pleiotropy means that a variant can affect multiple seemingly unrelated traits. LEP integrates the individual-level data and the summary-level data by modeling their pleiotropic relationship. By introducing
Comprehensive simulation studies and real-data analysis demonstrated that LEP effectively leverages pleiotropy in the presence of heterogeneity between the individual-level and summary-level data.
For a given trait or disease, GWASs have been conducted in different populations. In fact, many GWASs have been conducted in populations of European ancestry. Because the allele frequencies and linkage disequilibrium (LD) patterns of samples from different populations can be quite different,6,8,9 heterogeneity of genetic effects is widespread and the discoveries in one population cannot be directly transferred to another population. Approaches for dealing with heterogeneous genetic effects in different populations are gaining increasing attention. Although LEP was designed to explore pleiotropy among different traits, the essential idea of LEP is to make use of the correlation of association status across multiple GWASs while accounting for heterogeneity. Clearly, the probabilistic model given in equation (1) can account for heterogeneity in the presence of either pleiotropy or correlated genetic effects of the same trait in different populations. The pair of parameters
As an illustrative example, we applied LEP to analyze GWAS data of Crohn’s disease (CD) from several different populations. The individual-level data are from the Wellcome Trust Case Control Consortium (WTCCC).10 The summary-level data of CD are from the study by Franke et al,11 composed of the
Information of the GWAS data for Crohn’s disease from different populations.
Abbreviations: GWAS, genome-wide association studies; SNP, single-nucleotide polymorphism; WTCCC, Wellcome Trust Case Control Consortium.
After extracting overlapped SNPs of individual-level data (after quality control) and summary statistics, we had the individual-level data
Estimated parameters
Abbreviation: GWAS, genome-wide association studies.
Accuracy is calculated from 10 replications.
In summary, LEP can effectively account for heterogeneity when integrating individual-level data and summary-level data from GWASs. As a result, LEP can be applied not only to leverage pleiotropy in the analysis of multiple traits in the same population but also as an effective tool for analyzing the same trait across different populations.
