LLR: A latent low-rank approach to colocalizing genetic risk variants in multiple GWAS
Genome-wide association studies (GWAS), which typically assay millions of single nucleotide polymorphisms (SNPs) in thousands of individuals, have been widely used to identify risk SNPs underlying human complex phenotypes (quantitative traits or diseases). Most conventional statistical analyses in GWAS investigate only one phenotype at a time. Recently, an increasing number of reports have suggested the ubiquity of pleiotropy, i.e., many complex phenotypes share common genetic bases. This motivates us to develop new statistical approaches to the joint analysis of multiple GWAS by leveraging pleiotropy. In this study, we propose a latent low-rank (LLR) approach to colocalizing genetic risk variants using summary statistics. In the presence of pleiotropy, some risk loci affect multiple phenotypes, so their association statuses across multiple GWAS are no longer independent but correlated. To make use of this correlation, we introduce a low-rank structure to model the probabilities of the latent association statuses between loci and phenotypes. To make LLR computationally efficient, we have developed a novel expectation-maximization-path (EM-path) algorithm, which greatly facilitates model selection and inference. We demonstrate the advantages of LLR over its competitors through simulation studies and the joint analysis of 18 GWAS data sets.
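To make the idea of a low-rank structure on latent association probabilities concrete, the following is a minimal sketch rather than the method as implemented. It assumes a logistic link on a rank-r matrix product and a two-group mixture for p-values (uniform under the null, Beta(alpha, 1) otherwise); neither choice is stated in this abstract, and the names `low_rank_probabilities`, `posterior_association`, `U`, `V`, `b`, and `alpha` are hypothetical.

```python
import numpy as np

def low_rank_probabilities(U, V, b):
    """Prior probability that each locus is associated with each phenotype.

    Assumption: logits are a rank-r product U @ V.T plus per-phenotype offsets b.
    """
    logits = U @ V.T + b  # shape (n_loci, n_phenotypes)
    return 1.0 / (1.0 + np.exp(-logits))

def posterior_association(pvals, prior, alpha):
    """Posterior probability of association given a p-value.

    Assumption: null p-values ~ Uniform(0, 1) and non-null p-values
    ~ Beta(alpha, 1) with 0 < alpha < 1, so small p-values favor association.
    """
    non_null = alpha * pvals ** (alpha - 1.0)  # Beta(alpha, 1) density
    return prior * non_null / (prior * non_null + (1.0 - prior))

# Toy data: 200 loci, 4 phenotypes, rank-2 latent structure (all values illustrative).
rng = np.random.default_rng(0)
n_loci, n_pheno, rank = 200, 4, 2
U = rng.normal(size=(n_loci, rank))   # locus factors
V = rng.normal(size=(n_pheno, rank))  # phenotype factors
b = np.full(n_pheno, -2.0)            # offsets keeping most priors small
prior = low_rank_probabilities(U, V, b)
pvals = rng.uniform(1e-6, 1.0, size=(n_loci, n_pheno))  # stand-in summary statistics
post = posterior_association(pvals, prior, alpha=0.3)
```

Because the locus factors in `U` are shared across the columns of the logit matrix, a locus with a high prior for one phenotype tends to have correlated priors for the others, which is the kind of dependence pleiotropy would induce. An EM-style procedure would alternate between computing `post` and updating `U`, `V`, `b`, and `alpha`; that update step is not shown here.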