This is a step-by-step guide to performing colocalization analysis with the coloc package in R.
It also covers the visualization of colocalization results in R. The visualization shows how specific SNPs are associated with two different phenotypes, typically from GWAS and eQTL datasets, to help identify shared genetic architecture.
The script coloc_Visualizition.R provides a step-by-step approach to generating the colocalization plot.
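The script itself is not reproduced here. As a rough illustration of the kind of plot involved, the sketch below draws GWAS against eQTL association strength on the -log10(p) scale using simulated data. The data frame `toy`, the planted signal at one SNP, and the lead-SNP highlighting are all assumptions made for illustration and are not taken from coloc_Visualizition.R.

```r
# Simulated p-values for 200 SNPs (illustrative only, not real data).
set.seed(1)
n_snps <- 200
toy <- data.frame(
  SNP_ID      = paste0("rs", seq_len(n_snps)),
  pvalue_GWAS = runif(n_snps),
  pvalue_eQTL = runif(n_snps)
)

# Plant a strong signal at one SNP in both datasets to mimic a shared variant.
toy$pvalue_GWAS[100] <- 1e-12
toy$pvalue_eQTL[100] <- 1e-15

# Convert p-values to the -log10 scale used in association plots.
toy$logp_GWAS <- -log10(toy$pvalue_GWAS)
toy$logp_eQTL <- -log10(toy$pvalue_eQTL)

# Highlight the SNP with the largest combined signal as the lead SNP.
lead <- which.max(toy$logp_GWAS + toy$logp_eQTL)

plot(toy$logp_GWAS, toy$logp_eQTL,
     xlab = "GWAS -log10(p)", ylab = "eQTL -log10(p)",
     pch = 16, col = "grey50",
     main = "GWAS vs eQTL association (simulated)")
points(toy$logp_GWAS[lead], toy$logp_eQTL[lead], pch = 18, col = "red", cex = 2)
text(toy$logp_GWAS[lead], toy$logp_eQTL[lead], labels = toy$SNP_ID[lead], pos = 2)
```

A SNP that sits in the top-right corner of such a plot is strongly associated with both traits, which is the pattern a colocalizing signal tends to show.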
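# Install helper packages only if they are missing
if (!require("remotes")) install.packages("remotes")
if (!require("dplyr")) install.packages("dplyr")
# Install coloc from GitHub, building its vignettes
library(remotes)
install_github("chr1swallace/coloc", build_vignettes=TRUE)
# Load the packages used in this guide
library(coloc)
library(dplyr)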
gwas_data <- read.table("path_to_gwas_data.txt", header=TRUE, sep="\t")
⚠️ Replace path_to_gwas_data.txt with the path to the correct GWAS dataset.
eqtl_data <- read.table("path_to_eqtl_data.txt", header=TRUE, sep="\t")
⚠️ Replace path_to_eqtl_data.txt with the path to the correct eQTL dataset.
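Summary statistics files often report a standard error rather than the variance that the varbeta argument of coloc.abf expects. Below is a minimal sketch, assuming hypothetical columns named `beta`, `se` and `pvalue`, that checks the required columns are present, drops incomplete rows, and derives the variance as the squared standard error. The helper `prepare_sumstats` and the toy data are illustrative and not part of coloc.

```r
# Hypothetical helper: validate columns and add varbeta = se^2.
prepare_sumstats <- function(df, required = c("SNP_ID", "beta", "se", "pvalue")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Keep only rows with no missing values in the required columns.
  df <- df[complete.cases(df[, required]), ]
  # The variance of the effect estimate is the squared standard error.
  df$varbeta <- df$se^2
  df
}

# Toy input with one incomplete row (rs3).
toy_gwas <- data.frame(
  SNP_ID = c("rs1", "rs2", "rs3"),
  beta   = c(0.10, -0.05, NA),
  se     = c(0.02, 0.03, 0.04),
  pvalue = c(5.7e-7, 0.096, NA)
)

prepared <- prepare_sumstats(toy_gwas)
print(prepared)
```

The same helper can be applied to both `gwas_data` and `eqtl_data` after adjusting the column names to match your files.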
Before performing the analysis, ensure that the datasets are merged based on the SNP IDs.
merged_data <- merge(gwas_data, eqtl_data, by="SNP_ID")
Replace SNP_ID with the name of the column that holds SNP identifiers if it differs in your datasets. The column must have the same name in both data frames for by="SNP_ID" to work.
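When both input files use the same column names (for example `beta` and `varbeta`), the `suffixes` argument of `merge` can be used to produce trait-specific names such as beta_GWAS and beta_eQTL. The sketch below shows this on two small made-up data frames and also reports how many SNPs were lost in the merge; the toy values are assumptions for illustration only.

```r
# Toy GWAS and eQTL tables that share three SNPs (rs2, rs3, rs4).
gwas_toy <- data.frame(
  SNP_ID  = c("rs1", "rs2", "rs3", "rs4"),
  beta    = c(0.10, -0.05, 0.02, 0.08),
  varbeta = c(4e-4, 9e-4, 4e-4, 1e-4)
)
eqtl_toy <- data.frame(
  SNP_ID  = c("rs2", "rs3", "rs4", "rs5"),
  beta    = c(0.30, -0.10, 0.25, 0.05),
  varbeta = c(1e-2, 2e-2, 1e-2, 3e-2)
)

# An inner join keeps only SNPs present in both tables; the suffixes
# are appended to the non-key columns that have the same name in both.
merged_toy <- merge(gwas_toy, eqtl_toy, by = "SNP_ID",
                    suffixes = c("_GWAS", "_eQTL"))

# Each SNP should appear exactly once after merging.
stopifnot(anyDuplicated(merged_toy$SNP_ID) == 0)

message(nrow(gwas_toy) - nrow(merged_toy), " GWAS SNP(s) had no eQTL match")
print(merged_toy)
```

Checking for duplicated SNP IDs before the analysis is worthwhile because duplicated rows would count the same variant more than once.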
With the datasets imported and merged, perform the colocalization analysis.
library(coloc)
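results <- coloc.abf(
  dataset1=list(snp=merged_data$SNP_ID, beta=merged_data$beta_GWAS, varbeta=merged_data$varbeta_GWAS, pvalues=merged_data$pvalue_GWAS, MAF=merged_data$MAF, type="quant", N=sample_size_gwas),
  dataset2=list(snp=merged_data$SNP_ID, beta=merged_data$beta_eQTL, varbeta=merged_data$varbeta_eQTL, pvalues=merged_data$pvalue_eQTL, MAF=merged_data$MAF, type="quant", N=sample_size_eqtl))
⚠️ Columns "pvalue_GWAS" and "pvalue_eQTL" contain the p-values for GWAS and eQTL respectively.
⚠️ Replace sample_size_gwas and sample_size_eqtl with the actual sample sizes of the GWAS and eQTL studies, respectively.
⚠️ The type argument must be "quant" for a quantitative trait or "cc" for a case-control trait. For type="quant", coloc needs either sdY (the trait standard deviation) or MAF together with N. The MAF column name used here is a placeholder; replace it with the column holding minor allele frequencies in your data, or supply sdY instead.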
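To try the call without real data, here is a small end-to-end sketch on simulated summary statistics. Everything in it is an assumption for illustration: the simulated effect sizes, the sample sizes, the helper `make_dataset`, and the planted shared signal at one SNP. The dataset lists include `snp` and `MAF` entries, which recent coloc versions appear to expect for quantitative traits when sdY is not given. The analysis step runs only if coloc is installed.

```r
set.seed(42)
n_snps <- 100
n_gwas <- 10000   # hypothetical GWAS sample size
n_eqtl <- 500     # hypothetical eQTL sample size
maf <- runif(n_snps, 0.05, 0.5)

# For a trait with unit variance, var(beta) is roughly 1 / (2 * N * MAF * (1 - MAF)).
vb_gwas <- 1 / (2 * n_gwas * maf * (1 - maf))
vb_eqtl <- 1 / (2 * n_eqtl * maf * (1 - maf))

toy <- data.frame(
  SNP_ID       = paste0("rs", seq_len(n_snps)),
  MAF          = maf,
  beta_GWAS    = rnorm(n_snps, 0, sqrt(vb_gwas)),
  varbeta_GWAS = vb_gwas,
  beta_eQTL    = rnorm(n_snps, 0, sqrt(vb_eqtl)),
  varbeta_eQTL = vb_eqtl
)

# Plant a strong effect at the same SNP in both datasets (z = 8 and z = 9).
toy$beta_GWAS[50] <- 8 * sqrt(vb_gwas[50])
toy$beta_eQTL[50] <- 9 * sqrt(vb_eqtl[50])

# Hypothetical helper that builds one coloc-style dataset list.
make_dataset <- function(df, suffix, n) {
  list(
    snp     = as.character(df$SNP_ID),
    beta    = df[[paste0("beta_", suffix)]],
    varbeta = df[[paste0("varbeta_", suffix)]],
    MAF     = df$MAF,
    type    = "quant",
    N       = n
  )
}

d1 <- make_dataset(toy, "GWAS", n_gwas)
d2 <- make_dataset(toy, "eQTL", n_eqtl)

if (requireNamespace("coloc", quietly = TRUE)) {
  toy_results <- coloc::coloc.abf(dataset1 = d1, dataset2 = d2)
  print(toy_results$summary)
} else {
  message("coloc is not installed; skipping the analysis step")
}
```

Because the strongest signal in both simulated datasets sits at the same SNP, the posterior for H4 would be expected to dominate, but the exact numbers depend on the coloc version and its default priors.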
The coloc.abf function outputs posterior probabilities for each of the five hypotheses (H0 to H4). In general, a high posterior probability for H4 suggests that the GWAS and eQTL signals colocalize, indicating that they are likely driven by the same causal variant.
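The posterior probabilities are returned in the `summary` element of the result as a named vector (nsnps, PP.H0.abf through PP.H4.abf). The sketch below shows one way to turn that vector into a verdict. The numbers in `toy_summary` are made up, the helper `interpret_coloc` is hypothetical, and the 0.8 cutoff for H4 is a commonly used convention rather than anything fixed by coloc.

```r
# Made-up summary vector with the same names as results$summary.
toy_summary <- c(nsnps = 100,
                 PP.H0.abf = 0.001, PP.H1.abf = 0.010, PP.H2.abf = 0.020,
                 PP.H3.abf = 0.069, PP.H4.abf = 0.900)

# Plain-language meaning of each hypothesis.
hypotheses <- c(
  PP.H0.abf = "no association with either trait",
  PP.H1.abf = "association with trait 1 only",
  PP.H2.abf = "association with trait 2 only",
  PP.H3.abf = "both traits associated, distinct causal variants",
  PP.H4.abf = "both traits associated, shared causal variant"
)

# Hypothetical helper: report the best-supported hypothesis and whether
# PP.H4 reaches the chosen threshold.
interpret_coloc <- function(summary_vec, h4_threshold = 0.8) {
  pp <- summary_vec[names(hypotheses)]
  best <- names(pp)[which.max(pp)]
  list(
    best_hypothesis = best,
    description     = unname(hypotheses[best]),
    colocalized     = unname(pp["PP.H4.abf"] >= h4_threshold)
  )
}

verdict <- interpret_coloc(toy_summary)
print(verdict)
```

With a real analysis, pass `results$summary` in place of `toy_summary`. A high PP.H3 instead of PP.H4 would suggest that both traits have signals in the region but from different causal variants.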
Below are some essential resources and datasets for performing colocalization analysis:
This resource consolidates eQTL data from 37 different datasets, encompassing a total of 31,684 individuals. It provides a rich collection of eQTLs that can be utilized for various genomic studies including colocalization analysis.
The Genotype-Tissue Expression (GTEx) project provides a wide variety of datasets spanning multiple versions (from v6 to v8). These datasets are invaluable for studying the relationship between genetic variations and tissue-specific gene expression.
The coloc package for R is an essential tool for performing colocalization analysis. It allows researchers to test for shared causal variants between two phenotypes using summary data.