Skip to content

eSCAN (Scan the Enhancers): Scan Regulatory Regions for Aggregate Association Testing using Whole Genome Sequencing Data

Notifications You must be signed in to change notification settings

yingxi-kaylee/eSCAN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

eSCAN

eSCAN: Scan Regulatory Regions for Aggregate Association Testing using Whole Genome Sequencing Data

Description

eSCAN (or “Scan the Enhancers”, with “enhancers” as a shorthand for any potential regulatory regions in the genome) is an R package for performing genome-wide assessment of rare variants residing in enhancer regions that are significantly associated with a phenotype, combining the advantages of dynamic window selection with the advantages of increasing genomic annotation information, including chromatin accessibility, histone markers, and 3D chromatin conformation. eSCAN defines test windows across the genome by genomic annotation either specified by users or provided by eSCAN package such that each window marks putative regulatory region(s). eSCAN then searches the defined windows using fastSKAT and detects significant regions by an empirical/analytic threshold which adjusts for multiple testing of all the searching windows across the genome, of different sizes, including some overlapping windows.

Installation

install.packages("devtools")
devtools::install_github("yingxi-kaylee/eSCAN")

Usage

Please see the eSCAN user manual for detailed usage of eSCAN package. See the following section for a working example.

eSCAN Example

In this section, we will use a toy example to show how to use eSCAN package. Genotype data in this example is from the SCANG package, simulated using COSI, leveraging its calibrated model to closely resemble the LD patterns in African Americans.

Setup Data

library(eSCAN)
data(geno) # genotype data frame
data(pheno) # phenotype and covariates data frame
data(enhancer) # enhancer data frame, optional

The pheno data contains one column for phenotype and two for covariates which were generated from the standard normal distribution and Bernoulli distribution (p=0.5), respectively. The phenotype values were then generated using the two covariates and error term, along with causal variants randomly selected from two signal regions (the signal regions were also randomly selected with length 3 kb). The enhancer data was simulated as follows, generated by eGenerator() function in eSCAN package using parameter maxgap=300.

head(enhancer)
#>   Index Start_pos End_pos Start_index End_index Length
#> 1     1        54    2769           1        48     48
#> 3     2      4510    8272          63       128     66
#> 4     3      8655   13837         129       212     84
#> 5     4     14152   19509         213       302     90
#> 6     5     19943   22963         303       359     57
#> 7     6     23452   27073         360       425     66

The first column is index of the enhancers, the next two columns are start and end positions of the enhancers. The next two are start and end indices of the enhancers (indices in the genotype matrix sorted by genomic positions). The last column is the number of rare variants residing in the enhancers.

Analyzing the simulated data using eSCAN

First, fit a generalized linear model under the null hypothesis. The parameter x is a data frame containing phenotype and covariates, each row indicates an individual and each column indicates a phenotype/covariate; the parameter outcome is used to specify the name of phenotype column in the data frame x, and the parameter covars is to specify the names of covariates. Then the parameter fam is either “gaussian” for a continuous phenotype or “binomial” for a dichotomous phenotype.

nullmod <- fitNull(x = pheno, outcome = "phenotype", covars=c("X1", "X2"), fam="gaussian")

Then use the null model to run the main function eSCAN(). The parameter times indicates the number of Monte Carlo simulations (default=1000). For this toy example we set 10 so that you should be able to get results within one minute.

res_eSCAN <- eSCAN(genotype = geno, nullmod = nullmod, new_enloc = enhancer, times=10, 
                   alpha=0.05, analy=FALSE)

The function returns a list containing 3 elements: res, res0 and thres. res is a matrix of results for significant regions detected by eSCAN. The first column is the p-value of the detected region(s), and the next two columns are start and end position of the detected region(s).

res_eSCAN$res
#>          pvalue Start_pos End_pos
#> 25 0.0012677567    229802  234983
#> 26 0.0000276419    235306  258955
#> 28 0.0028600775    264096  274969
#> 30 0.0007865199    281338  284343

res0 is a matrix to summarize all the regions included in the analysis.

res_eSCAN$res0[1:10,]
#>        pvalue Start_pos End_pos
#> 1  0.09543621        54    2769
#> 3  0.10793791      4510    8272
#> 4  0.35983971      8655   13837
#> 5  0.85590415     14152   19509
#> 6  0.73614514     19943   22963
#> 7  0.68473027     23452   27073
#> 8  0.23106533     27487   40347
#> 9  0.09198459     40675   67183
#> 11 0.76369065     69689   76915
#> 12 0.21920216     77228   92139

thres is the threshold of eSCAN to control the family-wise/genome-wide error rate at alpha level.

res_eSCAN$thres
#>          5% 
#> 0.004935235

Version

The current version is 0.1.1 (June 21, 2020).

License

This software is licensed under GPLv3.

Citation

Yingxi Yang, Yuchen Yang, Le Huang, Jai G. Broome, Adolfo Correa, Alexander Reiner, Laura M. Raffield, and Yun Li (2021). “eSCAN: Scan Regulatory Regions for Aggregate Association Testing Using Whole Genome Sequencing Data.” bioRxiv, 2020.11.30.405266.

References

Zilin Li, Xihao Li, Yaowu Liu, Jincheng Shen, Han Chen, Hufeng Zhou, Alanna C. Morrison, Eric Boerwinkle, and Xihong Lin (2019) “Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies”. The American Journal of Human Genetics, 104(5), 802-814.

Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, Rice KM, Conomos MP (2019). “Genetic association testing using the GENESIS R/Bioconductor package.” Bioinformatics. doi:10.1093/bioinformatics/btz567.

About

eSCAN (Scan the Enhancers): Scan Regulatory Regions for Aggregate Association Testing using Whole Genome Sequencing Data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages