Merge pull request #37 from mancusolab/dev
Dev
zeyunlu committed Apr 16, 2024
2 parents 23beaf5 + b004c55 commit c3ac6a1
Showing 5 changed files with 145 additions and 189 deletions.
140 changes: 140 additions & 0 deletions README.md
@@ -0,0 +1,140 @@
[![Documentation-webpage](https://img.shields.io/badge/Docs-Available-brightgreen)](https://mancusolab.github.io/sushie/)
[![Github](https://img.shields.io/github/stars/mancusolab/sushie?style=social)](https://github.com/mancusolab/sushie)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)

# SuShiE🍣

SuShiE (Sum of Shared Single Effect) is a Python package to fine-map
causal SNPs, compute prediction weights, and infer effect size
correlation for molecular data (e.g., mRNA and protein levels)
across multiple ancestries. **The manuscript is in progress.**

``` diff
- We detest usage of our software or scientific outcome to promote racial discrimination.
```

Check [here](https://mancusolab.github.io/sushie/) for full
documentation.

[**Installation**](#installation)
| [**Example**](#get-started-with-example)
| [**Notes**](#notes)
| [**Version History**](#version-history)
| [**Support**](#support)
| [**Other Software**](#other-software)

## Installation

Users can download the latest repository and then use `pip`:

``` bash
git clone https://github.com/mancusolab/sushie.git
cd sushie
pip install .
```

*We currently support only Python 3.8+.*

Before installation, we recommend creating a new environment using
[conda](https://docs.conda.io/en/latest/) so that the installation does
not affect the software versions of other projects.

## Get Started with Example

The following example runs fine-mapping on the EUR and AFR example files in `./data/`:

``` bash
cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --covar EUR.covar AFR.covar --output ./test_result
```
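
For more than two ancestries, the same call can be assembled programmatically. The sketch below is only an illustration: `build_finemap_command` is a hypothetical helper, and it assumes every ancestry's files follow the naming used in the example above (`<label>.pheno`, `vcf/<label>.vcf`, `<label>.covar`). It uses only the flags shown in that command.

```python
import shlex
from typing import List


def build_finemap_command(ancestries: List[str], output: str) -> List[str]:
    """Hypothetical helper: assemble a `sushie finemap` call.

    Assumes files are named <label>.pheno, vcf/<label>.vcf, and <label>.covar,
    and that all three flags list the ancestries in the same order.
    """
    cmd = ["sushie", "finemap"]
    cmd += ["--pheno"] + [f"{a}.pheno" for a in ancestries]
    cmd += ["--vcf"] + [f"vcf/{a}.vcf" for a in ancestries]
    cmd += ["--covar"] + [f"{a}.covar" for a in ancestries]
    cmd += ["--output", output]
    return cmd


command = build_finemap_command(["EUR", "AFR"], "./test_result")
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` from inside `./data/`. Building all three file lists from one list of labels keeps the ancestry order consistent across flags.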

It can perform:

- SuShiE: multi-ancestry fine-mapping accounting for ancestral
correlation
- Single-ancestry SuSiE (Sum of Single Effect)
- Independent SuShiE: multi-ancestry SuShiE without accounting for
correlation
- Meta-SuSiE: single-ancestry SuSiE followed by meta-analysis
- Mega-SuSiE: single-ancestry SuSiE on row-wise stacked data across
ancestries
- QTL effect size correlation estimation
- cis-SNP heritability estimation
- Cross-validation for SuShiE prediction weights
- Conversion of prediction results to
[FUSION](http://gusevlab.org/projects/fusion/) format, so that they can be
used in [TWAS](https://www.nature.com/articles/ng.3506)
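
To make the "row-wise stacked" input of Mega-SuSiE concrete, here is a minimal NumPy sketch. It is not SuShiE's internal code: the toy data are simulated, and it assumes each genotype array has individuals in rows and the same SNPs in columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two ancestries with different sample sizes
# but the same 4 SNPs (rows are individuals, columns are SNPs).
Xs = [rng.integers(0, 3, size=(5, 4)), rng.integers(0, 3, size=(3, 4))]
ys = [rng.normal(size=5), rng.normal(size=3)]

# Row-wise stacking: individuals from all ancestries share one matrix,
# which is then analyzed as if it came from a single ancestry.
X_mega = np.vstack(Xs)
y_mega = np.concatenate(ys)

print(X_mega.shape, y_mega.shape)
```

Meta-SuSiE, by contrast, would keep `Xs` and `ys` separate, fit each ancestry on its own, and combine the results afterwards.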

See [here](https://mancusolab.github.io/sushie/) for more details on how
to use SuShiE.

To call the SuShiE inference function directly from Python, use the
following code as a starting point:

``` python
from sushie.infer import infer_sushie

# Xs: genotype data, a list of numpy arrays with one array per ancestry.
# ys: phenotype data, a list of numpy arrays with one array per ancestry.
# Both lists must be defined by the caller before this call.
result = infer_sushie(Xs=Xs, ys=ys)
```
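
Before passing data to the inference function, it can help to check that the two lists line up. The helper below is hypothetical and not part of SuShiE. It assumes each genotype array is laid out as individuals by SNPs and that all ancestries share the same SNPs.

```python
import numpy as np
from typing import List


def check_inputs(Xs: List[np.ndarray], ys: List[np.ndarray]) -> int:
    """Hypothetical helper: sanity-check the lists, return the number of ancestries."""
    if len(Xs) != len(ys):
        raise ValueError("Xs and ys must have one entry per ancestry.")
    for X, y in zip(Xs, ys):
        # Assumption: rows of X are individuals, so they must match len(y).
        if X.shape[0] != y.shape[0]:
            raise ValueError("Each ancestry needs one phenotype value per individual.")
    # Assumption: columns of X are SNPs, shared across ancestries.
    if len({X.shape[1] for X in Xs}) != 1:
        raise ValueError("All ancestries must share the same set of SNPs.")
    return len(Xs)


rng = np.random.default_rng(1)
Xs = [rng.normal(size=(6, 10)), rng.normal(size=(4, 10))]
ys = [rng.normal(size=6), rng.normal(size=4)]
print(check_inputs(Xs, ys))
```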

Feel free to adapt it to your own analyses!

## Notes

- SuShiE currently only supports **continuous** phenotype
fine-mapping.
- SuShiE currently only supports fine-mapping on
[autosomes](https://en.wikipedia.org/wiki/Autosome).
- SuShiE uses [JAX](https://github.com/google/jax) with [Just In
Time](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html)
compilation to achieve high-speed computation. However, JAX has
some [issues](https://github.com/google/jax/issues/5501) on Macs
with the M1 chip. To work around them, users need to set up conda using
[miniforge](https://github.com/conda-forge/miniforge), and then
install SuShiE using `pip` in the desired environment.

## Version History

| Version | Description |
| --------- | --------- |
| 0.1 | Initial Release |
| 0.11 | Fix a bug in the OLS computation of adjusted R-squared. |
| 0.12 | Update the `io.corr` function to report all correlation results regardless of whether the credible set (CS) is pruned. |
| 0.13 | Add the `--keep` option to let users specify a file containing the IDs of the subjects on which SuShiE will run. Add the `--ancestry_index` option to let users specify a file containing the ancestry index for fine-mapping; with it, users can supply a single phenotype, genotype, and covariate file that contains all subjects across ancestries. Implement padding to reduce inference time. Record the ELBO at each iteration, accessible through the `infer.SuShiEResult` object. The alphas table now outputs the average purity and KL divergence for each `L`. Rename `--kl_threshold` to `--divergence`. Add the `--maf` option to remove SNPs whose minor allele frequency falls below the threshold within each ancestry. Add the `--max_select` option to randomly select at most a given number of SNPs when computing purity, avoiding unnecessary memory use. Add a QC function to remove duplicated SNPs. |
| 0.14 | Remove KL-divergence pruning. Improve the command-line appearance and the contents of the output files. Fix small bugs in the multivariate KL computation. |
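
As an illustration of the per-ancestry filter described for `--maf`, the sketch below computes minor allele frequencies from 0/1/2 genotype dosages. It is an assumption-laden toy, not SuShiE's implementation: `maf_keep_mask` is a hypothetical name, and whether the real threshold comparison is strict or inclusive is not stated in the table above.

```python
import numpy as np
from typing import List


def maf_keep_mask(Xs: List[np.ndarray], threshold: float) -> np.ndarray:
    """Hypothetical helper: mask of SNPs with MAF >= threshold in every ancestry."""
    keep = np.ones(Xs[0].shape[1], dtype=bool)
    for X in Xs:
        freq = X.mean(axis=0) / 2.0         # allele frequency from 0/1/2 dosages
        maf = np.minimum(freq, 1.0 - freq)  # minor allele frequency
        keep &= maf >= threshold            # a SNP must pass in each ancestry
    return keep


# Toy dosages: rows are individuals, columns are SNPs.
X_eur = np.array([[0, 1, 2], [0, 1, 2], [0, 0, 2], [1, 0, 2]])
X_afr = np.array([[1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 2, 0]])
mask = maf_keep_mask([X_eur, X_afr], threshold=0.05)
print(mask)
```

The third SNP is monomorphic in both toy ancestries, so it is the only one dropped.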

## Support

Please report any bugs or feature requests in the [Issue
Tracker](https://github.com/mancusolab/sushie/issues). If users have any
questions or comments, please contact Zeyun Lu (<[email protected]>) and
Nicholas Mancuso (<[email protected]>).

## Other Software

Feel free to use other software developed by [Mancuso
Lab](https://www.mancusolab.com/):

- [MA-FOCUS](https://github.com/mancusolab/ma-focus): a Bayesian
fine-mapping framework using
[TWAS](https://www.nature.com/articles/ng.3506) statistics across
multiple ancestries to identify the causal genes for complex traits.
- [SuSiE-PCA](https://github.com/mancusolab/susiepca): a scalable
Bayesian variable selection technique for sparse principal component
analysis.
- [twas_sim](https://github.com/mancusolab/twas_sim): a Python
software to simulate [TWAS](https://www.nature.com/articles/ng.3506)
statistics.
- [FactorGo](https://github.com/mancusolab/factorgo): a scalable
variational factor analysis model that learns pleiotropic factors
from GWAS summary statistics.
- [HAMSTA](https://github.com/tszfungc/hamsta): a Python software to
estimate heritability explained by local ancestry data from
admixture mapping summary statistics.

------------------------------------------------------------------------

This project has been set up using PyScaffold 4.1.1. For details and
usage information on PyScaffold see <https://pyscaffold.org/>.
184 changes: 0 additions & 184 deletions README.rst

This file was deleted.

4 changes: 2 additions & 2 deletions docs/files.rst
@@ -263,11 +263,11 @@ SuShiE by default outputs a ``*.corr.tsv`` file that contains the estimated effe
- Float
- 1.34
- The inferred effect size variance (the posterior estimate for :math:`\sigma^2_{i,b}` in :ref:`Model`) for ancestry 1. The number of such columns depends on the number of ancestries. One estimate for each credible set.
- * - ancestry1_est_covar
+ * - ancestry1_ancestry2_est_covar
- Float
- 2.56
- The inferred effect size covariance between ancestry 1 and ancestry 2. It depends on the number of pairs of ancestries. One estimate for each credible set.
- * - ancestry1_est_corr
+ * - ancestry1_ancestry2_est_corr
- Float
- 0.8
- The inferred effect size correlation (the posterior estimate for :math:`\rho` in :ref:`Model`) between ancestry 1 and ancestry 2. It depends on the number of pairs of ancestries. One estimate for each credible set.
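
The renamed columns encode the ancestry pair, so their number grows with the number of pairs. The following sketch enumerates the names implied by the `ancestry1_ancestry2_est_covar` and `ancestry1_ancestry2_est_corr` pattern. `pairwise_columns` is a hypothetical helper, and the column order in the real ``*.corr.tsv`` file is an assumption.

```python
from itertools import combinations
from typing import List


def pairwise_columns(n_ancestry: int) -> List[str]:
    """Hypothetical helper: covariance/correlation column names for each ancestry pair."""
    cols = []
    # Ancestries are numbered from 1; each unordered pair (i < j) appears once.
    for i, j in combinations(range(1, n_ancestry + 1), 2):
        cols.append(f"ancestry{i}_ancestry{j}_est_covar")
        cols.append(f"ancestry{i}_ancestry{j}_est_corr")
    return cols


print(pairwise_columns(2))
print(len(pairwise_columns(3)))
```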
4 changes: 2 additions & 2 deletions sushie/cli.py
@@ -256,7 +256,7 @@ def _prepare_cv(
train_pheno = []
valid_geno = []
valid_pheno = []
- train_index = jnp.delete(jnp.arange(5), cv).tolist()
+ train_index = jnp.delete(jnp.arange(cv_num), cv).tolist()

# make the training and test for each population separately
# because sample size may be different
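
The effect of replacing the hard-coded `5` with `cv_num` can be seen in a small standalone sketch. It uses NumPy's `np.delete` and `np.arange` in place of the `jnp` calls so that it runs without JAX. `train_fold_indices` is a hypothetical name, not a function in `sushie/cli.py`.

```python
import numpy as np


def train_fold_indices(cv_num: int, cv: int) -> list:
    """Indices of the folds used for training when fold `cv` is held out."""
    return np.delete(np.arange(cv_num), cv).tolist()


# Five folds, holding out fold 2: same result as the old hard-coded version.
print(train_fold_indices(5, 2))  # [0, 1, 3, 4]

# Three folds, holding out fold 2: only folds 0 and 1 remain for training.
print(train_fold_indices(3, 2))  # [0, 1]
```

With the old `arange(5)`, a three-fold run would have produced `[0, 1, 3, 4]`, which refers to folds 3 and 4 that do not exist.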
@@ -1111,7 +1111,7 @@ def build_finemap_parser(subp):
default=None,
help=(
"Genotype data in vcf format. Use 'space' to separate ancestries if more than two.",
- " Keep the same ancestry order as phenotype's.",
+ " Keep the same ancestry order as phenotype's. The software will count REF allele.",
),
)

2 changes: 1 addition & 1 deletion sushie/io.py
@@ -198,7 +198,7 @@ def read_vcf(path: str) -> Tuple[pd.DataFrame, pd.DataFrame, Array]:
"""Read in genotype data in `vcf <https://en.wikipedia.org/wiki/Variant_Call_Format>`_ format.
Args:
- path: The path for vcf genotype data (full file name).
+ path: The path for vcf genotype data (full file name). It will count REF allele.
Returns:
:py:obj:`Tuple[pd.DataFrame, pd.DataFrame, Array]`: A tuple of
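
To show what "count REF allele" means for a single genotype call, here is a minimal sketch. It is not the `read_vcf` implementation: `count_ref_alleles` is a hypothetical helper that relies only on the VCF convention that allele `0` in the GT field denotes the reference allele.

```python
import re


def count_ref_alleles(gt: str) -> int:
    """Count REF alleles (coded "0") in a VCF GT string such as "0/1" or "1|0"."""
    # GT alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    return sum(allele == "0" for allele in alleles)


for gt in ["0/0", "0|1", "1/1"]:
    print(gt, count_ref_alleles(gt))
```

Under this coding a homozygous-reference individual has dosage 2, so effect sizes are expressed per copy of the REF allele. Note that this toy treats a missing call such as `./.` as zero REF alleles, which a real reader would need to handle separately.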