slow feature finding on animal genomes #20

Open
arivers opened this issue Jun 1, 2023 · 1 comment

arivers commented Jun 1, 2023

I'm running a standard Cas9 search on a 1-gigabase chicken genome, and the step "Find genomic features closest to the guide" is taking longer than the guide-finding step. I ran this on a 48-core node with 384 GB of memory; SLURM's sacct reports a MaxRSS of 199 GB. I will look into ways to process this that improve speed and memory use. Many of the optimizations in guidemaker were made for microbial genomes, but most users are running it on eukaryotes. Long term, the plan is to improve the experience for eukaryotic users.

@arivers arivers added the enhancement New feature or request label Jun 1, 2023
@arivers arivers self-assigned this Jun 1, 2023

arivers commented Sep 21, 2023

I've looked into this issue more while doing some other updates. On the chicken genome, it takes 2.5 h to identify all the guides, then 30 h to determine which genes they target. The bottleneck appears to be the bedtools library called through pybedtools. Alternatives include BEDOPS and bedtk. It is also possible this could be handled with pandas or with an interval tree data structure (Python and C++ implementations exist). The guidemaker.core.Annotation class needs to be rewritten to handle this.
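
For illustration only, here is a minimal sketch of what an in-memory, array-based nearest-feature lookup could look like using NumPy sorting and binary search. It is not the existing `guidemaker.core.Annotation` code and does not reproduce the output of bedtools. The function name `nearest_feature`, the use of a single cut position per guide, closed integer coordinates, and the one-chromosome-at-a-time layout are all assumptions made for this example.

```python
import numpy as np


def nearest_feature(starts, ends, positions):
    """Hypothetical helper: for each position on ONE chromosome, return the
    index of the nearest feature and the distance to it (0 if the position
    lies inside a feature). Features may overlap or be nested."""
    starts = np.asarray(starts, dtype=np.int64)
    ends = np.asarray(ends, dtype=np.int64)
    positions = np.asarray(positions, dtype=np.int64)
    n = starts.size
    if n == 0:
        raise ValueError("need at least one feature")

    # Sort features once by start coordinate.
    order = np.argsort(starts, kind="stable")
    s_sorted = starts[order]
    e_sorted = ends[order]

    # Running maximum of feature ends, and which sorted feature holds it.
    # Among features starting at or before a position, the one with the
    # largest end either contains the position or is the closest upstream.
    run_max = np.maximum.accumulate(e_sorted)
    holder = np.maximum.accumulate(
        np.where(e_sorted == run_max, np.arange(n), 0)
    )

    # k = number of features whose start is <= position (binary search).
    k = np.searchsorted(s_sorted, positions, side="right")
    big = np.iinfo(np.int64).max

    # Candidate 1: containing or upstream feature.
    has_left = k > 0
    left_i = np.where(has_left, k - 1, 0)
    left_dist = np.where(
        has_left, np.maximum(positions - run_max[left_i], 0), big
    )
    left_feat = holder[left_i]

    # Candidate 2: first feature starting after the position.
    has_right = k < n
    right_i = np.where(has_right, k, n - 1)
    right_dist = np.where(has_right, s_sorted[right_i] - positions, big)

    # Pick the closer candidate; ties go to the upstream feature.
    use_left = left_dist <= right_dist
    feat_sorted = np.where(use_left, left_feat, right_i)
    dist = np.where(use_left, left_dist, right_dist)
    return order[feat_sorted], dist


# Toy data: three features (one nested) and five guide positions.
idx, dist = nearest_feature(
    starts=[100, 500, 120],
    ends=[200, 600, 150],
    positions=[50, 130, 210, 400, 700],
)
print(idx.tolist())   # [0, 0, 0, 1, 1]
print(dist.tolist())  # [50, 0, 10, 100, 100]
```

The sort costs O(n log n) once per chromosome, and each lookup is a single binary search, so the work stays inside vectorized NumPy calls. A real replacement would still need to handle strand, multiple chromosomes, and whatever distance and tie-breaking conventions the current bedtools-based step relies on.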
