Optimal downsampling of UMIs during wrangling #46

Open
JeffAndBailey opened this issue Dec 30, 2022 · 0 comments
Labels
feature ✨ feature request or enhancement
JeffAndBailey commented Dec 30, 2022

Related Problem

When performing large-scale sequencing, the input for certain samples, and for particular MIPs within them, can be extremely deep (many reads for a given MIP in a given sample). This occurs when controls are repeatedly sequenced and merged together. The best place to subsample to reduce depth is after UMI determination and correction. The following script does the subsampling:

https://github.com/bailey-lab/MIPTools/blob/master/src/wrangler_downsample_umi.py

However, the subsampling is random, which is not optimal: it would be preferable for it to be deterministic. Also, the UMIs with the most read support are the best sequences to keep when subsampling.

Solution Requested

Modify the downsampling script's algorithm to sort UMI sequences deterministically by number of supporting reads, and then trim off those with lower read support if the number of UMI sequences exceeds the input threshold.
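
To make the request concrete, here is a minimal sketch of the selection step. It is not taken from `wrangler_downsample_umi.py`: the function name `downsample_umis`, the `read_support` mapping (UMI sequence to supporting read count), and the `max_umis` parameter are all hypothetical, and the real script's data structures may differ. Ties in read support are broken on the UMI string itself, which is one assumed way to keep the result independent of input order.

```python
from typing import Dict, List, Tuple


def downsample_umis(read_support: Dict[str, int], max_umis: int) -> List[Tuple[str, int]]:
    """Keep at most max_umis UMIs, preferring those with the most supporting reads.

    Hypothetical helper: read_support maps a UMI sequence to its read count.
    """
    if max_umis < 0:
        raise ValueError("max_umis must be non-negative")
    # Sort by descending read count; break ties on the UMI string so the
    # result does not depend on input order or on a random seed.
    ranked = sorted(read_support.items(), key=lambda item: (-item[1], item[0]))
    # Trim the low-support tail only if the threshold is exceeded.
    return ranked[:max_umis]


# Illustrative counts, not real data.
example = {"ACGT": 12, "TTGA": 3, "GGCA": 12, "CATG": 7}
print(downsample_umis(example, 3))
```

With the illustrative counts above and a threshold of 3, the UMI with the lowest read support (`TTGA`) is the one dropped, and repeated runs return the same selection.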

Describe alternatives you've considered
I am not sure there is really any justification for alternatives, unless one can argue that one wants to explore the effect of suboptimally selecting UMI sequences.

@JeffAndBailey JeffAndBailey added the feature ✨ feature request or enhancement label Dec 30, 2022
@arisp99 arisp99 added this to the 1.0.0 milestone Jan 5, 2023