No assembly reported for 100 reads with the same sequence #7

Open
nh13 opened this issue Nov 8, 2017 · 8 comments

nh13 commented Nov 8, 2017

@lh3 I was playing around with this tool but I couldn't get it to work on a "simple" case. I duplicated a read 100 times and would expect it to output the duplicated read. Any thoughts?

``` @M50205:20:000000000-B82KM:1:1108:8421:4217/2 CTAAGGTGGACATGTTGGCTTCTCTCTGTTCTTAACATGTTAAAATTAAAATTAACTTCTCTGGTGTGTGGAGATGTCTTACAATAACAGTTGCTACTATTTCTTTTCTTTTTCTCTTTCTTTCCTCTCTCTTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTAGACAAGGTCTCAATTTGTCACTCAGAGTGAAGTGCATTGGCATGAACATTGCTCACTTCATCCTTAACCTTCTTGGCCAAAGAACTCCTCCTGCCTCACCCCC + 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ```
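For anyone wanting to reproduce this kind of test input, here is a minimal Python sketch that repeats one four-line FASTQ record a given number of times. The function name `duplicate_fastq_record` is hypothetical, and the thread does not say how the original 100 copies were made or whether they were renamed, so this version assumes the copies are left byte-for-byte identical.

```python
def duplicate_fastq_record(record: str, copies: int) -> str:
    """Return FASTQ text containing `copies` identical copies of one record.

    Assumption: `record` is a single four-line FASTQ record
    (name, sequence, '+', qualities). Copies are not renamed.
    """
    lines = record.strip().split("\n")
    if len(lines) != 4:
        raise ValueError("expected a four-line FASTQ record")
    name, seq, plus, qual = lines
    if not name.startswith("@") or not plus.startswith("+"):
        raise ValueError("malformed FASTQ header or separator line")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    one_copy = f"{name}\n{seq}\n{plus}\n{qual}\n"
    return one_copy * copies


# Toy record for illustration only (not the read from this issue).
toy = "@read1\nACGTACGT\n+\n22222222\n"
fastq_text = duplicate_fastq_record(toy, 100)
```

Writing `fastq_text` to a file gives an input with 100 identical reads, which is the situation described above.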

nh13 (Author) commented Nov 8, 2017

I forgot to mention the context. I want to re-assemble a set of reads that I know originate from the same haploid copy of the genome, and the region is in a tandem repeat. All the reads should start and end around the same place, so it's a bit easier than general assembly.

lh3 (Owner) commented Nov 9, 2017

These 100 reads will be collapsed to one read. You will get a singleton contig, which will be ignored unless you tune parameters.

For cfDNA-like data, assembly may not work well.
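To illustrate the collapsing point, the following Python sketch counts exact duplicate sequences. It is not fermi-lite's implementation, which is not shown in this thread; the function name `collapse_identical_reads` is hypothetical, and the sketch assumes that "identical" means exact string equality on the same strand.

```python
from collections import Counter
from typing import Dict, Iterable


def collapse_identical_reads(seqs: Iterable[str]) -> Dict[str, int]:
    """Map each distinct sequence to the number of times it occurs.

    Assumption: reads are compared as exact, case-insensitive strings;
    reverse complements are NOT treated as the same read here.
    """
    return dict(Counter(s.upper() for s in seqs))


# 100 copies of one toy read collapse to a single unique sequence.
reads = ["ACGTACGTTTGA"] * 100
unique = collapse_identical_reads(reads)
print(len(unique), "unique sequence(s)")
```

After collapsing there is only one sequence left, so there is nothing for it to overlap with, which matches the "singleton contig" described above.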

nh13 (Author) commented Nov 9, 2017

That is actually what I want: a single contig at the end of the day. Think haploid variant calling across repeat regions with indel and mismatch errors. All reads would come from the same DNA molecule.

nh13 (Author) commented Nov 9, 2017

I am considering using this instead of consensus calling for duplex sequencing. In this case we have stutter due to PCR slippage across STRs.
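As a rough illustration of why stutter is awkward for consensus calling, here is a Python sketch of a naive column-wise majority vote over unaligned reads. It is an assumption about what a simple consensus caller might do, not the method used by any particular duplex-sequencing tool; the name `column_consensus` and the toy reads are hypothetical.

```python
from collections import Counter
from typing import List


def column_consensus(reads: List[str]) -> str:
    """Majority base at each position, with no alignment between reads.

    Reads shorter than the longest one simply stop contributing votes.
    """
    if not reads:
        return ""
    longest = max(len(r) for r in reads)
    consensus = []
    for i in range(longest):
        column = [r[i] for r in reads if i < len(r)]
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


# Toy STR: three reads with 3 repeat units, one stutter read with 2 units.
full = "GAT" + "CTTT" * 3 + "CAG"
stutter = "GAT" + "CTTT" * 2 + "CAG"
consensus = column_consensus([full, full, full, stutter])
```

The majority still recovers the full-length sequence here, but the stutter read's bases after the missing repeat unit fall into the wrong columns and show up as mismatches rather than as a single indel. With more slippage products the column votes become unreliable, which is the motivation for trying assembly instead.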

nh13 (Author) commented Nov 9, 2017

Also, the introduction in the readme implied it would be suitable for re-assembly of short reads, even in runs of LOH. Would you mind sharing how you would tune the parameters to output the single contig?

lh3 (Owner) commented Nov 9, 2017

Your example is violating the basic assumption of assembly and won't happen in practice. You need to test on real data.

nh13 (Author) commented Nov 9, 2017

@lh3 challenge accepted, I'll send you a real world dataset where this can happen!

nh13 (Author) commented Nov 13, 2017

@lh3 I was wondering whether you received the dataset I mentioned. I believe it would be a novel application of fermi-lite, where we aren't assembling a genome but rather reconstructing a source molecule. Applications such as re-assembling reads from the same long molecule (e.g. 10x) or from novel sequencing preparations (e.g. Duplex Sequencing) could benefit from proper assembly of reads from a single molecule.
