Providing a basecalled reads in FASTQ file #56

Open
hasindu2008 opened this issue Nov 10, 2022 · 9 comments

Comments

@hasindu2008

Is it possible to make nanodisco accept a FASTQ file that contains basecalled reads, rather than extracting them from FAST5 files? That way, I believe there would no longer be any need to rebasecall with --fast5-out.

@touala
Member

touala commented Nov 28, 2022

Hello @hasindu2008,

Unfortunately, this is not readily implementable, but it should be doable "by hand". I made this design choice a while ago because I found it to be the least error-prone: it ensures that the fast5, fastq, and bam files match. It is indeed less efficient, though. Please let me know if you want a high-level alternative solution.

Best,

Alan

@hasindu2008
Author

Do you only need the basecalled read in the FAST5 file generated with --fast5-out, or do you rely on the move table as well?

@touala
Member

touala commented Nov 28, 2022

Basically, we need to be able to run nanopolish eventalign to align events to the reference. The fastq records are extracted and carry the path to the fast5 file in each read's header, which I found, at the time, to be efficient for indexing. I don't know if this is still the case.
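
For readers following along, the step described here can be sketched with the generic nanopolish recipe. This is a minimal sketch under assumptions, not nanodisco's internal code: the file names (reads.fastq, fast5_dir, ref.fasta, reads.sam, reads.sorted.bam) are placeholders, and the minimap2 map-ont preset is assumed for the alignment. The function only prints the commands so they can be reviewed before anything is run.

```shell
# Dry-run sketch: prints a generic nanopolish eventalign workflow.
# Assumption: this is the standard nanopolish recipe, not nanodisco's own code.
# Arguments (all placeholder names): basecalled FASTQ, FAST5 directory, reference FASTA.
print_eventalign_steps() {
    reads=$1; fast5_dir=$2; ref=$3
    # Link each read ID in the FASTQ to its raw signal in the FAST5 files.
    echo "nanopolish index -d $fast5_dir $reads"
    # Align the basecalled reads to the reference, then sort and index the BAM.
    echo "minimap2 -ax map-ont -o reads.sam $ref $reads"
    echo "samtools sort -o reads.sorted.bam reads.sam"
    echo "samtools index reads.sorted.bam"
    # Align events to the reference; eventalign writes its table to stdout.
    echo "nanopolish eventalign --reads $reads --bam reads.sorted.bam --genome $ref --scale-events"
}

print_eventalign_steps reads.fastq fast5_dir ref.fasta
```

Whether a FASTQ prepared this way stays consistent with the fast5 and bam files is exactly the matching concern raised earlier in the thread, so the printed commands are a starting point rather than a drop-in replacement.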

@hasindu2008
Author

Ohh, I suggest replacing nanopolish with f5c: both indexing (no need to have the fast5 path in the header) and event alignment will be much faster (~3-5X), with near-identical results.

f5c index -d fast5_dir in_fasta -t num_threads --iop num_threads
f5c eventalign -t num_threads --iop num_threads --scale-events -n -r in_fasta -b in_bam -g tmp_genome

You can make it 10X faster if you switch to the BLOW5 format, with added advantages such as fewer backward-compatibility headaches and a lot of saved dev time. slow5tools can be used to streamline many signal merge/split/get operations, and both nanopolish and f5c are compatible with the BLOW5 format.

f5c index -t num_threads in_fasta --slow5 signals.blow5
f5c eventalign -t num_threads --iop num_threads --scale-events -n -r in_fasta -b in_bam -g tmp_genome --slow5 signals.blow5
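
The signals.blow5 file used in these commands has to be produced from the FAST5 files first. A minimal sketch of that conversion with the slow5tools f2s and merge subcommands follows; the directory name blow5_dir and the thread count are placeholder assumptions, and the function only prints the commands rather than running them.

```shell
# Dry-run sketch: prints the slow5tools commands that would produce signals.blow5.
# Arguments (placeholder names): FAST5 directory, number of processes/threads.
print_blow5_conversion() {
    fast5_dir=$1; threads=$2
    # Convert every FAST5 file into a BLOW5 file under blow5_dir (placeholder name).
    echo "slow5tools f2s $fast5_dir -d blow5_dir -p $threads"
    # Merge the per-file BLOW5 outputs into the single file passed to --slow5.
    echo "slow5tools merge blow5_dir -o signals.blow5 -t $threads"
}

print_blow5_conversion fast5_dir 8
```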

In your previous response, can you please explain what you meant by the fast5, fastq, and bam matching? Is each multi-fast5 file run separately with nanopolish in your script, or do you concatenate all the FASTQ files and then run a single nanopolish instance?

@ecpierce

Hi @touala,

I am having trouble generating files with --fast5-out, so I have a similar question. Can you clarify what you mean by "but should be doable by hand"? I may need to go this route, as I have basecalled fastq files. I agree with hasindu that this solution may become important, since it seems like nanopore is planning to remove the fast5-out option.

Thanks! Emily

@jflopezfernandez

@ecpierce Hi Emily, we ran into the --fast5-out deprecation problem ourselves, and we opted to download an older version of Guppy rather than work out a way to use *.fastq files. As of this writing, version 6.4.2 appears to be the latest, and version 6.2.1 is the most recent release before the --fast5-out option was deprecated in version 6.3+.
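
For scripted setups, the cutoff described above can be checked before basecalling. The helper below is a hypothetical sketch, not part of Guppy or nanodisco: the function name is made up, it expects a version string of the form major.minor.patch, and it relies solely on the 6.3 cutoff reported in this comment.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given Guppy version
# predates 6.3, the range in which this thread reports --fast5-out as available.
# Assumes a version string of the form major.minor[.patch], e.g. 6.2.1.
guppy_has_fast5_out() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    [ "$major" -lt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -lt 3 ]; }
}

if guppy_has_fast5_out 6.2.1; then
    echo "6.2.1: --fast5-out expected to be available"
fi
```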

@ecpierce

ecpierce commented Dec 1, 2022

@jflopezfernandez thank you for your response! That is the solution I ended up using. It would be useful, though, if the nanodisco developers considered working on a long-term solution so that it stays compatible with newer Guppy versions in the future. It seems like Dorado uses the pod5 format. I am not sure how that would impact things, but I guess it is something else to consider if nanodisco is going to be actively maintained. Really appreciate this awesome program!

@fanggang

fanggang commented Dec 1, 2022

Thank you very much for sharing your experience and solutions with other users, Jose!

For the question from Emily: we are very much encouraged by the broad interest in Nanodisco, and yes, we are committed to maintaining it in the long term. That said, because Nanopore software and kits are constantly evolving, our strategy (given the finite resources we have) is to 1) use Singularity to ensure the current package versions are compatible and the entire workflow works reliably; and 2) release major upgrades. These will not be frequent (given the nature of nanopore software/kit evolution explained above), but we will do them for major milestones!

Best,
Gang

@ecpierce

ecpierce commented Dec 1, 2022

@fanggang that makes sense. I appreciate your work and am glad to hear you are committed to maintaining it!
