
No data in chunks despite gzip conversion #64

Open
Haidermanzer opened this issue Jan 16, 2023 · 16 comments

Comments

@Haidermanzer

Hello,

I've been trying to use Nanodisco and have run into the issue that many others have had, where the calculate differences step gives the "no data in chunk" error. It seems like almost everyone who has run into this issue resolved it by using the ONT tools to convert the files to gzip compression. I tried that as well but am still running into the same error.

Do you have any alternative solutions to this problem?

@touala
Member

touala commented Jan 16, 2023

Hello @Haidermanzer,

Thank you for trying nanodisco. Do all chunks give this "No data in chunk" issue? What do the native and WGA mappings look like (e.g. in IGV)? You can run into this issue when the reference does not match the datasets.

Best,

Alan

@Haidermanzer
Author

Haidermanzer commented Jan 16, 2023 via email

@touala
Member

touala commented Jan 16, 2023

IGV is a mapping visualisation software. You can find the documentation and download link at this address: https://software.broadinstitute.org/software/igv/UserGuide. Basically, I wanted to make sure reads were properly mapped by the nanodisco preprocess command (e.g. -o <path_output> will produce a .bam at <path_output>/<sample_name>.sorted.bam).

Is your dataset public?

Alan

@Haidermanzer
Author

Haidermanzer commented Jan 24, 2023 via email

@touala
Member

touala commented Jan 25, 2023

Hi @Haidermanzer,

You can send me the files and I'll have a look ASAP. To diagnose the issue, I would need the native and WGA fast5 files, and the reference you used.

Best,

Alan

@Haidermanzer
Author

Great! Which email address should I share the OneDrive link with?

@touala
Member

touala commented Jan 26, 2023

You can share them at: [email protected]

@touala
Member

touala commented Feb 8, 2023

Hi @Haidermanzer,

I got your data and was able to run nanodisco. The main issue seems related to the .vbz vs .gzip compression. Please find the commands below:

# Download and unzip your dataset
# Using the most recent compatible guppy version, i.e. 6.3.8
# Later versions discontinued --fast5_out
guppy_basecaller --disable_pings --recursive -i Nativefast5_pass --fast5_out -s Nativefast5_basecalled -c dna_r9.4.1_450bps_sup.cfg --device cuda:0
guppy_basecaller --disable_pings --recursive -i WGAfast5_pass --fast5_out -s WGAfast5_basecalled -c dna_r9.4.1_450bps_sup.cfg --device cuda:0

compress_fast5 -t 20 --recursive -c gzip -i Nativefast5_basecalled -s Nativefast5_basecalled_gzip
compress_fast5 -t 20 --recursive -c gzip -i WGAfast5_basecalled -s WGAfast5_basecalled_gzip

# Using an older singularity version to avoid having to bind directories
conda create -n singularity_3.5.2 -c conda-forge singularity=3.5.2
conda activate singularity_3.5.2

run_nanodisco="singularity exec <path_to/nanodisco> nanodisco"
$run_nanodisco preprocess -p 20 -f Nativefast5_basecalled_gzip -s native -o results/preprocessed_gzip -r Reference.fasta
$run_nanodisco preprocess -p 20 -f WGAfast5_basecalled_gzip -s wga -o results/preprocessed_gzip -r Reference.fasta

$run_nanodisco difference -nj 2 -nc 1 -p 5 -f 281 -l 290 -i results/preprocessed_gzip -o results/difference -w wga -n native -r Reference.fasta
# Make sure to process more, if not all, of the genome chunks for better performance
# $run_nanodisco chunk_info -r Reference.fasta

# Check output within R:
# summary(readRDS("results/difference/chunk.281.difference.rds"))
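
As a side note on the -f/-l options above: chunk indices map linearly onto genome coordinates. Assuming nanodisco's default 5 kb chunk size (nanodisco chunk_info remains the authoritative source), the translation can be sketched as:

```python
import math

CHUNK_SIZE = 5000  # assumed nanodisco default chunk length in bp

def region_to_chunks(start, end, chunk_size=CHUNK_SIZE):
    """Map a 1-based genomic interval [start, end] to first/last chunk indices (-f/-l)."""
    first = (start - 1) // chunk_size + 1
    last = (end - 1) // chunk_size + 1
    return first, last

def total_chunks(genome_length, chunk_size=CHUNK_SIZE):
    """Number of chunks covering a genome of the given length."""
    return math.ceil(genome_length / chunk_size)
```

For example, region_to_chunks(1_400_001, 1_450_000) gives (281, 290), the range used above.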

I'm not sure why it didn't work for you earlier. Was the compress_fast5 command run before or after basecalling? Maybe Guppy reintroduces the newer .vbz compression when outputting basecalled fast5 files.
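
One way to tell which compression the basecalled fast5 files actually ended up with is to inspect the HDF5 filter attached to the Signal datasets. A sketch using h5py (assuming it is installed; vbz is ONT's registered HDF5 filter, id 32020, and the helper name is illustrative):

```python
import h5py

VBZ_FILTER_ID = 32020  # ONT's registered vbz HDF5 filter id

def signal_compression(fast5_path):
    """Report the compression filter used on each Signal dataset in a fast5 file."""
    results = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset) and name.endswith("Signal"):
            plist = obj.id.get_create_plist()
            filter_ids = [plist.get_filter(i)[0] for i in range(plist.get_nfilters())]
            if VBZ_FILTER_ID in filter_ids:
                results[name] = "vbz"
            elif h5py.h5z.FILTER_DEFLATE in filter_ids:
                results[name] = "gzip"
            else:
                results[name] = "other/none"

    with h5py.File(fast5_path, "r") as f:
        f.visititems(visit)
    return results
```

If any dataset still reports vbz after basecalling, rerun compress_fast5 on Guppy's output directory (not its input).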

Please let me know if this fixed your issue or if you have any other problem.

Best,

Alan

@Haidermanzer
Author

Hi Alan,

Great! The steps you outlined look pretty similar to what I tried, other than some version differences, but I'll try using the versions you used now. In the meantime, would it be possible for you to use the OneDrive to send the output from those steps back to me, so I can troubleshoot at which point our output deviates from yours?

@touala
Member

touala commented Feb 9, 2023

Ok. It could be version specific indeed. I've uploaded two current difference files: one for a chunk and one for the whole genome. Do you know if a motif is modified in this sample, so I can do a sanity check? Feel free to continue the discussion by email if you don't want to share specifics here.

Alan

@ecpierce

ecpierce commented Feb 13, 2023

@touala just in case it makes a difference: which version of ont-fast5-api/compress_fast5 are you using?

@touala
Member

touala commented Feb 13, 2023

@ecpierce Unfortunately I didn't keep track of that, and I have multiple versions available... although the most likely is 4.0.0.

@BioRB

BioRB commented Jul 12, 2023

I have the same issue: when I run nanodisco difference I get no data for any chunk. I followed the procedure mentioned above, and also tried changing the singularity version, but nothing worked. Could you run a test on my data? Thanks.

@touala
Member

touala commented Jul 12, 2023

Hello @BioRB,

Sure, feel free to share a subset of data. I've posted my email above.

Alan

@BioRB

BioRB commented Jul 12, 2023

Ok, I just sent you the files in a Google Drive folder.

@BioRB

BioRB commented Aug 1, 2023

Dear @touala, do you have any news about the subset I sent you? Did you give it a try?
Thanks
