The data contains non-finite values. #252

Open
yhj-j opened this issue Jan 25, 2024 · 4 comments

Comments

@yhj-j

yhj-j commented Jan 25, 2024

I encountered this error while running the analysis. How should I troubleshoot it?

Traceback (most recent call last):
  File "/home/yanghj/miniconda3/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/tools/bindetect_functions.py", line 512, in process_tfbs
    obs_params = diff_dist.fit(observed_log2fcs)
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py", line 64, in wrapper
    return fun(self, *args, **kwds)
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py", line 406, in fit
    raise ValueError("The data contains non-finite values.")
ValueError: The data contains non-finite values.
Traceback (most recent call last):
  File "/home/yanghj/miniconda3/bin/TOBIAS", line 11, in <module>
    sys.exit(main())
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/TOBIAS.py", line 162, in main
    args.func(args)
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/tools/bindetect.py", line 674, in run_bindetect
    results = [task.get() for task in task_list]
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/tools/bindetect.py", line 674, in <listcomp>
    results = [task.get() for task in task_list]
  File "/home/yanghj/miniconda3/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: The data contains non-finite values.
2024-01-25 00:14:27 (83576) [ERROR] Multiprocessing logger lost connection to queue - probably due to an error raised from a child process.

When I process another peak file with the same BAM file, there is no problem, but the peak file that reported the error seems fine when I import it into IGV.

Here is the code:

TOBIAS BINDetect --motifs /home/yanghj/my_data/workspace/Motif_Databases/motif_databases/JASPAR/JASPAR2022_CORE_plants_non-redundant_v2.meme \
--signals 2.ScoreBigwig/heat_ATAC_ScoreBigwig.bw 2.ScoreBigwig/Leaf_ATAC_ScoreBigwig.bw \
--genome /home/yanghj/my_data/ATAC-seq/ref/Zhangshugang_genome.fa \
--peaks .../heat_PBC_ST8_bed/Heat_Down_Pro.bed \
--outdir 3.heat2Leaf_LeafK27ac_Pro/ \
--cond_names heat_ATAC_ScoreBigwig_LeafK27ac_Pro Leaf_ATAC_ScoreBigwig_LeafK27ac_Pro --cores 8 1>BINDetect_log 2>&1

@mohobein
Collaborator

Hey @yhj-j,

Thank you for using TOBIAS. My first guess is that you used a different peak file when calculating the footprint scores with ScoreBigwig. BINDetect would then look for scores in a different set of regions, some of which never received a score because they were not part of the original peak set. The non-finite values that cause the error would be NaNs from regions where no score could be found. This would also explain why your command works with a different peak file, if that file was the one used to calculate the scores with ScoreBigwig.

To troubleshoot, please rerun ScoreBigwig using the peak file that is causing you problems. Then check if the BINDetect error persists using the newly generated scores as input instead. If it does, we can investigate further.
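
Before rerunning anything, the two region sets can also be compared directly. The following is a minimal sketch using only the Python standard library; it is not part of TOBIAS. The helper names `read_bed`, `merge_by_chrom`, and `uncovered_peaks` are hypothetical, and so are the file names in the usage comment. It assumes a peak must lie fully inside the regions given to ScoreBigwig to have a score at every position, and it lists the BINDetect peaks that do not.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping headers and blanks."""
    regions = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith(("#", "track", "browser")):
            continue
        fields = text.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def merge_by_chrom(regions):
    """Group regions per chromosome and merge overlapping or adjacent intervals."""
    by_chrom = {}
    for chrom, start, end in sorted(regions):
        merged = by_chrom.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return by_chrom


def uncovered_peaks(scored_regions, peaks):
    """Return the peaks that are not fully contained in the merged scored regions."""
    merged = merge_by_chrom(scored_regions)
    missing = []
    for chrom, start, end in peaks:
        intervals = merged.get(chrom, [])
        if not any(s <= start and end <= e for s, e in intervals):
            missing.append((chrom, start, end))
    return missing


# Usage with real files (placeholder names, adjust to your own paths):
# with open("scorebigwig_regions.bed") as scored, open("bindetect_peaks.bed") as peaks:
#     for region in uncovered_peaks(read_bed(scored), read_bed(peaks)):
#         print(*region, sep="\t")
```

If this prints any regions, those are the first candidates for the NaN scores. An empty result suggests the cause lies elsewhere.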

Hope this helps!

Best regards,
Moritz

@yhj-j
Author

yhj-j commented Jan 26, 2024

Thank you for your response. I split the problematic peak file into 12 parts by chromosome and ran ATACorrect, ScoreBigwig, and BINDetect separately on each part. However, I still get the same error with the peak file for chromosome 12.

Traceback (most recent call last):
  File "/home/yanghj/miniconda3/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/tools/bindetect_functions.py", line 512, in process_tfbs
    obs_params = diff_dist.fit(observed_log2fcs)
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py", line 64, in wrapper
    return fun(self, *args, **kwds)
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py", line 406, in fit
    raise ValueError("The data contains non-finite values.")
ValueError: The data contains non-finite values.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yanghj/miniconda3/bin/TOBIAS", line 11, in <module>
    sys.exit(main())
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/TOBIAS.py", line 162, in main
    args.func(args)
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/tools/bindetect.py", line 674, in run_bindetect
    results = [task.get() for task in task_list]
  File "/home/yanghj/miniconda3/lib/python3.9/site-packages/tobias/tools/bindetect.py", line 674, in <listcomp>
    results = [task.get() for task in task_list]
  File "/home/yanghj/miniconda3/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: The data contains non-finite values.
2024-01-26 08:49:26 (43000) [ERROR]     Multiprocessing logger lost connection to queue - probably due to an error raised from a child process.

for peak_file in Chr01_Heat_Down_Pro.bed Chr02_Heat_Down_Pro.bed Chr03_Heat_Down_Pro.bed Chr04_Heat_Down_Pro.bed Chr05_Heat_Down_Pro.bed Chr06_Heat_Down_Pro.bed Chr07_Heat_Down_Pro.bed Chr08_Heat_Down_Pro.bed Chr09_Heat_Down_Pro.bed Chr10_Heat_Down_Pro.bed Chr11_Heat_Down_Pro.bed Chr12_Heat_Down_Pro.bed; do
    mkdir 3.heat2Leaf_${peak_file%.bed}
    # ATACorrect for heat sample
    TOBIAS ATACorrect --cores 10 --bam ../../2.align/heat_ATAC_merge.df.bam --genome /home/yanghj/my_data/ATAC-seq/ref/Zhangshugang_genome.fa \
    --peaks bed/${peak_file} --prefix heat_ATAC_${peak_file%.bed} --outdir 1.ATACorrect/
    # ATACorrect for leaf sample
    TOBIAS ATACorrect --cores 10 --bam ../../2.align/Leaf_ATAC_merge.df.bam --genome /home/yanghj/my_data/ATAC-seq/ref/Zhangshugang_genome.fa \
    --peaks bed/${peak_file} --prefix Leaf_ATAC_${peak_file%.bed} --outdir 1.ATACorrect/
    # ScoreBigwig for heat sample
    TOBIAS ScoreBigwig --signal 1.ATACorrect/heat_ATAC_${peak_file%.bed}_corrected.bw \
    --regions bed/${peak_file} \
    --output 2.ScoreBigwig/heat_ATAC_${peak_file%.bed}_ScoreBigwig.bw \
    --cores 8 1>heat_ATAC_${peak_file%.bed}_ScoreBigwig_log 2>&1
    # ScoreBigwig for leaf sample
    TOBIAS ScoreBigwig --signal 1.ATACorrect/Leaf_ATAC_${peak_file%.bed}_corrected.bw \
    --regions bed/${peak_file} \
    --output 2.ScoreBigwig/Leaf_ATAC_${peak_file%.bed}_ScoreBigwig.bw \
    --cores 8 1>Leaf_ATAC_${peak_file%.bed}_ScoreBigwig_log 2>&1
    # BINDetect
    TOBIAS BINDetect --motifs /home/yanghj/my_data/workspace/Motif_Databases/motif_databases/JASPAR/JASPAR2022_CORE_plants_non-redundant_v2.meme \
                     --signals 2.ScoreBigwig/heat_ATAC_${peak_file%.bed}_ScoreBigwig.bw 2.ScoreBigwig/Leaf_ATAC_${peak_file%.bed}_ScoreBigwig.bw \
                     --genome /home/yanghj/my_data/ATAC-seq/ref/Zhangshugang_genome.fa \
                     --peaks bed/${peak_file} \
                     --outdir 3.heat2Leaf_${peak_file%.bed}/ \
                     --cond_names heat_ATAC_${peak_file%.bed}_LeafK27ac_Pro Leaf_ATAC_${peak_file%.bed}_LeafK27ac_Pro \
                     --cores 6 1>BINDetect_${peak_file%.bed}_log 2>&1

done

@mohobein
Collaborator

Okay, that does sound peculiar. May I ask how you generated those BED files? Perhaps the file for chromosome 12 contains a region that lies outside the actual chromosome range, so no reads can be fetched from there, which results in missing scores. You could also check whether every region in your BED file has reads in the BAM file and scores in the ScoreBigwig output associated with it (for example using bedtools intersect).
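
The out-of-range check can be sketched with the Python standard library alone. This is a rough illustration rather than anything TOBIAS provides: `read_fai` and `check_bed` are hypothetical helper names, the file names in the usage comment are placeholders, and it assumes a FASTA index (`.fai`, as written by `samtools faidx`) exists for the genome, with the chromosome name in the first column and its length in the second.

```python
def read_fai(lines):
    """Map chromosome name to length from FASTA index (.fai) lines."""
    sizes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def check_bed(bed_lines, chrom_sizes):
    """Return (line_number, line, problem) for BED records that look invalid."""
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        text = line.strip()
        if not text or text.startswith(("#", "track", "browser")):
            continue
        fields = text.split()
        if len(fields) < 3:
            problems.append((number, text, "fewer than 3 columns"))
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((number, text, "non-integer coordinates"))
            continue
        if chrom not in chrom_sizes:
            problems.append((number, text, "unknown chromosome"))
        elif start < 0:
            problems.append((number, text, "negative start"))
        elif start >= end:
            problems.append((number, text, "start is not smaller than end"))
        elif end > chrom_sizes[chrom]:
            problems.append((number, text, "end exceeds chromosome length"))
    return problems


# Usage with real files (placeholder names, adjust to your own paths):
# with open("genome.fa.fai") as fai, open("peaks.bed") as bed:
#     for number, text, problem in check_bed(bed, read_fai(fai)):
#         print(f"line {number}: {problem}: {text}")
```

Records flagged here are worth fixing or removing before rerunning ATACorrect, ScoreBigwig, and BINDetect. A clean result does not rule out regions that are valid but have no reads.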

If you want, you can also send me your input files so I can reproduce the error and see what might be causing the problem locally.

@shelbyar

shelbyar commented Apr 1, 2024

Hello, thanks for the great tool!

I am experiencing the same error with TOBIAS BINDetect 0.16.1. I used the same peak file for both ScoreBigwig and BINDetect, so that shouldn't be my issue. I realized, however, that ATACorrect produced identical corrected and uncorrected bigwig files, and the expected and bias bigwig files were empty. I am not sure whether these problems are linked. The test datasets run perfectly.

I am now testing an older version of TOBIAS. Any other thoughts on how to troubleshoot? Thanks a lot!

Edit to add: version 0.14.0 had the same issue.
