Errors in newest version 1.3.0 #104

Open
pengouy opened this issue Feb 4, 2024 · 15 comments

Comments

@pengouy

pengouy commented Feb 4, 2024

Hi, I updated Hecatomb to the newest version, 1.3.0, the day you released it. Unfortunately, there seem to be some bugs when I run the command hecatomb test, and I have noticed that you are working on them day and night. I really need this extraordinary tool now, but I can't install version 1.2.0. Could I ask when the bug-fix version 1.3.1 will be released? Looking forward to your response, and thanks for your time. The log follows:


Activating conda environment: anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768_
/usr/bin/bash: -c: line 0: syntax error near unexpected token `;'
/usr/bin/bash: -c: line 0: `source /public3/home/sc30177/anaconda3/bin/activate '/public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768_'; set -euo pipefail; if [[ -d hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG ]]; then; rm -rf hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG; fi; megahit -1 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz -2 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz -r hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz -o hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG --out-prefix A13-256-115-06_GTTTCG -t 16 --presets meta-large&> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log; sed 's/>/>A13-256-115-06_GTTTCG/' hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa; tar cf - hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG | zstd -T16 -9 > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst 2> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log;'
[Sun Feb 4 09:22:40 2024]
Error in rule megahit_sample_paired:
jobid: 13
input: hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz, hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz, hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz
output: hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa, hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa, hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst
log: hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log (check log file(s) for error details)
conda-env: /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768_
shell:
if [[ -d hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG ]]; then; rm -rf hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG; fi; megahit -1 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz -2 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz -r hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz -o hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG --out-prefix A13-256-115-06_GTTTCG -t 16 --presets meta-large&> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log; sed 's/>/>A13-256-115-06_GTTTCG/' hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa; tar cf - hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG | zstd -T16 -9 > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst 2> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log;
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log not found.

Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/config.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/dbFiles.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/immutable.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Sun Feb 4 09:22:43 2024]
Finished job 51.
2 of 99 steps (2%) done

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
cat .snakemake/log/2024-02-04T072114.582588.snakemake.log >> hecatomb.out/hecatomb.log
FATAL: Hecatomb encountered an error.
Check the Hecatomb logs directory for command-related errors:
hecatomb.out/logs
Complete log: .snakemake/log/2024-02-04T072114.582588.snakemake.log
[2024:02:04 09:22:43] ERROR: Snakemake failed
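
A note on the log above: the shell command that bash rejects contains `then;`, and a semicolon directly after `then` is a syntax error in bash, which matches the "unexpected token" message. The sketch below is only an illustration of that point, not the project's actual fix; the scratch directory and variable names are hypothetical stand-ins for the assembly directory in the log.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for
# hecatomb.out/processing/assembly/<sample> from the log.
workdir="$(mktemp -d)"
sample_dir="$workdir/sample"
mkdir -p "$sample_dir"

# Form seen in the log: ';' immediately after 'then'.
broken='if [[ -d x ]]; then; rm -rf x; fi'
# Same logic without the stray ';'.
fixed='if [[ -d x ]]; then rm -rf x; fi'

# 'bash -n' parses a command string without executing it.
if bash -n -c "$broken" 2>/dev/null; then broken_parses=yes; else broken_parses=no; fi
if bash -n -c "$fixed" 2>/dev/null; then fixed_parses=yes; else fixed_parses=no; fi
echo "broken parses: $broken_parses, fixed parses: $fixed_parses"

# The corrected conditional applied to the scratch directory.
if [[ -d "$sample_dir" ]]; then
    rm -rf "$sample_dir"
fi
```

Because the whole command string fails to parse, none of it runs, which would also explain why the megahit log file reported in the error was never created.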

@beardymcjohnface
Collaborator

beardymcjohnface commented Feb 4, 2024

Hi, I'll be releasing 1.3.1 soon, which should have all the kinks worked out. Unfortunately, snakemake v8 broke some things, and python 3.12 broke f-strings in snakemake. The cluster commands for snakemake 8+ have also changed, so I've pinned all my tools to snakemake <8 for now and will migrate them all together at a later date.

The unit tests for Hecatomb don't quite cover everything yet, so some bugs slipped through the cracks. The next version is waiting on review of koverage 0.1.10 in bioconda (bioconda/bioconda-recipes#45597), and I'll push the release as soon as that is done.

If you need it today, pull and install hecatomb from source:

conda create -n hecatombDev python=3.11
conda activate hecatombDev
git clone https://github.com/shandley/hecatomb.git
cd hecatomb
git checkout dev
pip install -e .

Modify the koverage yaml to use koverage 0.1.9 and snakemake<8:

nano hecatomb/snakemake/workflow/envs/koverage.yaml

name: koverage
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - koverage=0.1.9
  - snakemake<8

Install the databases and environments:

hecatomb install
hecatomb test build_envs

It should then work:

hecatomb test
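
Since the workaround above depends on Snakemake staying below version 8, a quick check of the version in the currently active environment can help before rerunning. This is a minimal sketch; the helper name `snakemake_below_8` is hypothetical, and it only inspects whichever `snakemake` is on the PATH, not the separate conda environments that Hecatomb builds for its rules.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds when the major version in "$1" is below 8.
snakemake_below_8() {
    local version="$1"
    local major="${version%%.*}"   # text before the first dot
    [[ "$major" =~ ^[0-9]+$ ]] && (( major < 8 ))
}

if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if snakemake_below_8 "$installed"; then
        echo "snakemake $installed satisfies the <8 pin"
    else
        echo "snakemake $installed is 8 or newer; the pin was not applied" >&2
    fi
else
    echo "snakemake not found on PATH; activate the environment first" >&2
fi
```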

@pengouy
Author

pengouy commented Feb 4, 2024

Thank you so much for the quick response, it helps a lot. I will try the hecatombDev. Many thanks for the effort.

@beardymcjohnface
Collaborator

All good, let me know how it goes.

@pengouy
Author

pengouy commented Feb 6, 2024

> All good, let me know how it goes.

Hi, the job has not finished yet with the newly released version 1.3.1, but it has worked well until now.
I'm using Hecatomb on a supercomputer platform with multiple nodes. I submitted the job to two nodes, but only one node was used. Since the mmseqs alignment step takes a lot of time, I'm wondering whether it can run across two or more nodes to speed it up. This might be a useful option for large datasets in a future update.

@beardymcjohnface
Collaborator

Hecatomb's HPC support is via snakemake profiles. You can submit the main hecatomb job with 1 thread and pass your profile to the hecatomb command. The main job will submit individual jobs to the queue for you. You could also just submit to one node with lots of resources and run as a local job.

I just found a new bug when using --profile so I'll push another version soon.
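
As a rough illustration of the profile approach described above, the sketch below writes a minimal Snakemake (<8) profile. Everything specific in it is an assumption: the scheduler is taken to be Slurm, the profile name `hecatomb-slurm` and the `PROFILE_ROOT` override are hypothetical, and the `sbatch` options and job count would need adjusting for a real cluster. The `{resources.mem_mb}` and `{resources.time}` placeholders rely on rules declaring those resources, as the rule log later in this thread does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Snakemake looks for named profiles under ~/.config/snakemake/.
# PROFILE_ROOT is a hypothetical override so the sketch can be tried elsewhere.
profile_root="${PROFILE_ROOT:-$HOME/.config/snakemake}"
profile_dir="$profile_root/hecatomb-slurm"   # illustrative profile name
mkdir -p "$profile_dir"

# Note: this overwrites any existing config.yaml in that profile directory.
cat > "$profile_dir/config.yaml" <<'EOF'
# Snakemake <8 profile: keys mirror snakemake command-line options.
cluster: "sbatch --cpus-per-task={threads} --mem={resources.mem_mb} --time={resources.time}"
jobs: 50
use-conda: true
latency-wait: 60
EOF

echo "Wrote $profile_dir/config.yaml"
# The main job would then be started with the profile name, for example:
#   hecatomb test --profile hecatomb-slurm
```

With a profile like this, the main process needs only one thread, and each rule is submitted to the queue as its own job, so different jobs can land on different nodes.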

@pengouy
Author

pengouy commented Feb 6, 2024

Thanks for the explanation. I have noticed that Hecatomb decides by itself to run multiple jobs.
I checked the result file "contigAnnotations.tsv" and found an error in how the "target" column is separated into classification names, like this:

kingdom phylum class order family genus species
Uroviricota\ Caudoviricetes\ Caudoviricetes order\ Caudoviricetes family\ Punavirus\ Punavirus P1

The "\" characters have not been replaced correctly.

I also have another question. When I take contig sequences from "merged_assembly.fasta" whose contig IDs are classified as viruses in "contigAnnotations.tsv" and BLAST them at NCBI, the BLAST results mostly do not match "contigAnnotations.tsv". Shouldn't these two files correspond? About 10 days ago I ran into this when running the same data with version 1.2.0 and thought it was an accident, but now I have met the same problem again, which really confuses me.

@beardymcjohnface
Collaborator

Thanks, I'll look into it.

@pengouy
Author

pengouy commented Feb 6, 2024

Hi, sorry to bother you again. The job ended just now without any error report, but it yielded a "bigtable.tsv" of only 1 KB. I checked the log directory and found that the "secondary_nt_calc_lca.log" file is more than 3 GB, so it looks like the results have not been merged successfully. Could you please check whether there is a bug?

@pengouy
Author

pengouy commented Feb 6, 2024

Here is the relevant log detail:

[Tue Feb  6 18:56:40 2024]
rule combine_aa_nt:
    input: hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv, hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv
    output: hecatomb.out/results/bigtable.tsv
    log: hecatomb.out/logs/combine_AA_NT.log
    jobid: 77
    benchmark: hecatomb.out/benchmarks/combine_AA_NT.txt
    reason: Missing output files: hecatomb.out/results/bigtable.tsv; Input files updated by another job: hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv, hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv
    resources: tmpdir=/tmp, time=01:00:00, mem_mb=16000, mem_mib=15259, mem=16000MB

{ cat hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv > hecatomb.out/results/bigtable.tsv; tail -n+2 hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv >> hecatomb.out/results/bigtable.tsv; } &> hecatomb.out/logs/combine_AA_NT.log; 
[Tue Feb  6 18:56:40 2024]
Finished job 77.
81 of 89 steps (91%) done
Select jobs to execute...
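
To make the merge step in this log concrete: the rule writes the AA table in full and then appends the NT table without its first line, so the header appears only once. The sketch below reproduces that pattern on toy files; the column names and rows are invented for illustration and are not Hecatomb's real bigtable columns.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp="$(mktemp -d)"

# Toy inputs with a shared (hypothetical) header and one data row each.
printf 'seqID\ttaxon\nread1\tvirusA\n' > "$tmp/AA_bigtable.tsv"
printf 'seqID\ttaxon\nread2\tvirusB\n' > "$tmp/NT_bigtable.tsv"

# Same pattern as the rule: full AA table, then NT table minus its header.
{
    cat "$tmp/AA_bigtable.tsv" > "$tmp/bigtable.tsv"
    tail -n+2 "$tmp/NT_bigtable.tsv" >> "$tmp/bigtable.tsv"
}
merged_lines="$(wc -l < "$tmp/bigtable.tsv" | tr -d ' ')"
echo "merged lines (header + rows): $merged_lines"

# With header-only inputs the merge still succeeds but yields just the header.
printf 'seqID\ttaxon\n' > "$tmp/AA_empty.tsv"
printf 'seqID\ttaxon\n' > "$tmp/NT_empty.tsv"
{
    cat "$tmp/AA_empty.tsv" > "$tmp/bigtable_empty.tsv"
    tail -n+2 "$tmp/NT_empty.tsv" >> "$tmp/bigtable_empty.tsv"
}
empty_lines="$(wc -l < "$tmp/bigtable_empty.tsv" | tr -d ' ')"
echo "merged lines from header-only inputs: $empty_lines"
```

The second case shows why this step can finish without error and still leave a tiny bigtable.tsv: if the inputs hold little more than a header, the concatenation has nothing to add, which would point to a problem upstream of the merge.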

@pengouy
Author

pengouy commented Feb 7, 2024

And when I load the "contigSeqTable.tsv" file, I find that the classification of every contig is NA at all taxon levels.

@beardymcjohnface
Collaborator

If your bigtable is empty then the contigSeqTable will be all NA, as it joins the seq annotations with the contigs. I think I've fixed it; it was caused by formatting issues with the taxonkit command. Can you confirm that both hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv and hecatomb.out/results/bigtable.tsv are tiny files?
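
One way to answer the question above is to print the size and line count of each table. This is a minimal sketch; `report_table` is a hypothetical helper, and the NT table path is taken from the rule log earlier in the thread. It assumes the script is run from the directory that contains hecatomb.out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print size in bytes and number of lines for one table.
report_table() {
    local path="$1"
    if [[ -f "$path" ]]; then
        local bytes lines
        bytes="$(wc -c < "$path" | tr -d ' ')"
        lines="$(wc -l < "$path" | tr -d ' ')"
        printf '%s\t%s bytes\t%s lines\n' "$path" "$bytes" "$lines"
    else
        printf '%s\tmissing\n' "$path"
    fi
}

for table in \
    hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv \
    hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv \
    hecatomb.out/results/bigtable.tsv
do
    report_table "$table"
done
```

A table with a single line holds only its header, so a line count of 1 alongside a size near 1 KB would confirm that the annotations were empty before the merge.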

@pengouy
Author

pengouy commented Feb 8, 2024

Oh no! I deleted the whole hecatomb.out directory yesterday, but I am sure hecatomb.out/results/bigtable.tsv was a tiny file.

@beardymcjohnface
Collaborator

Oh, that's fine, I'm pretty sure I've worked out the issues. I'm just waiting on new releases for koverage and trimnami before I can push the next version of hecatomb.

@pengouy
Author

pengouy commented Feb 8, 2024

Appreciate your efforts, looking forward to it.

@pengouy
Author

pengouy commented Feb 9, 2024

Hi, I am a little confused about the file merged_assembly.fasta: the NCBI BLAST results for contigs in this file do not always match the taxon classification in contigAnnotations.tsv. I have also aligned the contigs against sequences fetched by the NCBI accession numbers in the 'target' column of contigAnnotations.tsv, and they do not match either.
