No rule to produce assembly #97

mhmism · 2023-08-14T21:28:51Z

Hello,

I would like to run Hecatomb only using the assembly or ctg_annotations module, but I end up with an error.

Here is an example code:
hecatomb run --reads /reads ctg_annotations --threads 16

I get this error:
Building DAG of jobs...
MissingRuleException:
No rule to produce ctg_annotations (if you use input functions make sure that they don't raise unexpected exceptions).

beardymcjohnface · 2023-08-15T05:19:28Z

Hi,

The target is ctg_annotate. Try this:

hecatomb run --reads reads/ ctg_annotate --threads 16

mhmism · 2023-08-15T09:44:30Z

Thanks! You are right. I think it is working now.
localrules: all, preprocess, assemble, annotate, ctg_annotate, print_stages, dumpSamplesTsv

I am closing the issue

mhmism · 2023-08-15T11:48:39Z

I have reopened the issue because I have a question related to the annotation. I am interested only in the ctg_annotate module (contigs preprocessing, assembly, and contigs annotations). If I am not mistaken, I think the reads sequence table is also being fed to the mmseqs2 annotation (even if I only run Hecatomb with the ctg_annotate rule). Is this correct? Is there any reason for this? If so, can I bypass the read sequence annotation and move directly to the contigs annotation by mmseqs2?

Here is the command output that comes from Hecatomb:
{ # Run mmseqs taxonomy module
mmseqs easy-taxonomy hecatomb.out/results/seqtable.fasta /home/.conda/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_primary_aa/sequenceDB hecatomb.out/processing/mmseqs_aa_primary/MMSEQS_AA_PRIMARY hecatomb.out/processing/mmseqs_aa_primary/mmseqs_aa_tmp --min-length 30 -e 1e-3 --start-sens 1 --sens-steps 3 -s 7 --lca-mode 2 --shuffle 0 -a --tax-output-mode 2 --search-type 2 --tax-lineage 1 --lca-ranks "superkingdom,phylum,class,order,family,genus,species" --format-output "query,target,evalue,pident,fident,nident,mismatch,qcov,tcov,qstart,qend,qlen,tstart,tend,tlen,alnlen,bits,qheader,theader,taxid,taxname,taxlineage" --threads 24 --split-memory-limit 48000M;

beardymcjohnface · 2023-08-16T05:06:02Z

Yes, it's dumb. The issue is that there are both direct annotations of the contigs and read-based annotations of the contigs in the targets for ctg_annotate, so it's essentially just another target for running everything. I'll fix it in the next version. Until then, I'd suggest just running hecatomb run assemble ... and then manually annotate with mmseqs using the hecatomb secondary nt database.

mhmism · 2023-08-17T10:23:04Z

Hello, thanks a lot. It will be great if this can be modified in the next version.

In the meantime, here is the code that I retrieved from a previous Hecatom run to make the annotations for the contigs manually (in case anyone is curious about it):

Create directories for the output tmp files

mkdir home/test/hecatomb.out/results/queryDB
mkdir home/test/hecatomb.out/results/mmseqs2_results
mkdir home/test/hecatomb.out/results/tophit

Create an mmseqs2 query database

mmseqs createdb home/test/hecatomb.out/results/cross_assembly.fasta home/test/hecatomb.out/results/queryDB/queryDB --dbtype 2;

Make an mmseqs2 search against the Hecatomb secondary nt database

mmseqs search home/test/hecatomb.out/results/queryDB/queryDB /home/.conda/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/databases/nt/virus_secondary_nt/sequenceDB home/test/hecatomb.out/results/mmseqs2_results/mmseqs2_results home/test/hecatomb.out/results/mmseqs2_tmp --start-sens 2 -s 7 --sens-steps 3 --split-memory-limit 22000M --min-length 90 -e 1e-20 --search-type 3 --threads 16 > home/test/hecatomb.out/results/mmseqs_contig_annotation.log

Filter TopHit results

mmseqs filterdb home/test/hecatomb.out/results/mmseqs2_results/mmseqs2_results home/test/hecatomb.out/results/tophit/tophit --extract-lines 1;

Convert to alignments

mmseqs convertalis home/test/hecatomb.out/results/queryDB/queryDB /home/.conda/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/databases/nt/virus_secondary_nt/sequenceDB home/test/hecatomb.out/results/tophit/tophit home/test/hecatomb.out/results/tophit/tophit.m8 --format-output 'query,target,evalue,pident,fident,nident,mismatch,qcov,tcov,qstart,qend,qlen,tstart,tend,tlen,alnlen,bits,target';

Header for output table

printf "contigID evalue pident fident nident mismatch qcov tcov qstart qend qlen tstart tend tlen alnlen bits target kingdom phylum class order family genus species" > home/test/hecatomb.out/results/contigAnnotations.tsv;

Assign taxonomy

sed 's/tid|//' home/test/hecatomb.out/results/tophit/tophit.m8 | sed 's/|\S*//' | taxonkit lineage --data-dir /home/.conda/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/databases/tax/taxonomy -i 2 | taxonkit reformat --data-dir /home/.conda/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/databases/tax/taxonomy -i 19 -f '{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}' -F --fill-miss-rank | cut --complement -f2,19 >> home/test/hecatomb.out/results/contigAnnotations.tsv;

mhmism closed this as completed Aug 15, 2023

mhmism reopened this Aug 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No rule to produce assembly #97

No rule to produce assembly #97

mhmism commented Aug 14, 2023

beardymcjohnface commented Aug 15, 2023 •

edited

Loading

mhmism commented Aug 15, 2023 •

edited

Loading

mhmism commented Aug 15, 2023 •

edited

Loading

beardymcjohnface commented Aug 16, 2023

mhmism commented Aug 17, 2023 •

edited

Loading

No rule to produce assembly #97

No rule to produce assembly #97

Comments

mhmism commented Aug 14, 2023

beardymcjohnface commented Aug 15, 2023 • edited Loading

mhmism commented Aug 15, 2023 • edited Loading

mhmism commented Aug 15, 2023 • edited Loading

beardymcjohnface commented Aug 16, 2023

mhmism commented Aug 17, 2023 • edited Loading

Create directories for the output tmp files

Create an mmseqs2 query database

Make an mmseqs2 search against the Hecatomb secondary nt database

Filter TopHit results

Convert to alignments

Header for output table

Assign taxonomy

beardymcjohnface commented Aug 15, 2023 •

edited

Loading

mhmism commented Aug 15, 2023 •

edited

Loading

mhmism commented Aug 15, 2023 •

edited

Loading

mhmism commented Aug 17, 2023 •

edited

Loading