Failing to load db index #101

Open
MPourjam opened this issue Jun 10, 2021 · 4 comments
@MPourjam

Although the index for the database has been built and sits in the same directory as the database file, SINA throws the error 'Failed to load "path/to/index" - rebuilding'.

@epruesse
Owner

Odd. Can you elaborate?

  • where did you get the build from?
  • what parameters and data did you use?

Some conditions under which this would occur:

  • The index got corrupted somehow, e.g. because the disk was full or your storage quota was exceeded
  • You changed the size of k between runs
  • You updated the source database (this should give a different message, though)
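
The first and third conditions can be checked from outside SINA. Below is a minimal Python sketch, assuming the index sits next to the database under the same base name with a `.sidx` extension (as in the paths quoted in this thread). The helper name `check_index` is hypothetical, and it only inspects file metadata; it cannot tell whether the index contents are valid or which k they were built with.

```python
import shutil
from pathlib import Path


def check_index(db_path):
    """Report basic facts about the index file next to an ARB database.

    Assumption: the index is named like the database, with a .sidx suffix.
    """
    db = Path(db_path)
    idx = db.with_suffix(".sidx")
    report = {"index_path": str(idx), "exists": idx.is_file()}
    if report["exists"]:
        # A zero-byte or unexpectedly small file hints at an interrupted write.
        report["size_bytes"] = idx.stat().st_size
        # An index older than the database hints that the database was updated.
        report["older_than_db"] = idx.stat().st_mtime < db.stat().st_mtime
    # Free space on the filesystem that holds the database directory.
    report["free_bytes"] = shutil.disk_usage(db.parent).free
    return report
```

Calling `check_index("/path/to/db.arb")` before and after a run shows whether the index file was actually written and whether its size or timestamp changed.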

@MPourjam
Author

I am using SINA in a pipeline, where it is run repeatedly in a loop with the arguments below:
sina-1.7 --in {} --search --meta-fmt csv --threads {} --lca-fields tax_slv --turn --search-max-result 100 --db {the absolute path to SILVA ARB SSURef99 } --out {The path to output file} --search-min-sim 0.9 --lca-quorum 0.7 --search-no-fast > /dev/null

Even after removing the existing index and letting SINA create and store it again, it still throws the error: "[Search (internal)] argument index 14:23:42 [Search (internal)] Failed to load "/path/to/db/SILVA-RefNR/SILVA138_SSURefNR99_120620.sidx" - rebuilding"
The data files are regular FASTA files.

Thank you very much.

@ilnamkang

ilnamkang commented Jul 8, 2021

I've experienced the same problem.

(1) I've downloaded "sina-1.7.2-linux.tar.gz" from this repository, decompressed it, and used the binary within "sina-1.7.2-linux/bin".

(2) My command was as below. I had used the same command for the previous run, except for input and output files.

sina -i input.fasta -r SILVA_138.1_SSURef_NR99_12_06_20_opt.arb -o output.fasta -o output.csv --turn --search --fs-full-len=1300 --lca-fields=tax_slv --show-conf --intype=fasta --preserve-order --fasta-write-dna --fasta-write-dots --csv-crlf --overhang=remove --lowercase=unaligned --insertion=forbid --pen-gapext=2 --calc-idty --fs-kmer-no-fast --search-iupac=pessimistic

(3) Free space on my disk is >20 TB, so I don't think the index was corrupted.

(4) I used the same source database and didn't change the size of k.

Anyway, SINA worked fine after rebuilding the index, which required 15-20 min on my machine (Ubuntu 18.04).

Below, I attach the stdout of the run.
-----
19:21:54 [SINA] This is SINA 1.7.2.
Effective parameters:
add-relatives = 0
auto-filter-field =
auto-filter-threshold = 0.8
calc-idty = 1
colors = 0
csv-crlf = 1
csv-id = name
csv-sep =
db = "SILVA_138.1_SSURef_NR99_12_06_20_opt.arb"
debug-graph = 0
fasta-block = 0
fasta-idx = 0
fasta-write-dna = 1
fasta-write-dots = 1
filter =
fs-cover-gene = 0
fs-engine = internal
fs-full-len = 1300
fs-kmer-len = 10
fs-kmer-mm = 0
fs-kmer-no-fast = 1
fs-kmer-norel = 0
fs-leave-query-out = 0
fs-max = 40
fs-min = 40
fs-min-len = 150
fs-msc = 0.7
fs-msc-max = 2
fs-no-graph = 0
fs-oldmatch = 0
fs-req = 1
fs-req-full = 1
fs-req-gaps = 10
fs-weight = 1
gene-end = 0
gene-start = 0
in = "input.fasta"
insertion = forbid
intype = FASTA
lca-fields = tax_slv
lca-quorum = 0.7
line-length = 0
lowercase = unaligned
markaligned = 0
markcopied = 0
match-score = 2
max-in-flight = 160
meta-fmt = none
min-idty = 0
mismatch-score = -1
no-align = 0
num-pts = 80
out = "output.fasta" "output.csv"
overhang = remove
pen-gap = 5
pen-gapext = 2
prealigned = 0
preserve-order = 1
prot-level = 4
ptport = :/tmp/sina_pt_81559
realign = 0
search = 1
search-all = 0
search-copy-fields =
search-correction = none
search-cover = query
search-filter-lowercase = 0
search-ignore-super = 0
search-iupac = pessimistic
search-kmer-candidates = 1000
search-kmer-len = 10
search-kmer-mm = 0
search-kmer-norel = 0
search-max-result = 10
search-min-sim = 0.7
search-no-fast = 0
search-port = :/tmp/sina_pt2_81559
select-file =
select-skip = 0
select-step = 1
show-conf =
show-diff = 0
show-dist = 0
threads = 4294967295
turn = revcomp
use-subst-matrix = 0
write-used-rels = 0

Processing: 0 [00:00:04]
19:21:58 [Search (internal)] Failed to load "SILVA_138.1_SSURef_NR99_12_06_20_opt.sidx" - rebuilding
19:31:20 [famfinder] Using internal engine for reference search
Processing: 0 [00:09:26]██████████████████████████████████████████████████████████████████████████████████████| 1048576/1048576 [00:09:18 / 00:00:00]
19:31:20 [Search (internal)] Failed to load "SILVA_138.1_SSURef_NR99_12_06_20_opt.sidx" - rebuilding
19:39:58 [SINA] Aligner ready. Processing sequences
19:39:59 [SINA] Took 0.721s to align 24 sequences (33.2428 sequences/s)
19:39:59 [SINA] SINA finished.
19:39:59 [ARB I/O] Closing ARB database '"SILVA_138.1_SSURef_NR99_12_06_20_opt.arb"' ...
-----
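
The timestamps in this output show how long each rebuild took. Below is a minimal Python sketch that measures the time from every "- rebuilding" line to the next timestamped line. It assumes that gap approximates the rebuild time, and the function name `rebuild_durations` is hypothetical. The sample text is copied from the output above.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as: 19:21:58 [Search (internal)] Failed to load ...
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$")


def rebuild_durations(log_text):
    """Seconds between each '- rebuilding' line and the next timestamped line."""
    events = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            events.append((datetime.strptime(m.group(1), "%H:%M:%S"), m.group(3)))
    durations = []
    for (start, msg), (end, _) in zip(events, events[1:]):
        if msg.endswith("- rebuilding"):
            delta = end - start
            if delta < timedelta(0):  # the log crossed midnight
                delta += timedelta(days=1)
            durations.append(int(delta.total_seconds()))
    return durations


log = """\
19:21:58 [Search (internal)] Failed to load "SILVA_138.1_SSURef_NR99_12_06_20_opt.sidx" - rebuilding
19:31:20 [famfinder] Using internal engine for reference search
19:31:20 [Search (internal)] Failed to load "SILVA_138.1_SSURef_NR99_12_06_20_opt.sidx" - rebuilding
19:39:58 [SINA] Aligner ready. Processing sequences
"""
print(rebuild_durations(log))  # [562, 518]
```

That is two rebuilds within a single run, about 18 minutes in total, which matches the 15-20 min mentioned above.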

@epruesse epruesse added the bug label Sep 15, 2021
@epruesse epruesse self-assigned this Sep 15, 2021
@epruesse epruesse added this to the 1.7.3 milestone Sep 15, 2021
@Lorcaserin

Lorcaserin commented Apr 12, 2022

Hello,

I have a similar issue, which seems to be linked to the --search-no-fast and --fs-kmer-no-fast parameters.
I am trying to replicate the default parameters of the online SILVA tool. My database (SSU Ref NR99) works with these parameters off, but when I use --search-no-fast or --fs-kmer-no-fast, SINA goes into rebuilding the index. Afterwards, I believe, it yields almost the same result as with the parameter off (I compared it with the online SILVA tool's output). The index has to be rebuilt every time one of those parameters is switched on.

Below is my command (my input is a FASTA file):
sina -i $query -o $output -r SILVA_138.1_SSURef_NR99.arb --outtype=csv --search --search-db SILVA_138.1_SSURef_NR99.arb --lca-quorum=0.8 --min-idty=0.9 --lca-fields tax_slv --fields align_quality_slv,lca_tax_slv --preserve-order --show-conf --search-no-fast

Find below the stdout of the run:


08:23:14 [SINA] This is SINA 1.7.2.
Effective parameters:
add-relatives = 0
auto-filter-field =
auto-filter-threshold = 0.8
calc-idty = 0
colors = 0
csv-crlf = 0
csv-id = name
csv-sep =
db = "/Raw_data/SILVA_138.1_SSURef_NR99.arb"
debug-graph = 0
fasta-block = 0
fasta-idx = 0
fasta-write-dna = 0
fasta-write-dots = 0
fields = align_quality_slv,lca_tax_slv
filter =
fs-cover-gene = 0
fs-engine = internal
fs-full-len = 1400
fs-kmer-len = 10
fs-kmer-mm = 0
fs-kmer-no-fast = 0
fs-kmer-norel = 0
fs-leave-query-out = 0
fs-max = 40
fs-min = 40
fs-min-len = 150
fs-msc = 0.7
fs-msc-max = 2
fs-no-graph = 0
fs-oldmatch = 0
fs-req = 1
fs-req-full = 1
fs-req-gaps = 10
fs-weight = 1
gene-end = 0
gene-start = 0
in = "/Raw_data/input.fasta"
insertion = shift
intype = AUTO
lca-fields = tax_slv
lca-quorum = 0.8
line-length = 0
lowercase = none
markaligned = 0
markcopied = 0
match-score = 2
max-in-flight = 20
meta-fmt = none
min-idty = 0.9
mismatch-score = -1
no-align = 0
num-pts = 10
out = "/Results/result.csv"
outtype = CSV
overhang = attach
pen-gap = 5
pen-gapext = 2
prealigned = 0
preserve-order = 1
prot-level = 4
ptport = :/tmp/sina_pt_188530
realign = 0
search = 1
search-all = 0
search-copy-fields =
search-correction = none
search-cover = query
search-db = "/Raw_data/SILVA_138.1_SSURef_NR99.arb"
search-filter-lowercase = 0
search-ignore-super = 0
search-iupac = optimistic
search-kmer-candidates = 1000
search-kmer-len = 10
search-kmer-mm = 0
search-kmer-norel = 0
search-max-result = 10
search-min-sim = 0.7
search-no-fast = 1
search-port = :/tmp/sina_pt2_188530
select-file =
select-skip = 0
select-step = 1
show-conf =
show-diff = 0
show-dist = 0
threads = 4294967295
turn = none
use-subst-matrix = 0
write-used-rels = 0

08:23:20 [famfinder] Using internal engine for reference search
Processing: 0 [00:00:05]███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510508/510508 [00:00:00 / 00:00:00]
08:23:20 [Search (internal)] Failed to load "/Raw_data/SILVA_138.1_SSURef_NR99.sidx" - rebuilding
08:29:10 [SINA] Aligner ready. Processing sequences
08:29:44 [SINA] Took 34.455s to align 1258 sequences (36.5108 sequences/s)
08:29:44 [SINA] SINA finished.
08:29:44 [ARB I/O] Closing ARB database '"/Raw_data/SILVA_138.1_SSURef_NR99.arb"' ...
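
Since the runs posted in this thread print their effective parameters, diffing those blocks is a quick way to see which settings differ between two runs. Below is a minimal Python sketch. The function names `parse_effective_parameters` and `diff_parameters` are hypothetical, and the two sample blocks are short excerpts of the outputs posted in this thread, not the full logs.

```python
def parse_effective_parameters(log_text):
    """Collect 'key = value' pairs from the 'Effective parameters:' block."""
    params = {}
    in_block = False
    for line in log_text.splitlines():
        if not in_block:
            in_block = line.strip() == "Effective parameters:"
            continue
        key, sep, value = line.partition(" =")
        if not sep:  # first line without ' =' ends the block
            break
        params[key.strip()] = value.strip()
    return params


def diff_parameters(a, b):
    """Map each parameter whose value differs to its (a, b) value pair."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


run_a = """\
19:21:54 [SINA] This is SINA 1.7.2.
Effective parameters:
fs-kmer-len = 10
fs-kmer-no-fast = 1
search-no-fast = 0

Processing: 0 [00:00:04]
"""
run_b = """\
08:23:14 [SINA] This is SINA 1.7.2.
Effective parameters:
fs-kmer-len = 10
fs-kmer-no-fast = 0
search-no-fast = 1

"""
diff = diff_parameters(parse_effective_parameters(run_a),
                       parse_effective_parameters(run_b))
print(diff)  # {'fs-kmer-no-fast': ('1', '0'), 'search-no-fast': ('0', '1')}
```

Running the same comparison on the full output of a run that rebuilds the index and one that does not would narrow down which parameters are involved.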
