Error: data/train-clean-360_anon_sp/feats.scp already exists #18

Open
suhitaghosh10 opened this issue May 12, 2022 · 5 comments

Comments

@suhitaghosh10

steps/diagnostic/analyze_alignments.sh --cmd run.pl data/lang exp/tri3b_cleaned
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri3b_cleaned/log/analyze_alignments.log
1 warnings in exp/tri3b_cleaned/log/build_tree.log
27 warnings in exp/tri3b_cleaned/log/acc...log
8 warnings in exp/tri3b_cleaned/log/update..log
33 warnings in exp/tri3b_cleaned/log/align...log
20 warnings in exp/tri3b_cleaned/log/convert..log
9 warnings in exp/tri3b_cleaned/log/fmllr...log
steps/train_sat.sh: Likelihood evolution:
-55.2576 -52.3376 -52.1623 -51.612 -50.1089 -48.6247 -47.4872 -46.7146 -46.154 -45.526 -45.1648 -44.7545 -44.4738 -44.2664 -44.0799 -43.9154 -43.7681 -43.6342 -43.512 -43.3358 -43.2092 -43.1134 -43.0245 -42.9414 -42.8652 -42.793 -42.7238 -42.6577 -42.5951 -42.5037 -42.44 -42.4085 -42.3882 -42.3738
exp/tri3b_cleaned: nj=10 align prob=-45.15 over 355.80h [retry=0.0%, fail=0.0%] states=5952 gauss=150145 fmllr-impr=0.71 over 293.24h tree-impr=9.36
steps/train_sat.sh: done training SAT system in exp/tri3b_cleaned
local/chain/run_tdnn_1d__360.sh
local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
utils/data/perturb_data_dir_speed_3way.sh: data/train-clean-360_anon_sp/feats.scp already exists: refusing to run this (please delete data/train-clean-360_anon_sp/feats.scp if you want this to run)

@SarinaMeyer

I have a similar problem: each time I rerun the evaluation, the run terminates with an error because some files already exist. Could you please update cleanup.sh so that it also covers the new files of this challenge?

@Natalia-T
Collaborator

Kaldi-based ASR acoustic models (AMs) and the corresponding Kaldi scripts are used for ASR evaluation. The ASR AM training scripts comprise multiple training stages, and some of them include an additional check that avoids repeating already completed processes.

For example, in your case:

  1. utils/data/perturb_data_dir_speed_3way.sh data/${train_set} data/${train_set}_sp
  2. https://github.com/kaldi-asr/kaldi/blob/d673298886e8d62d4c890e5e3eac8491df0b7e12/egs/wsj/s5/utils/data/perturb_data_dir_speed_3way.sh#L52

So you can either specify more precisely the stage from which you want to resume training, or remove the corresponding file as the Kaldi script suggests.
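The second option (removing the file) can be sketched as a small helper. This is only an illustration: `remove_stale_feats` is a hypothetical name, not part of Kaldi or of the baseline, and it assumes that the only thing blocking `perturb_data_dir_speed_3way.sh` is an existing `feats.scp` in the destination directory, as the error message states.

```shell
# Hypothetical helper (not part of Kaldi or the baseline scripts).
# Deletes a leftover feats.scp from the given data directory so that
# utils/data/perturb_data_dir_speed_3way.sh no longer refuses to run.
# Only the feats.scp file is removed; the directory itself is kept.
remove_stale_feats() {
  if [ -f "$1/feats.scp" ]; then
    echo "removing stale $1/feats.scp"
    rm -f "$1/feats.scp"
  else
    echo "no feats.scp in $1, nothing to do"
  fi
}

# Example call, using the directory named in the error message:
# remove_stale_feats data/train-clean-360_anon_sp
```

Checking for the file first keeps the helper safe to call on a fresh checkout, where nothing needs to be deleted.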

cleanup.sh was originally designed only for (re-)running the ASR/ASV evaluation stages (with already trained ASR/ASV evaluation models), and it could be updated accordingly for the new setup. However, it does not cover training of the ASR/ASV models: each of these processes has multiple (sub)stages, and deciding which data to remove depends on which (sub)stages have completed, so it is not straightforward and requires the user's supervision.

@suhitaghosh10
Author

Thanks for the detailed answer. But, when I am running for the first time, shouldn't it run without such errors?

@Natalia-T
Collaborator

> But, when I am running for the first time, shouldn't it run without such errors?

Yes, for the first time you should not get such errors.

@egaznep

egaznep commented Jun 18, 2022

I have also been experiencing this issue, and after numerous attempts, managed to get a full execution without any 'refusing to run' errors. I created a shell script in the baseline folder and pasted the following into it:

rm -rf data/train-clean-360_anon_sp/feats.scp
rm -rf data/train-clean-360_anon_sp_hires/feats.scp
rm -rf data/train-clean-360_anon_sp_hires_60k/feats.scp
rm -rf exp/tri3b_cleaned_ali_train-clean-360_anon_sp
rm -rf exp/models/user_asr_eval_anon/chain_cleaned/tree_sp/final.mdl

I run this each time I want to re-run the baseline.
