Fine-tuning speaker diarization pipeline tutorial (with my own dataset): params.yml not being generated #803

anon747 · 2021-10-31T18:37:07Z


Hello team-Pyannote and community!

I have created my own dataset, with .lst, .rttm and .uem files for each of the train, development and test subsets. I followed the data preparation tutorial for this and am confident that this step was completed correctly. I am using only a single dataset in my work, i.e. no additive noise is coming in from MUSAN or any other dataset.

I then tried fine-tuning the speaker diarization pipeline on my dataset, following the "fine-tune pipelines to your data" tutorial.

1. I complete the first step:

    for SUBSET in development test
    do
      for TASK in sad scd emb
      do
        pyannote-audio ${TASK} apply --step=0.1 --pretrained=${TASK}_ami --subset=${SUBSET} ${EXP_DIR} MyDataset.SpeakerDiarization.Data
      done
    done
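Since everything here is launched from Python in a notebook, the same loop can be sketched as a list of argument lists, which makes it easy to print each command and check it before running anything. This is only a minimal sketch: `build_apply_commands` is a hypothetical helper name, `/path/to/exp_dir` is a placeholder, and the subset names are assumed to be `development` and `test`. The flags are copied from the loop above and nothing is executed.

```python
import shlex


def build_apply_commands(exp_dir, protocol,
                         subsets=("development", "test"),
                         tasks=("sad", "scd", "emb")):
    """Return one argument list per (subset, task) pair, in loop order."""
    commands = []
    for subset in subsets:
        for task in tasks:
            commands.append([
                "pyannote-audio", task, "apply",
                "--step=0.1",
                f"--pretrained={task}_ami",
                f"--subset={subset}",
                exp_dir,
                protocol,
            ])
    return commands


if __name__ == "__main__":
    # Placeholder experiment directory; only prints the commands.
    for cmd in build_apply_commands("/path/to/exp_dir",
                                    "MyDataset.SpeakerDiarization.Data"):
        print(shlex.join(cmd))
```

Each argument list can then be handed to `subprocess.run` one at a time, so a failure in one (subset, task) pair is visible immediately.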

After this, I obtain three folders (sad_ami, scd_ami, emb_ami) in my EXP_DIR, each containing a metadata.yml file. Sample contents of one of these files are given below. Please let me know whether this is the expected output of step 1:

dimension: 512
duration: 2.0
start: 0.0
step: 0.2

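A quick way to confirm that all three folders really contain a metadata.yml is a small check like the one below. It is a sketch only: `missing_metadata` is a hypothetical helper, and the layout it checks (`<task>_ami/metadata.yml` directly under EXP_DIR) is simply what I observed, not something taken from the pyannote documentation.

```python
from pathlib import Path


def missing_metadata(exp_dir, tasks=("sad", "scd", "emb")):
    """Return the <task>_ami folders under exp_dir that lack a metadata.yml."""
    exp_dir = Path(exp_dir)
    return [
        f"{task}_ami"
        for task in tasks
        if not (exp_dir / f"{task}_ami" / "metadata.yml").is_file()
    ]
```

An empty list means every expected metadata.yml was found.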
2. I proceed to the next step:
   a. I create the train directory using os.makedirs and then set the TRN_DIR environment variable:
      export TRN_DIR=${EXP_DIR}/train/MyDataset.SpeakerDiarization.Data.development
   b. I then run:
      pyannote-pipeline train --subset=development --forever ${EXP_DIR} MyDataset.SpeakerDiarization.Data

Am I supposed to create the TRN_DIR directory myself, as described in 2a?
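One thing worth ruling out when commands are issued from Python: each `os.system` call runs in its own shell, so an `export` made in one call is not visible to the next one. A minimal sketch of doing step 2a from Python, so that later child processes inherit the variable, might look like this. `prepare_train_dir` and `child_sees_trn_dir` are hypothetical helper names, and whether pyannote-pipeline itself reads TRN_DIR is not something I can confirm.

```python
import os
import subprocess
import sys


def prepare_train_dir(exp_dir, protocol, subset="development"):
    """Create the training directory and expose it as TRN_DIR to child processes."""
    trn_dir = os.path.join(exp_dir, "train", f"{protocol}.{subset}")
    os.makedirs(trn_dir, exist_ok=True)
    # Variables set through os.environ are inherited by processes started later.
    os.environ["TRN_DIR"] = trn_dir
    return trn_dir


def child_sees_trn_dir():
    """Start a fresh Python process and return the TRN_DIR value it inherits."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('TRN_DIR', ''))"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `prepare_train_dir(exp_dir, "MyDataset.SpeakerDiarization.Data")` followed by `child_sees_trn_dir()` should return the same path, which confirms that the variable survives across separate commands.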

However, no params.yml file is generated anywhere in my filesystem, which contradicts the tutorial. Please note the following points:
a. I have made the appropriate changes to my database.yml and config.yml files (dataset name, etc.).
b. I am working in Google Colab notebooks synced to Google Drive.
c. To execute the commands in step 2 from the command line, I use Python's os.system(<command_in_string_format>).
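Because `os.system` only returns an exit status and does not hand the command's output back to Python, any error message from the tool is easy to miss in a notebook. A minimal sketch that captures the output instead is shown below. `run_and_report` is a hypothetical helper, and the optional `timeout` is there on the assumption that a command run with `--forever` does not return on its own.

```python
import subprocess


def run_and_report(args, timeout=None):
    """Run a command and return (returncode, stdout, stderr) without raising.

    returncode is 127 if the executable is not found and None on timeout.
    """
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError as exc:
        return 127, "", str(exc)
    except subprocess.TimeoutExpired:
        return None, "", f"timed out after {timeout} seconds"
    return result.returncode, result.stdout, result.stderr
```

For example, `run_and_report(["pyannote-pipeline", "train", "--subset=development", "--forever", exp_dir, "MyDataset.SpeakerDiarization.Data"], timeout=600)` would show whether the command starts at all and what it writes to stderr. Here `exp_dir` stands for the experiment directory and the 600 seconds is an arbitrary choice.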

Can someone please help me figure out where the problem lies: whether step 1 itself is wrong, whether its output is not what is expected, or whether I am making a mistake in step 2?
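For completeness, this is roughly how the search for params.yml can be done from Python. `find_files` is a hypothetical helper, and the root to search (for instance EXP_DIR or the mounted Drive folder) is up to the reader.

```python
from pathlib import Path


def find_files(root, name="params.yml"):
    """Return every file called `name` below `root`, sorted for stable output."""
    return sorted(p for p in Path(root).rglob(name) if p.is_file())
```

An empty result for the experiment directory means no params.yml had been written there at the time of the search.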

Thanks! :)
