I have created my own dataset, with .lst, .rttm and .uem files for each of the train, development and test subsets. I followed the data preparation tutorial for this and am confident that this step was completed correctly. I am only using a single dataset, i.e., no additive noise comes from MUSAN or any other dataset.
Now I am trying to fine-tune the speaker diarization pipeline on my dataset, following the "fine-tune pipelines to your data" tutorial.
Step 1: I run
```bash
for SUBSET in development test
do
  for TASK in sad scd emb
  do
    pyannote-audio ${TASK} apply --step=0.1 --pretrained=${TASK}_ami --subset=${SUBSET} ${EXP_DIR} MyDataset.SpeakerDiarization.Data
  done
done
```
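Because the commands are launched from Colab, a failing command under `os.system` only shows up as a non-zero return value that is easy to miss. A minimal Python sketch of the same step-1 loop, using only the flags shown above, runs each command with `subprocess.run(check=True)` so a failure stops the loop with an error. It also passes the experiment directory as a plain Python string, so nothing depends on the shell expanding `${EXP_DIR}`. The helper names `step1_commands` and `run_step1` are hypothetical.

```python
import subprocess
from typing import List

PROTOCOL = "MyDataset.SpeakerDiarization.Data"
SUBSETS = ["development", "test"]  # same spelling as --subset=development in step 2
TASKS = ["sad", "scd", "emb"]


def step1_commands(exp_dir: str, protocol: str = PROTOCOL) -> List[List[str]]:
    """One `pyannote-audio <task> apply` command per (subset, task), same flags as the loop above."""
    commands = []
    for subset in SUBSETS:
        for task in TASKS:
            commands.append([
                "pyannote-audio", task, "apply",
                "--step=0.1",
                f"--pretrained={task}_ami",
                f"--subset={subset}",
                exp_dir,
                protocol,
            ])
    return commands


def run_step1(exp_dir: str) -> None:
    for cmd in step1_commands(exp_dir):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError as soon as a command exits non-zero
        subprocess.run(cmd, check=True)
```

Calling `run_step1(exp_dir)`, with `exp_dir` set to whatever path is used as EXP_DIR, prints each command before running it, so the first failing one is easy to spot.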
This produces three folders, sad_ami, scd_ami and emb_ami, in my EXP_DIR, each containing a metadata.yml file. Sample contents from one of these files are given below. Please let me know whether the output of step 1 is what it should be:
```yaml
dimension: 512
duration: 2.0
start: 0.0
step: 0.2
```
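To compare step-1 output across all three folders at once, a small sketch can report, for each of `sad_ami`, `scd_ami` and `emb_ami` under EXP_DIR, whether `metadata.yml` exists and what flat `key: value` fields it holds. It deliberately parses only flat `key: value` lines (enough for the sample above); for general YAML, PyYAML's `yaml.safe_load` would be the usual choice. The function names are hypothetical, and this check cannot tell which values are correct for each task, only whether the files are there and what they contain.

```python
from pathlib import Path
from typing import Dict, Optional

TASK_DIRS = ["sad_ami", "scd_ami", "emb_ami"]


def read_flat_yaml(path: Path) -> Dict[str, str]:
    """Parse 'key: value' lines only; nested YAML is not supported."""
    fields = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def inspect_step1(exp_dir: str) -> Dict[str, Optional[Dict[str, str]]]:
    """Map each task folder to its metadata fields, or None if metadata.yml is missing."""
    report = {}
    for name in TASK_DIRS:
        meta = Path(exp_dir) / name / "metadata.yml"
        report[name] = read_flat_yaml(meta) if meta.is_file() else None
    return report
```

Printing `inspect_step1(exp_dir)` shows all three metadata files side by side, which makes a missing folder or an unexpected value stand out.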
Step 2:
a. I create the train directory with `os.makedirs` and then set the TRN_DIR environment variable: `export TRN_DIR=${EXP_DIR}/train/MyDataset.SpeakerDiarization.Data.development`
b. Then I run `pyannote-pipeline train --subset=development --forever ${EXP_DIR} MyDataset.SpeakerDiarization.Data`
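Since `--forever` is meant to keep the optimization running rather than return, calling it through `os.system` blocks the notebook cell indefinitely. A minimal sketch, assuming `params.yml` is expected inside TRN_DIR as defined in 2a, starts the same command in the background with `subprocess.Popen` and polls for that file, giving up early if the process exits. The helper names and the polling interval are hypothetical choices, not part of pyannote.

```python
import subprocess
import time
from pathlib import Path
from typing import Optional

PROTOCOL = "MyDataset.SpeakerDiarization.Data"


def trn_dir_for(exp_dir: str, protocol: str, subset: str) -> Path:
    """Mirror TRN_DIR=${EXP_DIR}/train/<protocol>.<subset> from step 2a."""
    return Path(exp_dir) / "train" / f"{protocol}.{subset}"


def start_training(exp_dir: str, protocol: str = PROTOCOL,
                   subset: str = "development") -> subprocess.Popen:
    """Launch the same `pyannote-pipeline train` command as 2b without blocking."""
    cmd = ["pyannote-pipeline", "train", f"--subset={subset}", "--forever", exp_dir, protocol]
    return subprocess.Popen(cmd)


def wait_for_file(path: Path, timeout: float, poll: float = 5.0,
                  proc: Optional[subprocess.Popen] = None) -> bool:
    """True once `path` exists; False on timeout or if `proc` has already exited."""
    deadline = time.monotonic() + timeout
    while True:
        if path.is_file():
            return True
        # Stop early: the training process died without writing the file
        if proc is not None and proc.poll() is not None:
            return False
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

If `wait_for_file` returns `False` because the process already exited, its `returncode` and the error printed to the notebook are the first things to inspect.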
Am I supposed to create TRN_DIR myself (as in 2a)?
However, no params.yml file is generated anywhere on my filesystem, contrary to what the tutorial describes. Please note the following:
a. I have made the appropriate changes to my database.yml and config.yml files (dataset name, etc.).
b. I am working in Google Colab notebooks synced to Google Drive.
c. To run the shell commands (as in 2.), I use Python's `os.system(<command_in_string_format>)`.
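One property of `os.system` matters for 2a: each call starts a fresh shell, so an `export TRN_DIR=...` run through one `os.system` call is gone by the next call, and any later command that expands `${TRN_DIR}` or `${EXP_DIR}` sees an empty value unless the variable was set some other way. The sketch below demonstrates this on a POSIX `sh` (available on Colab) and shows that assigning to `os.environ` instead makes the variable visible to every child process started afterwards. `shell_sees` and `demo` are hypothetical helper names.

```python
import os
import shlex
import subprocess
from typing import Tuple


def shell_sees(var: str) -> str:
    """Value of $var as seen by a brand-new child shell ('' if unset)."""
    result = subprocess.run(
        ["sh", "-c", 'printf "%s" "$' + var + '"'],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def demo(trn_dir: str) -> Tuple[str, str]:
    os.environ.pop("TRN_DIR", None)
    # 1) export inside os.system: only affects that one short-lived shell
    os.system(f"export TRN_DIR={shlex.quote(trn_dir)}")
    after_system = shell_sees("TRN_DIR")
    # 2) os.environ: inherited by every child process started afterwards
    os.environ["TRN_DIR"] = trn_dir
    after_environ = shell_sees("TRN_DIR")
    return after_system, after_environ
```

With this, `demo(path)` returns an empty string for the `os.system` route and `path` for the `os.environ` route.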
Can someone please help me figure out where the problem lies? Am I going astray in step 1, is the output of step 1 not what it should be, or am I making a mistake in step 2?
Thanks! :)