
The default model parameters for training are different from the pretrained checkpoint #11

Open
seyong92 opened this issue Feb 25, 2023 · 2 comments



seyong92 commented Feb 25, 2023

Hello, thank you for sharing this valuable code!

I have several questions about the code.

  1. The default parameters for training are different from the pre-trained model in the repo.
    The default setting uses 229 mel bins (the same as the paper), but the pre-trained model uses 300 mel bins, and the f_min and f_max values are also different. I also found that the pre-trained model has one more conv layer in the PreConvSpec. Do these changes have a meaningful effect on performance? (A sketch of the two configurations follows this list.)

  2. Also, when I tried training (once with the default parameters, and once with the pre-trained model's parameters), both cases showed much lower performance than the pre-trained model (0.7403 valid F1) and the score reported in the paper. I think the only difference is the batch size, which is 12 in the paper and 2 in the default config. Have you ever trained the model with batch size 2, or with the default parameters in this repo?
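
For concreteness, here is a minimal sketch (not the repo's actual code) of the two spectrogram front-ends I am comparing, written with torchaudio's `MelSpectrogram`. Only the `n_mels` values come from my comparison above; the `f_min`/`f_max` values below are placeholders, since the real ones live in the repo's config and the checkpoint:

```python
# Minimal sketch of the two front-end configurations being compared.
# Only n_mels reflects the actual difference discussed in this issue;
# the f_min/f_max values below are placeholders, not the repo's values.
import torchaudio

SR = 44100

# Default training config (matches the paper): 229 mel bins.
default_frontend = torchaudio.transforms.MelSpectrogram(
    sample_rate=SR,
    n_mels=229,
    f_min=30.0,    # placeholder
    f_max=8000.0,  # placeholder
)

# Pre-trained checkpoint: 300 mel bins and different f_min/f_max
# (plus one extra conv layer in PreConvSpec, not shown here).
pretrained_frontend = torchaudio.transforms.MelSpectrogram(
    sample_rate=SR,
    n_mels=300,
    f_min=20.0,    # placeholder
    f_max=SR / 2,  # placeholder
)
```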

[Attached screenshot of training curves]

Again, thank you very much for sharing your code! 😁

Yujia-Yan (Owner) commented Feb 25, 2023

Hi,
Thanks for your interest.

  1. As far as I know, those parameters do not matter much, but this view may be biased: a larger frequency range may accommodate more harmonics for rarely played high notes.
  2. a. The curve looks like under-fitting, so the number of training steps should be increased accordingly. The code uses an aggressive learning rate scheduler (OneCycle), which requires the total number of iterations to be known beforehand. In practice, if you don't know how many steps are enough, it's probably better to switch to a constant learning rate (e.g. the common default of 1e-4) and train longer; see the sketch after this list.
    b. The architecture uses batch norm, which is notoriously unreliable with small batches. That's why I have never even tried a batch size that small.
    BTW, the spikes in train_f1 and the other curves suggest there may be an issue in your dataset. Have you checked the sampling rate of the data?
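
As a rough illustration (this is not the repo's training loop; the model and data here are placeholders, only the optimizer/scheduler choice is the point), switching from OneCycle to a constant learning rate in PyTorch could look like this:

```python
# Minimal sketch: constant learning rate instead of OneCycleLR.
# Model and data are placeholders just to make the loop runnable.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(10, 1)                                  # placeholder model
data = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))   # placeholder data
loader = DataLoader(data, batch_size=8, shuffle=True)

# Constant learning rate (the common 1e-4 default).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# For comparison, OneCycleLR needs the total number of steps fixed up front:
# scheduler = torch.optim.lr_scheduler.OneCycleLR(
#     optimizer, max_lr=1e-3, total_steps=total_steps)

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        # No scheduler.step() with a constant LR, so training can simply
        # continue until the validation metric stops improving.
```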

seyong92 (Author) commented

Thank you for the quick reply! I will change the learning rate and share the results after training.

Also, as you suspected, I found that some of the files were not correctly resampled to 44100 Hz. Thanks!
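
For reference, a quick way to find the offending files (a minimal sketch, not from the repo; the dataset path is a placeholder):

```python
# Minimal sketch: list audio files whose sample rate is not 44100 Hz.
from pathlib import Path
import soundfile as sf

dataset_dir = Path("path/to/dataset")  # placeholder path

for wav_path in sorted(dataset_dir.rglob("*.wav")):
    info = sf.info(str(wav_path))
    if info.samplerate != 44100:
        print(f"{wav_path}: {info.samplerate} Hz")
```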
