Differences in mel-spectrogram #97

Open
amiteliav opened this issue Aug 27, 2021 · 4 comments
Comments

@amiteliav
amiteliav commented Aug 27, 2021

Hi,

I'm working with your repo. It is really good, thanks!

I'm trying to generate my own mel-spectrogram with your code in "make_spect.py".
Here is the demo mel-spectrogram for p225_003:
[image: Demo]

Here is my mel-spectrogram for p225_003:
[image: my]

The sizes are not the same:
Demo: (376, 80)
Mine: (475, 80)

You can also see that the spectrograms don't look the same: the demo covers the whole range of the spectrogram,
whereas mine doesn't. Mine looks similar but more compressed.

When I use the demo spectrogram, the conversion works.
When I use my spectrogram, it doesn't.

Any idea why the spectrograms are different, and how to correct it?

Thanks,
Amit

@auspicious3000
Owner

Your frequency axis and time axis are swapped.
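
As a rough illustration of the swap, here is a minimal sketch. It assumes the spectrogram is a NumPy array with the (frames, mel bins) shape reported in this thread; the array contents are random placeholders, not the actual p225_003 data.

```python
import numpy as np

# Placeholder array with the shape reported in the thread: (frames, mel_bins).
# The values are random; only the shape matters for this illustration.
spect = np.random.rand(475, 80)

# Image-style plots draw the first axis vertically and the second horizontally,
# so transposing puts frequency on the vertical axis and time on the horizontal one.
spect_for_plot = spect.T

print(spect.shape)           # (475, 80)
print(spect_for_plot.shape)  # (80, 475)
```

The transpose only changes how the array is laid out for plotting; the underlying values are unchanged.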

@amiteliav
Author

Thanks, you are right, the axes were swapped. Here are the new plots:

Demo:
[image: Demo]

Mine with make_spect:
[image: my]

But there are still some differences. The sizes of the spectrograms are not the same.
Demo: (80, 376)
Mine with make_spect: (80, 475)

I used the make_spect.py code, so I thought I should get the same results as the demo.
When I use these spectrograms, the conversion results are very different.
Using the demo, I get a nice, good conversion, but using the spectrogram I created with make_spect, I get a very unclear result.
I hope you can help me understand why, because I can't get the model to convert new files that are not from the demo dataset :/

Thanks

@auspicious3000
Owner

They should only differ by the amount of silence before and after. Please confirm if this is true.
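
One way to check this might look like the sketch below. It is only an illustration under stated assumptions: the helper `trim_silent_frames`, the threshold value, and the synthetic data (including the 60/39 split of silent frames) are all hypothetical and are not taken from make_spect.py or from the actual files.

```python
import numpy as np

def trim_silent_frames(spect, threshold):
    """Drop leading and trailing frames whose mean value is at or below `threshold`.

    Hypothetical helper; `spect` is assumed to have shape (frames, mel_bins).
    """
    frame_level = spect.mean(axis=1)                   # one value per frame
    active = np.flatnonzero(frame_level > threshold)   # indices of non-silent frames
    if active.size == 0:
        return spect[:0]                               # everything is silent
    return spect[active[0]:active[-1] + 1]

# Synthetic stand-in data: 376 "speech" frames padded with silent frames on
# both sides to reach 475 frames. The 60/39 split is invented for illustration.
rng = np.random.default_rng(0)
speech = rng.uniform(0.4, 0.9, size=(376, 80))
padded = np.concatenate([np.zeros((60, 80)), speech, np.zeros((39, 80))])

trimmed = trim_silent_frames(padded, threshold=0.1)
print(padded.shape, trimmed.shape)  # (475, 80) (376, 80)
```

If the two real spectrograms still differ in length or content after trimming like this, the difference is not explained by silence alone.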

@MHVali
MHVali commented Nov 8, 2022

@amiteliav @auspicious3000 Hi, I am trying to get good conversion quality using this repo, but I cannot. Could you please let me know what hyper-parameters you use for "dim_neck", "freq", "batch_size", and "num_itrs"? I am using the small dataset that is provided in this repo. Could you please also let me know if you use any other dataset that gives you a good conversion?
Thanks in advance!

3 participants