
There is a problem training a Conformer + RNN-T model #38

Open
scufan1990 opened this issue Dec 15, 2021 · 6 comments

Comments

@scufan1990

Hi,
There is a problem training a conformer+RNN-T model.
What CER and WER should I expect with one GPU?

I'm training the model on one RTX TITAN GPU. The conformer has 16 encoder layers, encoder dim 144, 1 decoder layer, and decoder dim 320. After 50 epochs of training the CER is about 27 and doesn't decrease anymore.

@wszyy

wszyy commented Jan 18, 2022

Hello, I've met the same problem as you, but I use the Conformer encoder with a Transformer decoder. By the way, have you solved the problem with the output of DecoderRNNT? It has 4 dimensions; how can it be used to recognize speech?
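Not this repo's exact code, but for orientation: an RNN-T joint network typically outputs logits of shape (batch, T, U, vocab), and recognition searches that lattice rather than applying a softmax directly. A minimal greedy-decoding sketch in plain Python with toy numbers (the `BLANK` index, the helper name, and the shapes are all assumptions, not this repository's API):

```python
# Minimal sketch of RNN-T greedy decoding, assuming the joint network
# produced logits of shape (T, U, V) for one utterance: T encoder frames,
# U label positions, V vocabulary entries with index 0 = blank.
# Illustrative only; not the exact API of any specific implementation.

BLANK = 0

def greedy_rnnt_decode(logits, max_symbols_per_frame=10):
    """Walk the (T, U, V) lattice: emit the argmax symbol at each step,
    advancing to the next frame on blank, otherwise appending the symbol."""
    hyp = []   # emitted label sequence
    u = 0      # current label position in the lattice
    for t in range(len(logits)):
        emitted = 0
        while u < len(logits[t]) - 1 and emitted < max_symbols_per_frame:
            scores = logits[t][u]
            best = max(range(len(scores)), key=lambda v: scores[v])
            if best == BLANK:
                break          # blank: move on to the next frame
            hyp.append(best)   # non-blank: emit and stay on this frame
            u += 1
            emitted += 1
    return hyp

# Toy example: T=2 frames, U=3 label slots, V=3 symbols (0 = blank).
toy = [
    [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.9, 0.0, 0.1]],   # frame 0
    [[0.2, 0.1, 0.7], [0.9, 0.05, 0.05], [0.9, 0.0, 0.1]], # frame 1
]
print(greedy_rnnt_decode(toy))  # emits symbol 1, then blanks out
```

So the 4-D tensor is not fed to recognition directly; a per-utterance decoder like the sketch above (or beam search) turns it into a label sequence.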

@jingzhang0909

jingzhang0909 commented Jan 19, 2022

Could you tell me what dataset you used in your training? How long does it take to train a checkpoint? I see the paper uses LibriSpeech with 970 hours; it seems training will take a lot of time.

@wszyy

wszyy commented Jan 19, 2022

Um, I use AISHELL-1 and have trained for more than 10 hours, but the results are not very good. Actually, I use Google Colab to train the model; it really takes a lot of time.
By the way, do you understand the 4-dimensional results? The author just uses torch.cat to join the encoder_output and decoder_output matrices, and it seems the network cannot be used to recognize speech as-is.
So, I built two networks:
1. Conformer encoder with a Transformer decoder
2. Conformer encoder with an LSTM decoder with an attention mechanism
I have been training the two networks for several days now.
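For what it's worth, the 4-dimensional output usually comes from expanding the encoder output (B, T, D) and the prediction-network output (B, U, D) to a common (B, T, U, ·) lattice; whether the two are then added or concatenated along the feature dimension (the torch.cat variant mentioned above) differs between implementations. A shape-only sketch in plain Python, with made-up toy dimensions:

```python
# Sketch of how an RNN-T joint network combines encoder and decoder
# outputs into a 4-D lattice. Shapes only, using plain nested lists;
# real code would use tensor broadcasting. All dimensions are toy values:
# B=1 utterance, T=4 frames, U=3 label slots, D=2 features.

B, T, U, D = 1, 4, 3, 2
enc = [[[0.5] * D for _ in range(T)] for _ in range(B)]   # (B, T, D)
dec = [[[0.25] * D for _ in range(U)] for _ in range(B)]  # (B, U, D)

# Broadcast both to (B, T, U, D) and combine. Adding keeps feature
# size D; concatenating on the last dim would give 2*D instead.
joint = [[[[enc[b][t][d] + dec[b][u][d] for d in range(D)]
           for u in range(U)]
          for t in range(T)]
         for b in range(B)]

print(len(joint), len(joint[0]), len(joint[0][0]), len(joint[0][0][0]))
# A linear layer + softmax over the last dimension would then give a
# per-(t, u) distribution over the vocabulary: shape (B, T, U, V).
```

Either way the network can still recognize speech; the (B, T, U, V) lattice just has to go through an RNN-T decoder (greedy or beam search) instead of a plain argmax over time.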

@jingzhang0909

Thanks for your reply! I haven't decided which model and dataset to use yet. I'd like to share with you if there is any further info.

@wszyy

wszyy commented Jan 19, 2022

That will be OK. I also need to communicate with others to learn more about the network. Are you from China? Maybe we can exchange contact details.

@wanglongR

Hello wszyy, I'm from China. I have been learning about the Conformer model recently and would like to discuss it with you. If you are willing, you can add me on WeChat, ID: scrushy518


4 participants