
g_loss is NaN because of model.predictor_encoder and model.decoder #284

Closed
xorium opened this issue Sep 24, 2024 · 2 comments

Comments

xorium commented Sep 24, 2024

Hi! Thank you for your work!

I'm trying to run second-stage training (single speaker, non-English language), and already in epoch 0 the code hits the set_trace() line. I checked the recommendations from here, and it seems that none of the points apply (I'm definitely using multilingual-PL-BERT, it is epoch 0, I haven't changed the code, and first-stage training finished without any errors).

So I tried debugging the code to find the reason, and I noticed that everything starts with model.predictor_encoder and model.decoder returning NaN tensors even though their input data contains no NaNs.
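For anyone hitting the same thing, here is a minimal sketch of one way to locate the first NaN, assuming a standard PyTorch setup (the add_nan_hooks helper below is illustrative, not part of the StyleTTS2 codebase): it registers forward hooks that report the first submodule whose output contains NaNs.

```python
import torch

def add_nan_hooks(module, name="model"):
    """Register forward hooks that print the first submodule producing NaN outputs."""
    def make_hook(qualified_name):
        def hook(mod, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and torch.isnan(t).any():
                    print(f"NaN detected in output of {qualified_name}")
                    break
        return hook

    handles = []
    for sub_name, sub_module in module.named_modules():
        full_name = f"{name}.{sub_name}" if sub_name else name
        handles.append(sub_module.register_forward_hook(make_hook(full_name)))
    return handles  # call handle.remove() on each one when finished

# In the second-stage training script, after the model is built, something like:
#   handles = (add_nan_hooks(model.predictor_encoder, "predictor_encoder")
#              + add_nan_hooks(model.decoder, "decoder"))
```

torch.autograd.set_detect_anomaly(True) is an alternative, but it only flags problems produced during the backward pass and adds considerable overhead.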

I don't have enough knowledge to trace the cause further.
Could you at least point me to where I should look next?
Thank you!

@martinambrus

Try checking this issue along with its pull request to see if it helps: #254


xorium commented Sep 26, 2024

Thank you! This really helps!

xorium closed this as completed Sep 26, 2024