
g_loss is NaN because of model.predictor_encoder and model.decoder #284

Closed
xorium opened this issue Sep 24, 2024 · 2 comments

Comments

xorium commented Sep 24, 2024

Hi! Thank you for your work!

I'm trying to run second-stage training (single speaker, non-English language), and already in epoch 0 the code hits the set_trace() line. I checked the recommendations from here, and it seems that none of the points apply (I'm definitely using multilingual-PL-BERT, it is epoch 0, I haven't changed the code, and first-stage training finished without any errors).

So I tried debugging the code to find the reason, and I noticed that everything starts with model.predictor_encoder and model.decoder returning NaN tensors even though their input data contains no NaNs.
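For anyone hitting the same thing, here is a minimal sketch of one way to locate the first NaN, assuming a standard PyTorch setup (the add_nan_hooks helper below is illustrative, not part of the StyleTTS2 codebase): it registers forward hooks that report the first submodule whose output contains NaNs.

```python
import torch

def add_nan_hooks(module, name="model"):
    """Register forward hooks that print the first submodule producing NaN outputs."""
    def make_hook(qualified_name):
        def hook(mod, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and torch.isnan(t).any():
                    print(f"NaN detected in output of {qualified_name}")
                    break
        return hook

    handles = []
    for sub_name, sub_module in module.named_modules():
        full_name = f"{name}.{sub_name}" if sub_name else name
        handles.append(sub_module.register_forward_hook(make_hook(full_name)))
    return handles  # call handle.remove() on each one when finished

# In the second-stage training script, after the model is built, something like:
#   handles = (add_nan_hooks(model.predictor_encoder, "predictor_encoder")
#              + add_nan_hooks(model.decoder, "decoder"))
```

torch.autograd.set_detect_anomaly(True) is an alternative, but it only flags problems produced during the backward pass and adds considerable overhead.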

I don't have enough knowledge to trace the cause further.
Could you at least point me to where I should look next?
Thank you!

@martinambrus

Try checking this issue along with its pull request to see if it helps: #254


xorium commented Sep 26, 2024

Thank you! This really helps!

xorium closed this as completed Sep 26, 2024