Hi, doesn't the nn.CrossEntropyLoss already apply a logsoftmax? You have used a separate LogSoftmax in your decoder too. Did the model achieve Lev distance 10 despite this? Kindly clarify.
Also, are the default hyperparameters in your main.py what you used to get Lev. distance 10 on WSJ? With the defaults, the convergence seems too slow for me. After how many epochs did you see proper words and did you use mel_spectrograms as input features?
TIA!
doesn't the nn.CrossEntropyLoss already apply a logsoftmax?
Yes (nn.CrossEntropyLoss = LogSoftmax + nn.NLLLoss). Still, I found that keeping the extra LogSoftmax in front of nn.CrossEntropyLoss resulted in stable training.
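For anyone following along, here is a minimal sketch (not taken from the repo; the tensor sizes are illustrative) of the equivalence being discussed, and of what happens when a LogSoftmax output is fed into nn.CrossEntropyLoss anyway:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.CrossEntropyLoss(logits, targets) == nn.NLLLoss(log_softmax(logits), targets)
logits = torch.randn(8, 33)            # e.g. batch of 8, 33 output characters (sizes are illustrative)
targets = torch.randint(0, 33, (8,))

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), targets)
assert torch.allclose(ce, nll)

# Feeding log-softmax outputs into CrossEntropyLoss applies the softmax twice.
# The loss stays finite and differentiable (the log-probabilities are simply
# treated as new "logits"), which is why training can still converge,
# but the gradients are flatter than with raw logits.
ce_double = nn.CrossEntropyLoss()(F.log_softmax(logits, dim=-1), targets)
```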
You have used a separate LogSoftmax in your decoder too. Did the model achieve Lev distance 10 despite this? Kindly clarify.
Yes
Also, are the default hyperparameters in your main.py what you used to get Lev. distance 10 on WSJ?
All the parameters are the same, with a learning rate of 0.001.
With the defaults, the convergence seems too slow for me. After how many epochs did you see proper words
I just realized that the current version uses the ASGD optimizer. I used Adam for the initial 20 or 30 epochs until convergence, then changed the optimizer to ASGD with a reduced learning rate of 0.00096. I guess that's the reason for the slow convergence (change Line 29 in trainer.py). If I remember correctly, my loss started in the several thousands, and the model began producing proper words once the loss dropped well below 1000.
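A minimal sketch of that two-stage optimizer schedule (this is illustrative, not the repo's trainer.py; the model, epoch counts, and switch point are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(40, 33)              # stand-in for the actual seq2seq model
num_epochs, switch_epoch = 60, 30      # epoch counts are illustrative

# Stage 1: Adam with the default learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(num_epochs):
    if epoch == switch_epoch:
        # Stage 2: once the loss has converged under Adam (~20-30 epochs),
        # switch to ASGD with the reduced learning rate mentioned above.
        optimizer = torch.optim.ASGD(model.parameters(), lr=0.00096)
    # ... run one training epoch with `optimizer` here ...
```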
and did you use mel_spectrograms as input features?
Yes, I used 40-dimensional mel-spectral vectors. To compute the mel spectrogram, the speech was segmented into 25 ms frames with a 10 ms stride between adjacent frames.
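A sketch of that feature extraction using torchaudio (one possible tool; the original features may have been computed with a different library, and the 16 kHz sample rate is an assumption):

```python
import torch
import torchaudio

sample_rate = 16000
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=512,                            # >= window length in samples
    win_length=int(0.025 * sample_rate),  # 25 ms frames
    hop_length=int(0.010 * sample_rate),  # 10 ms stride
    n_mels=40,                            # 40 mel filterbank channels
)

waveform = torch.randn(1, sample_rate)    # 1 s of dummy audio
features = mel(waveform)                  # shape: (1, 40, num_frames)
```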
Thanks for your response. I modified my code to match what you said, and it does converge much better. Do you happen to remember how long training took to reach the final model? I am running it on LibriSpeech (~1000 hours), so I expect it to take around a week on 4 GPUs. Is that a reasonable estimate based on what you saw?