Hi, doesn't the nn.CrossEntropyLoss already apply a logsoftmax? You have used a separate LogSoftmax in your decoder too. Did the model achieve Lev distance 10 despite this? Kindly clarify.
Also, are the default hyperparameters in your main.py what you used to get Lev. distance 10 on WSJ? With the defaults, the convergence seems too slow for me. After how many epochs did you see proper words and did you use mel_spectrograms as input features?
TIA!
doesn't the nn.CrossEntropyLoss already apply a logsoftmax?
Yes (nn.CrossEntropyLoss = LogSoftmax + nn.NLLLoss). Still, I found that keeping the extra LogSoftmax in front of nn.CrossEntropyLoss resulted in stable training.
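For anyone following along, here is a minimal sketch (not taken from the repo; the tensor sizes are illustrative) of the equivalence being discussed, and of what happens when a LogSoftmax output is fed into nn.CrossEntropyLoss anyway:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.CrossEntropyLoss(logits, targets) == nn.NLLLoss(log_softmax(logits), targets)
logits = torch.randn(8, 33)            # e.g. batch of 8, 33 output characters (sizes are illustrative)
targets = torch.randint(0, 33, (8,))

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), targets)
assert torch.allclose(ce, nll)

# Feeding log-softmax outputs into CrossEntropyLoss applies the softmax twice.
# The loss stays finite and differentiable (the log-probabilities are simply
# treated as new "logits"), which is why training can still converge,
# but the gradients are flatter than with raw logits.
ce_double = nn.CrossEntropyLoss()(F.log_softmax(logits, dim=-1), targets)
```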
You have used a separate LogSoftmax in your decoder too. Did the model achieve Lev distance 10 despite this? Kindly clarify.
Yes
Also, are the default hyperparameters in your main.py what you used to get Lev. distance 10 on WSJ?
All the parameters are the same, with a learning rate of 0.001.
With the defaults, the convergence seems too slow for me. After how many epochs did you see proper words
I just realized that the current version uses the ASGD optimizer. I used Adam for the initial 20 or 30 epochs until convergence, then changed the optimizer to ASGD with a reduced learning rate of 0.00096. I guess that's the reason for the slow convergence (change Line 29 in trainer.py). If I remember correctly, my loss started in the several thousands, and the model began producing proper words once the loss dropped well below 1000.
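A minimal sketch of that two-stage optimizer schedule (this is illustrative, not the repo's trainer.py; the model, epoch counts, and switch point are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(40, 33)              # stand-in for the actual seq2seq model
num_epochs, switch_epoch = 60, 30      # epoch counts are illustrative

# Stage 1: Adam with the default learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(num_epochs):
    if epoch == switch_epoch:
        # Stage 2: once the loss has converged under Adam (~20-30 epochs),
        # switch to ASGD with the reduced learning rate mentioned above.
        optimizer = torch.optim.ASGD(model.parameters(), lr=0.00096)
    # ... run one training epoch with `optimizer` here ...
```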
and did you use mel_spectrograms as input features?
Yes, I used 40-dimensional mel-spectral vectors. To compute the mel spectrogram, the speech was segmented into 25 ms frames with a 10 ms stride between adjacent frames.
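A sketch of that feature extraction using torchaudio (one possible tool; the original features may have been computed with a different library, and the 16 kHz sample rate is an assumption):

```python
import torch
import torchaudio

sample_rate = 16000
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=512,                            # >= window length in samples
    win_length=int(0.025 * sample_rate),  # 25 ms frames
    hop_length=int(0.010 * sample_rate),  # 10 ms stride
    n_mels=40,                            # 40 mel filterbank channels
)

waveform = torch.randn(1, sample_rate)    # 1 s of dummy audio
features = mel(waveform)                  # shape: (1, 40, num_frames)
```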
Thanks for your response. I modified my code to match what you said, and it does converge much better. Do you happen to remember how long training took to reach the final model? I am running it on LibriSpeech (~1000 hours), so I expect it to take around a week on 4 GPUs. Is that a reasonable estimate based on what you saw?