Question regarding LogSoftmax+nn.CrossEntropyLoss #1

Open
anravich102 opened this issue Jan 24, 2019 · 2 comments
anravich102 commented Jan 24, 2019

Hi, doesn't nn.CrossEntropyLoss already apply a log-softmax? You have used a separate LogSoftmax in your decoder too. Did the model achieve a Lev distance of 10 despite this? Kindly clarify.
Also, are the default hyperparameters in your main.py the ones you used to get a Lev distance of 10 on WSJ? With the defaults, convergence seems too slow for me. After how many epochs did you see proper words, and did you use mel spectrograms as input features?
TIA!

MysteryVaibhav (Owner) commented:

> doesn't nn.CrossEntropyLoss already apply a log-softmax?

Yes (nn.CrossEntropyLoss = LogSoftmax + nn.NLLLoss). Still, I found that applying an extra LogSoftmax before nn.CrossEntropyLoss resulted in stable training.
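The equivalence can be checked directly; a minimal sketch in PyTorch (the tensor shapes here are illustrative, not the repo's actual model outputs):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # batch of 4, 10 classes
targets = torch.tensor([1, 3, 5, 7])   # one class index per example

# nn.CrossEntropyLoss applied to raw logits...
ce = nn.CrossEntropyLoss()(logits, targets)

# ...equals nn.NLLLoss applied to log-softmaxed logits.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

assert torch.allclose(ce, nll)
```

Because of this, feeding LogSoftmax output into nn.CrossEntropyLoss applies log-softmax twice; the result is no longer the textbook cross-entropy, but it preserves the ranking of the classes, which is consistent with the model still training.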

> You have used a separate LogSoftmax in your decoder too. Did the model achieve Lev distance 10 despite this? Kindly clarify.

Yes.

> Also, are the default hyperparameters in your main.py what you used to get Lev. distance 10 on WSJ?

All the parameters are the same, with a learning rate of 0.001.

> With the defaults, the convergence seems too slow for me. After how many epochs did you see proper words

I just realized that the current version uses the ASGD optimizer. I used Adam for the initial 20 or 30 epochs until convergence, then switched the optimizer to ASGD with a reduced learning rate of 0.00096. I guess that's the reason for the slow convergence (change line 29 in trainer.py). If I remember correctly, my loss used to start in the several thousands, and the model started producing proper words once the loss dropped well below 1000.
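The two-phase optimizer schedule described above can be sketched as follows (the model and the point of the switch are illustrative assumptions, not the repo's actual trainer.py):

```python
import torch

# placeholder model; the repo's actual listener/speller is more complex
model = torch.nn.Linear(40, 47)

# phase 1: Adam at lr 0.001 for the first ~20-30 epochs
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# ... train until the loss plateaus ...

# phase 2: rebuild the optimizer as ASGD with a reduced learning rate
optimizer = torch.optim.ASGD(model.parameters(), lr=0.00096)
```

Rebuilding the optimizer mid-training discards Adam's moment estimates, which is the intended effect here: ASGD's averaging then takes over for fine convergence.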

> and did you use mel_spectrograms as input features?

Yes, I used 40-dimensional mel-spectral vectors. To compute the mel spectrogram, the speech was segmented into 25 ms frames, with a 10 ms stride between adjacent frames.
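Those framing numbers translate directly into sample counts and frame counts; a quick sketch (assuming 16 kHz audio, which is standard for WSJ — the actual feature-extraction code is not shown in this thread):

```python
import numpy as np

sr = 16000                               # assumed sample rate (Hz)
frame_len = int(sr * 25 / 1000)          # 25 ms window -> 400 samples
hop = int(sr * 10 / 1000)                # 10 ms stride -> 160 samples

signal = np.random.randn(sr * 2)         # 2 s of dummy audio
n_frames = 1 + (len(signal) - frame_len) // hop

# each frame would then be reduced to a 40-dim mel vector,
# giving an (n_frames, 40) feature matrix per utterance
```

With these settings, 2 seconds of audio yields 198 frames, i.e. roughly 100 feature vectors per second of speech.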

Let me know if you still face issues!

anravich102 (Author) commented:

Thanks for your response. I modified my code to match what you said, and it does converge much better. Do you happen to remember how long training took to reach the final model? I am running it on LibriSpeech (~1,000 hours), so I expect it to take around a week on 4 GPUs. Is that a reasonable estimate based on what you saw?
