Add Link
Link to the tutorial:
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Describe the bug
The tutorial was markedly changed in June 2023; see commit 6c03bb3, which aimed at fixing the implementation of attention, among other things (#2468). In doing so, several other things were changed:
adding a dataloader which returns batches of zero-padded sequences to train the network
the forward() function of the Decoder processes the input one word at a time, in parallel for all sentences in the batch, until MAX_LENGTH is reached (see the sketch below)
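For concreteness, here is a rough sketch of what I understand the new batched decoding to do. This is not the tutorial's code; attention is left out and all names and sizes below are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only
batch_size, hidden_size, vocab_size = 32, 128, 1000
MAX_LENGTH = 10
SOS_token = 0  # start-of-sentence id (illustrative)

embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
out_proj = nn.Linear(hidden_size, vocab_size)

decoder_hidden = torch.zeros(1, batch_size, hidden_size)  # would come from the encoder

# Every sentence in the batch starts with SOS; the loop then decodes one word
# at a time, in parallel over the whole batch, for exactly MAX_LENGTH steps.
decoder_input = torch.full((batch_size, 1), SOS_token, dtype=torch.long)
step_outputs = []
for _ in range(MAX_LENGTH):
    emb = embedding(decoder_input)                      # (batch, 1, hidden)
    rnn_out, decoder_hidden = gru(emb, decoder_hidden)
    logits = out_proj(rnn_out)                          # (batch, 1, vocab)
    step_outputs.append(logits)
    decoder_input = logits.argmax(dim=-1).detach()      # greedy token fed back in
decoder_outputs = torch.cat(step_outputs, dim=1)        # (batch, MAX_LENGTH, vocab)
```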
I am not a torch expert, but I think that the embedding layers in the encoder and decoder should have been modified to recognize padding (padding_idx=0 is missing). Using zero-padded sequences as input might also have other implications during learning, but I am not sure. Can you confirm that the implementation is correct?
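Concretely, what I would have expected is something along these lines. This is only a sketch with hypothetical sizes; padding_idx on nn.Embedding and ignore_index on the loss are the standard PyTorch mechanisms I have in mind, not code taken from the tutorial:

```python
import torch.nn as nn

input_size, hidden_size = 5000, 128  # hypothetical vocabulary and hidden sizes

# Tell the embedding that index 0 is padding: its vector is kept at zero and
# is not updated during training.
embedding = nn.Embedding(input_size, hidden_size, padding_idx=0)

# The loss could likewise skip padded target positions.
criterion = nn.NLLLoss(ignore_index=0)
```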
As a result of these changes, the text no longer describes the code well. I think it would be nice to include a discussion of zero-padding and of the implications of using batches for the code in the tutorial. I am also curious whether there is really a gain from using batches, since most sentences are short.
Finally, I found a mention in the text of teacher_forcing_ratio, which is not included in the code. Either the tutorial or the code needs to be adjusted.
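For reference, the way teacher_forcing_ratio is typically used looks roughly like this. Again, this is only a sketch with made-up names and sizes, not the current tutorial code:

```python
import random
import torch
import torch.nn as nn

batch_size, hidden_size, vocab_size, MAX_LENGTH, SOS_token = 32, 128, 1000, 10, 0
teacher_forcing_ratio = 0.5  # illustrative value

embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
out_proj = nn.Linear(hidden_size, vocab_size)

decoder_hidden = torch.zeros(1, batch_size, hidden_size)                 # from the encoder
target_tensor = torch.randint(0, vocab_size, (batch_size, MAX_LENGTH))   # dummy targets

decoder_input = torch.full((batch_size, 1), SOS_token, dtype=torch.long)
use_teacher_forcing = random.random() < teacher_forcing_ratio            # drawn once per batch
for t in range(MAX_LENGTH):
    emb = embedding(decoder_input)
    rnn_out, decoder_hidden = gru(emb, decoder_hidden)
    logits = out_proj(rnn_out)
    if use_teacher_forcing:
        decoder_input = target_tensor[:, t].unsqueeze(1)   # feed the ground-truth token
    else:
        decoder_input = logits.argmax(dim=-1).detach()     # feed the model's own prediction
```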
In case this is useful, I found another implementation of the same tutorial which seems to be a fork of a previous version (it was archived in 2021):
It does not use batches
It includes teacher_forcing_ratio to select the amount of teacher forcing
It implements both the Luong et al. and Bahdanau et al. attention models (see the sketch below)
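For anyone unfamiliar with the difference, the two attention variants differ mainly in how the alignment score is computed. Roughly, with hypothetical shapes (this is not code from either tutorial):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size = 128
query = torch.randn(32, 1, hidden_size)    # current decoder hidden state
keys = torch.randn(32, 10, hidden_size)    # encoder outputs

# Bahdanau et al. (additive) attention
Wa = nn.Linear(hidden_size, hidden_size)
Ua = nn.Linear(hidden_size, hidden_size)
Va = nn.Linear(hidden_size, 1)
bahdanau_scores = Va(torch.tanh(Wa(query) + Ua(keys))).squeeze(-1)        # (32, 10)

# Luong et al. ("general") multiplicative attention
Wl = nn.Linear(hidden_size, hidden_size, bias=False)
luong_scores = torch.bmm(query, Wl(keys).transpose(1, 2)).squeeze(1)      # (32, 10)

# Either way, the scores are normalized into weights over encoder positions.
weights = F.softmax(bahdanau_scores, dim=-1)
```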
Describe your environment
I appreciate this tutorial as it provides a simple introduction to Seq2Seq models with a small dataset. I am actually trying to port this tutorial to R with the torch package.
cc @albanD