[generate] Fix encoder decoder models attention mask #36018
base: main
Conversation
Thanks for noticing it's broken, should've triggered some slow tests before merging!

I agree that we'd need to pass both attention masks for encoder-decoder models if available, and the attention mask in `prepare_inputs` should come from the decoder for further manipulations on top.

Overall LGTM, but it would be cleaner if we could pass the encoder mask from within `kwargs` in step 7, imo. CC @gante for one more opinion on generation
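For context, here is a minimal sketch of the routing discussed above, under the assumption that the encoder-side mask stays untouched in the model kwargs while the mask manipulated during decoding is the decoder one. The function `route_attention_masks` is hypothetical and only illustrates the idea; it is not the actual code in `generation/utils.py`.

```python
# Hypothetical sketch (not the actual transformers implementation) of how the
# two masks could be routed for an encoder-decoder model during generation.
import torch


def route_attention_masks(
    model_kwargs: dict, is_encoder_decoder: bool, decoder_input_ids: torch.Tensor
) -> dict:
    """Keep the encoder-side mask untouched; only the decoder mask grows per step."""
    if is_encoder_decoder:
        # The encoder mask is computed once from the padded encoder inputs and
        # must be forwarded unchanged at every decoding step.
        encoder_mask = model_kwargs.get("attention_mask")

        # The mask that is extended as tokens are generated is the decoder one.
        decoder_mask = model_kwargs.get(
            "decoder_attention_mask", torch.ones_like(decoder_input_ids)
        )
        return {"attention_mask": encoder_mask, "decoder_attention_mask": decoder_mask}

    # Decoder-only models: a single mask that tracks the growing sequence.
    return {"attention_mask": model_kwargs.get("attention_mask")}
```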
agree that having a [...] (see `transformers/src/transformers/generation/utils.py`, lines 1957 to 1959 in 9d2056f)
LGTM, thank you for the fix 🙏
P.S.: can we somehow add a fast test to ensure this doesn't happen again? This is a pretty basic feature, and should be covered by fast tests
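A possible fast test, sketched here only as a suggestion: the tiny model id below is a placeholder, and the exact assertion style would need to match the existing `GenerationTesterMixin` conventions. The idea is that padding the encoder input must not change greedy generation when the attention mask is forwarded correctly.

```python
# Sketch of a possible fast test (model name and test structure are assumptions).
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


def test_encoder_decoder_generate_keeps_attention_mask():
    model_name = "hf-internal-testing/tiny-random-t5"  # placeholder tiny model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

    text = "translate English to German: hello"
    unpadded = tokenizer(text, return_tensors="pt")
    padded = tokenizer(text, return_tensors="pt", padding="max_length", max_length=32)

    with torch.no_grad():
        out_unpadded = model.generate(**unpadded, max_new_tokens=5, do_sample=False)
        out_padded = model.generate(**padded, max_new_tokens=5, do_sample=False)

    # If the encoder attention mask is dropped, the pad tokens leak into
    # cross-attention and the padded batch generates something different.
    assert out_unpadded.tolist() == out_padded.tolist()
```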
@zucchini-nlp As @eustlb wrote, historically the naming convention is for the [...]
What does this PR do?
#35852 introduced a bug where the attention mask is lost for encoder-decoder models.
This PR implements a naïve fix.
cf. #36020
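For reference, a minimal sketch of the kind of call affected by the regression (the model choice is illustrative, not taken from the PR): with a padded batch, `generate` must forward the encoder `attention_mask`, otherwise the pad tokens are attended to and the outputs change.

```python
# Illustrative reproduction sketch (model choice is an assumption).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")

# Inputs of different lengths -> padding -> the attention mask matters.
inputs = tokenizer(
    [
        "translate English to German: hello",
        "translate English to German: how are you today?",
    ],
    return_tensors="pt",
    padding=True,
)

# With the bug, the attention mask in `inputs` is lost inside `generate`
# for encoder-decoder models, so padded positions influence the result.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```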