
[Generate] correct encoder_outputs are passed without attention_mask #14980

Conversation

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Dec 29, 2021

What does this PR do?

This PR fixes a very rare edge case that occurs when `encoder_outputs` is passed to `generate` without an `attention_mask`, for a model that accepts an `attention_mask`.
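The idea behind the fix can be sketched without the full `generate` implementation: when the caller hands over precomputed `encoder_outputs` but no `attention_mask`, generation should fall back to a default all-ones mask instead of failing. A minimal, dependency-free sketch of that fallback (the helper name and the list-based stand-in "tensors" are illustrative, not the actual transformers code):

```python
def prepare_attention_mask(encoder_hidden_states, attention_mask=None):
    """Return the caller's mask, or a default all-ones mask whose shape
    matches the precomputed encoder output.

    `encoder_hidden_states` is a (batch, seq_len, hidden) nested list here,
    standing in for the tensor a real encoder would return.
    """
    if attention_mask is not None:
        return attention_mask
    batch_size = len(encoder_hidden_states)
    seq_len = len(encoder_hidden_states[0])
    # Default behaviour: attend to every encoder position.
    return [[1] * seq_len for _ in range(batch_size)]


# One batch of two sequences, each 3 positions long, hidden size 2.
hidden = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
          [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]]

print(prepare_attention_mask(hidden))               # [[1, 1, 1], [1, 1, 1]]
print(prepare_attention_mask(hidden, [[1, 1, 0], [1, 0, 0]]))  # caller's mask wins
```

An explicitly passed mask is always returned unchanged; the default is only constructed when the caller provided none.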

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten patrickvonplaten requested review from Narsil and sgugger and removed request for Narsil December 29, 2021 17:15
Collaborator

@sgugger sgugger left a comment


Thanks for fixing!

Comment on lines 1902 to 1904
output_sequences_with_mask = model.generate(
    encoder_outputs=encoder_outputs, attention_mask=attention_mask
).cpu()
Collaborator


As usual, when formatted like this by black, I'd go for two lines :-)

Contributor

@Narsil Narsil left a comment


LGTM

The wav2vec2 change doesn't seem to belong here; if it does, the test should explain what it is actually about.

tests/test_pipelines_automatic_speech_recognition.py
framework="pt",
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation").sort("id")
Contributor


Is the sort necessary ?

Contributor Author


Yeah, since different versions of datasets might break the order.

Contributor


It seems like a very odd thing in datasets that the order could be messed up by subsequent versions...

Contributor Author


@anton-l and I have experienced this a couple of times now, so we'll just make sure this way.

Contributor


Good to know !
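The determinism concern discussed above can be sketched in plain Python: `ds.sort("id")` pins the row order so the test does not depend on whichever order a given `datasets` version happens to return. (The toy rows and ids below are illustrative, not the actual LibriSpeech data.)

```python
# Two hypothetical orderings of the same split, as different `datasets`
# versions might return them.
rows_v1 = [{"id": "sample-0001", "text": "second utterance"},
           {"id": "sample-0000", "text": "first utterance"}]
rows_v2 = list(reversed(rows_v1))


def sort_by_id(rows):
    # Same idea as `ds.sort("id")`: order rows by their "id" column.
    return sorted(rows, key=lambda row: row["id"])


# After sorting, both versions yield the identical, stable order.
assert sort_by_id(rows_v1) == sort_by_id(rows_v2)
print([row["id"] for row in sort_by_id(rows_v1)])
# ['sample-0000', 'sample-0001']
```

Sorting on a stable key up front makes the expected transcriptions in the test independent of the library's internal ordering.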

@patrickvonplaten patrickvonplaten merged commit c043ce6 into huggingface:master Dec 30, 2021
@patrickvonplaten patrickvonplaten deleted the correct_generate_edge_case branch December 30, 2021 09:16
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request Jan 6, 2022
…uggingface#14980)

* [Generate] correct encoder_outputs are passed without attention_mask

* Apply suggestions from code review

* up