TF generate refactor - past without encoder outputs #15944
Conversation
I don't know this part of the library well enough to give a useful review. Styling is good, but I'll defer to the others for approval :-)
```python
# the refactored generate, without the encoder outputs in `past`, expects the `encoder_outputs`
# variable to contain all (encoder_outputs, encoder_hidden_states, encoder_attentions) in
# `prepare_inputs_for_generation`
if encoder_hidden_states is not None:
```
(nit) Why not wrap it into a `TFEncoderOutputs` class here?
Great question! I tried that, and it would be the most sensible change IMO (as the updated generate gets the encoder outputs with `return_dict=True`). However, a `TFEncoderOutputs` would make the T5 tests fail. At this point, I had two options: update TF T5 or write this. Since this PR is mostly about updating the `past` variable, I thought this would be the path of least resistance.

Happy to change T5 instead :)
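For readers following the thread: the two options discussed look roughly like the sketch below. This is illustrative, not code from the PR; the dummy tensors and shapes are assumptions. `TFBaseModelOutput` is the existing ModelOutput-style TF class that a hypothetical `TFEncoderOutputs` would resemble.

```python
import tensorflow as tf
from transformers.modeling_tf_outputs import TFBaseModelOutput

# dummy stand-ins for a real encoder forward pass (shapes are illustrative)
last_hidden_state = tf.zeros((1, 8, 512))
encoder_hidden_states = (tf.zeros((1, 8, 512)),)  # one tensor per layer
encoder_attentions = (tf.zeros((1, 8, 8, 8)),)    # (batch, heads, seq, seq)

# approach taken in this PR: a plain tuple, in the order documented above
encoder_outputs = (last_hidden_state, encoder_hidden_states, encoder_attentions)

# alternative raised in the review: a named output object
encoder_outputs = TFBaseModelOutput(
    last_hidden_state=last_hidden_state,
    hidden_states=encoder_hidden_states,
    attentions=encoder_attentions,
)
```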
```diff
- model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
-     input_ids, return_dict_in_generate, model_kwargs
- )
+ model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(input_ids, model_kwargs)
```
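For context, this helper runs the encoder a single time before the decoding loop and stores its outputs in `model_kwargs`. A rough sketch of that idea, assuming a standalone `encoder` callable; the function name and the `decoder_`-prefix filtering below are illustrative assumptions, not the PR's exact code:

```python
def prepare_encoder_decoder_kwargs(encoder, input_ids, model_kwargs):
    # run the encoder exactly once; the decoder steps reuse the stored
    # outputs instead of re-encoding the input at every generation step
    encoder_kwargs = {k: v for k, v in model_kwargs.items() if not k.startswith("decoder_")}
    model_kwargs["encoder_outputs"] = encoder(input_ids, return_dict=True, **encoder_kwargs)
    return model_kwargs
```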
(nit) you could maybe put this under the `# 4. Prepare ...` comment and change the comment to `prepare model inputs which will be used for ...`
Awesome work! The code looks much cleaner now and the `prepare_inputs_for_generation_...` functions are greatly simplified.

Left a couple of nits. Finally, it would be nice if you could also update the TF encoder-decoder model templates (a copy of TFBart) so that this test doesn't fail.
The fastest way to test these things locally is to do the following:

1. Update the templates similar to how TFBart was updated. Commit your changes.
2. Create a new TFBart-like model with the `add-new-model` command & run tests for the created model.
3. Run `git reset --hard` so that the new model code disappears again.
4. If tests are all passing, then you can commit; if not, repeat 1-4.
Would be nice if @Rocketknight1 could also take a look here.

Let's merge this? cc @Rocketknight1
Overall, the TF changes look good to me and I don't see any problems. I -think-, after talking to @gante, that the change of trimming `input_ids` to only the last token whenever `past` is present is okay, but I'm still a bit confused about how that works!
```diff
  # only last token for inputs_ids if past is defined in kwargs
  if past:
      inputs = tf.expand_dims(inputs[:, -1], -1)

- return {"input_ids": inputs, "past": past, "use_cache": kwargs["use_cache"]}
+ return {"input_ids": inputs, "past_key_values": past, "use_cache": use_cache}
```
This is not necessary if there is no `past` input variable name in GPT2
We should revert this, I think, and maybe deprecate `past` as an input argument name for all models in a separate PR :-)
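A deprecation along those lines could take the shape of a small shim, sketched below. The helper name is made up for illustration; it is not from the PR or the library:

```python
import warnings

def resolve_past_kwarg(past=None, past_key_values=None):
    # accept the legacy `past` name while steering callers toward
    # `past_key_values`, the name used by the PT implementations
    if past is not None:
        warnings.warn("`past` is deprecated; use `past_key_values`", FutureWarning)
        past_key_values = past
    return past_key_values
```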
What does this PR do?

As discussed in the original TF generate refactor plan (#15562), this removes the `encoder_outputs` from `past`. In practice, these changes consist mostly of:

- copying `prepare_inputs_for_generation` and `_reorder_cache` from PT to TF, for each class.

Three important notes:

- the code paths where `past` or `encoder_outputs` were handled have been updated accordingly;
- some models had `cross_attn_head_mask` in `prepare_inputs_for_generation` in their PT implementation, but it raised errors in TF -> I've deleted it from the function output;
- ran `RUN_SLOW=1 pytest -vv tests/model_name/test_modeling_tf_model_name.py` for all affected models.
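Since the PR ports `_reorder_cache` to TF for each class, a rough idea of what such a method does may help reviewers: during beam search the beams are reshuffled at every step, so the cached key/value tensors must be reordered along the batch axis to match. A minimal sketch, under the assumption that the cache is a per-layer tuple of tuples of tensors with the beam/batch dimension on axis 0 (illustrative, not the PR's exact code):

```python
import tensorflow as tf

def reorder_cache(past, beam_idx):
    # select the cache entries of the surviving beams, layer by layer
    return tuple(
        tuple(tf.gather(past_state, beam_idx, axis=0) for past_state in layer_past)
        for layer_past in past
    )

# tiny usage example: a fake 2-layer cache with 3 beams of hidden size 2
layer = (tf.reshape(tf.range(6, dtype=tf.float32), (3, 2)),)
past = (layer, layer)
reordered = reorder_cache(past, tf.constant([2, 0, 1]))
```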