[TextGeneration] max token refactor #1217
Conversation
The good: much less additional code and complexity than I thought.
The bad and ugly: could you add appropriate tests in tests/deepsparse/transformers/pipelines/test_text_generation.py?
Could max_tokens default to sequence_length - prompt_length so we don't risk running out of KV cache context? I'm not sure what happens there actually, especially when using the internal KV cache.
@dbogunowicz would like to get your opinion on this
@dsikka this is a very good idea.
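The default being discussed above can be sketched as a small helper. This is a hypothetical illustration, not the actual deepsparse implementation; the function name and the ValueError behavior are assumptions made for the example.

```python
# Hypothetical sketch: derive a safe default for max_tokens from the model's
# sequence_length and the tokenized prompt length, so that generation can
# never overrun the KV cache context.
def default_max_tokens(sequence_length: int, prompt_length: int) -> int:
    """Return the number of tokens that can still be generated."""
    remaining = sequence_length - prompt_length
    if remaining <= 0:
        # Assumed behavior for this sketch: refuse prompts that already
        # fill (or exceed) the model's context window.
        raise ValueError(
            f"prompt of length {prompt_length} does not fit in "
            f"sequence_length {sequence_length}"
        )
    return remaining


print(default_max_tokens(sequence_length=512, prompt_length=128))  # 384
```

With this default, a pipeline built for sequence_length=512 and given a 128-token prompt would generate at most 384 tokens unless the caller explicitly passes a smaller max_tokens.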
- Remove max_generated_tokens from the constructor and add it to the TextGenerationInput schema.
- Add num_generated_predictions to TextGenerationInput which, if > 1, repeats the input sequence and turns off deterministic prediction. If a sequence is already provided multiple times, it is not repeated.
Talking to the MLE team, I think for now we want to keep the defaults as-is and update them once we've established best practices.
For this ticket:
https://app.asana.com/0/1201735099598270/1205276886236972/f
Summary:
- Removes the max_tokens argument from the constructor and makes it part of the pipeline input.
- Adds num_generated_predictions to the input as well, which dictates the number of sequences generated for a given input. Similar to the Hugging Face implementation, we repeat the input based on the number provided, defaulting to 1. When num_generated_predictions is > 1, the engine's deterministic property is toggled to False.
- Supports str, List[str], and List[List[str]] inputs.

Testing:
- num_generated_predictions
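The input-repetition behavior described in the summary can be sketched roughly as follows. This is a hypothetical illustration of the described semantics, not the actual deepsparse code; the function name `repeat_inputs` and the flattening of `List[List[str]]` inputs are assumptions made for the example.

```python
# Hypothetical sketch of the num_generated_predictions behavior: a single
# string or flat list of strings is repeated num_generated_predictions
# times (mirroring the Hugging Face-style repetition described above),
# while sequences the caller already provided multiple times as an inner
# list are passed through unchanged.
from typing import List, Union


def repeat_inputs(
    sequences: Union[str, List[str], List[List[str]]],
    num_generated_predictions: int = 1,
) -> List[str]:
    if isinstance(sequences, str):
        sequences = [sequences]
    expanded: List[str] = []
    for item in sequences:
        if isinstance(item, list):
            # Already repeated explicitly by the caller; keep as given.
            expanded.extend(item)
        else:
            expanded.extend([item] * num_generated_predictions)
    return expanded


print(repeat_inputs("hello", 3))       # ['hello', 'hello', 'hello']
print(repeat_inputs([["a", "a"]], 3))  # ['a', 'a'] (not repeated again)
```

When num_generated_predictions is left at its default of 1, the input passes through unchanged, which matches the summary's note that repetition only kicks in for values greater than 1.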