Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Text Generation][KVCacheStorage] TextGenerationPipeline refactor #1254

Merged
merged 12 commits into from
Sep 21, 2023

Conversation

dbogunowicz
Copy link
Contributor

@dbogunowicz dbogunowicz commented Sep 19, 2023

As agreed with the team, the old design for KVCacheSessionStorage was ugly, given the series of recent refactors.
The goal of this PR is to decouple the DecoderKVCache from the NLDecoderEngine. This will allow us to implement the upcoming ChatPipeline(TextGenerationPipeline) much more cleanly.

This is roughly the design envisioned:

image
Testing:

  • successfully ran the LLM testing suite in test_text_generation.py
  • note: did not run eval_downstream. Currently, this pathway is broken. Afaik this is in agreement with @alexm-nm and @dsikka, who are working on landing the eval_downstream refactor.

@@ -670,6 +654,9 @@ def engine_forward(
if streamer is not None:
streamer.end()

if self._debug:
self._debug = dict(kv_cache=session)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purely for testing purposes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even for debug we won't want to update state at runtime, we should attach this to the returned output schema if possible (does not need to be part of the schema)

@dbogunowicz dbogunowicz changed the title Feature/damian/chat pipeline [Text Generation][KVCacheStorage] TextGenerationPipeline refactor Sep 20, 2023
@dbogunowicz dbogunowicz marked this pull request as ready for review September 20, 2023 12:36
Copy link
Member

@rahul-tuli rahul-tuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor looks much nicer than original code! GG!

src/deepsparse/transformers/engines/nl_decoder_engine.py Outdated Show resolved Hide resolved
src/deepsparse/transformers/engines/nl_decoder_engine.py Outdated Show resolved Hide resolved
src/deepsparse/transformers/pipelines/text_generation.py Outdated Show resolved Hide resolved
src/deepsparse/transformers/pipelines/text_generation.py Outdated Show resolved Hide resolved
src/deepsparse/transformers/pipelines/text_generation.py Outdated Show resolved Hide resolved
@@ -670,6 +654,9 @@ def engine_forward(
if streamer is not None:
streamer.end()

if self._debug:
self._debug = dict(kv_cache=session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even for debug we won't want to update state at runtime, we should attach this to the returned output schema if possible (does not need to be part of the schema)

src/deepsparse/transformers/utils/helpers.py Show resolved Hide resolved
Copy link
Member

@rahul-tuli rahul-tuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few nits!

@bfineran
Copy link
Contributor

failures look unrelated - merging

@bfineran bfineran merged commit fdb5d44 into main Sep 21, 2023
@bfineran bfineran deleted the feature/damian/chat_pipeline branch September 21, 2023 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants