[Text Generation][KVCacheStorage] `TextGenerationPipeline` refactor #1254

Merged

bfineran merged 12 commits into main from feature/damian/chat_pipeline

Sep 21, 2023

Contributor

dbogunowicz commented Sep 19, 2023 (edited)

As agreed with the team, the old `KVCacheSessionStorage` design had become unwieldy after the series of recent refactors.
The goal of this PR is to decouple `DecoderKVCache` from `NLDecoderEngine`. This will let us implement the upcoming `ChatPipeline(TextGenerationPipeline)` much more cleanly.
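To make the decoupling concrete, here is a minimal Python sketch under stated assumptions. The classes `ToyKVCache`, `ToyDecoderEngine`, `ToyTextGenerationPipeline` and `ToyChatPipeline` are hypothetical stand-ins, not the deepsparse API, and the "model" is a toy arithmetic rule. The point it illustrates: the engine owns no cache and receives one from the pipeline, so a chat subclass only changes where the session comes from (reused across turns) instead of reaching into the engine.

```python
from typing import List, Optional

# Hypothetical toy classes illustrating the decoupling; not the deepsparse API.


class ToyKVCache:
    """Session state: the tokens whose keys/values are already cached."""

    def __init__(self) -> None:
        self.cached_tokens: List[int] = []

    def update(self, new_tokens: List[int]) -> None:
        # Record the tokens processed by the latest forward pass.
        self.cached_tokens.extend(new_tokens)


class ToyDecoderEngine:
    """Engine that holds no cache of its own; the caller passes one in."""

    def forward(self, tokens: List[int], kv_cache: ToyKVCache) -> int:
        kv_cache.update(tokens)
        # Toy "model": the next token depends on the whole cached context.
        return sum(kv_cache.cached_tokens) % 100


class ToyTextGenerationPipeline:
    def __init__(self) -> None:
        self.engine = ToyDecoderEngine()

    def _get_session(self) -> ToyKVCache:
        # Plain text generation: a fresh cache for every call.
        return ToyKVCache()

    def __call__(self, prompt: List[int], max_new_tokens: int) -> List[int]:
        session = self._get_session()
        generated: List[int] = []
        tokens = prompt
        for _ in range(max_new_tokens):
            next_token = self.engine.forward(tokens, session)
            generated.append(next_token)
            tokens = [next_token]  # later steps feed only the newest token
        return generated


class ToyChatPipeline(ToyTextGenerationPipeline):
    def __init__(self) -> None:
        super().__init__()
        self._session: Optional[ToyKVCache] = None

    def _get_session(self) -> ToyKVCache:
        # Chat: reuse one cache across turns, so tokens processed in
        # earlier turns remain cached for the next turn.
        if self._session is None:
            self._session = ToyKVCache()
        return self._session


text = ToyTextGenerationPipeline()
chat = ToyChatPipeline()
print(text([1, 2, 3], 3), text([1, 2, 3], 3))  # [6, 12, 24] [6, 12, 24]
print(chat([1, 2, 3], 3), chat([1, 2, 3], 3))  # [6, 12, 24] [30, 60, 20]
```

The plain pipeline returns the same tokens on every call, while the chat variant's second turn differs because the first turn's tokens are still in its shared cache. Only `_get_session` differs between the two pipelines, which is the kind of separation the refactor aims for.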

This is roughly the design envisioned:

Testing:

- Successfully ran the LLM test suite in `test_text_generation.py`.
- Note: did not run `eval_downstream`; this pathway is currently broken. AFAIK this is in agreement with @alexm-nm and @dsikka, who are working on landing the `eval_downstream` refactor.

dbogunowicz and others added 11 commits

September 14, 2023 15:20

- initial commit (837905a)
- Merge branch 'main' of https://github.com/neuralmagic/deepsparse into main (22534d7)
- Merge branch 'main' of https://github.com/neuralmagic/deepsparse into main (d9b35e9)
- upload draft for review (803e536)
- Merge branch 'main' of https://github.com/neuralmagic/deepsparse into main (4707d02)
- initial implementation. testing now (76ae856)
- Merge branch 'main' into feature/damian/chat_pipeline (34a9164)
- in this form tests pass (44cb44a)
- Merge branch 'feature/damian/chat_pipeline' of https://github.com/neuralmagic/deepsparse into feature/damian/chat_pipeline (2468e77)
- cleanup (0b0e75c)
- ready for review (412cfc7)

dbogunowicz commented

`src/deepsparse/transformers/pipelines/text_generation.py` (outdated)

@@ -670,6 +654,9 @@ def engine_forward(
                           if streamer is not None:
                               streamer.end()
+                      if self._debug:
+                          self._debug = dict(kv_cache=session)

Contributor Author

dbogunowicz Sep 20, 2023

purely for testing purposes

Contributor

bfineran Sep 20, 2023

Even for debugging, we don't want to update state at runtime. We should attach this to the returned output schema if possible (it does not need to be part of the schema).
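The suggestion can be sketched as follows, assuming a plain dataclass (`ToyOutput`) in place of the real output schema and a toy `ToyPipeline`; both names are hypothetical. The debug flag is set once and never reassigned at call time; instead, the per-request session is attached to the returned object as an extra attribute that is not a declared schema field. Real schema classes (for example pydantic models) may reject undeclared attributes, so that detail is an assumption here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical names for illustration; not the deepsparse API.


@dataclass
class ToyOutput:
    """Output schema stand-in: `sequences` is the only declared field."""

    sequences: List[str]


class ToyPipeline:
    def __init__(self, debug: bool = False) -> None:
        # Configured once at construction; never reassigned while serving calls.
        self._debug = debug

    def __call__(self, prompt: str) -> ToyOutput:
        # Per-request state, standing in for the KV cache session.
        session: Dict[str, Any] = {"cached_tokens": prompt.split()}
        output = ToyOutput(sequences=[prompt.upper()])
        if self._debug:
            # Attach the session to this request's output (not a schema field)
            # instead of overwriting self._debug, so pipeline state is unchanged.
            output.kv_cache = session
        return output


pipeline = ToyPipeline(debug=True)
result = pipeline("hello world")
print(result.kv_cache)  # {'cached_tokens': ['hello', 'world']}
```

Because the debug data travels with each returned object, concurrent or repeated calls cannot overwrite one another's debug state, and the pipeline object stays read-only after construction.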

dbogunowicz changed the title from ~~Feature/damian/chat pipeline~~ to [Text Generation][KVCacheStorage] `TextGenerationPipeline` refactor

dbogunowicz marked this pull request as ready for review on September 20, 2023 12:36

dbogunowicz requested review from bfineran, dsikka, Satrat and rahul-tuli on September 20, 2023 12:37

rahul-tuli reviewed

Member

rahul-tuli left a comment

The refactor looks much nicer than the original code! GG!

- `src/deepsparse/transformers/pipelines/text_generation.py` (resolved)
- `src/deepsparse/transformers/utils/helpers.py` (resolved)

bfineran reviewed

- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (outdated, resolved)
- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (outdated, resolved)
- `src/deepsparse/transformers/pipelines/text_generation.py` (outdated, resolved)
- `src/deepsparse/transformers/pipelines/text_generation.py` (outdated, resolved)
- `src/deepsparse/transformers/pipelines/text_generation.py` (outdated, resolved)

- `src/deepsparse/transformers/pipelines/text_generation.py` (resolved)
- `src/deepsparse/transformers/utils/helpers.py` (resolved)

Satrat reviewed

- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (resolved)
- `src/deepsparse/transformers/utils/decoder_kv_cache.py` (resolved)
- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (outdated, resolved)
- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (resolved)


- PR review changes (86cacef)

dbogunowicz requested review from Satrat, bfineran and rahul-tuli on September 21, 2023 06:50

rahul-tuli approved these changes

Member

rahul-tuli left a comment

LGTM with a few nits!

- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (resolved)
- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (resolved)
- `src/deepsparse/transformers/engines/nl_decoder_engine.py` (resolved)
- `src/deepsparse/transformers/utils/decoder_kv_cache.py` (resolved)

Satrat approved these changes

Contributor

bfineran commented Sep 21, 2023

Failures look unrelated; merging.

bfineran merged commit fdb5d44 into main

bfineran deleted the feature/damian/chat_pipeline branch

September 21, 2023 17:55