Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Text Generation] KV Cache internal Deepsparse support #1135

Merged
merged 35 commits into from
Aug 9, 2023

Conversation

SageMoore
Copy link
Contributor

@SageMoore SageMoore commented Jul 20, 2023

Enable internal kv cache support for the engine.

@SageMoore SageMoore requested a review from dbogunowicz July 20, 2023 20:52
@dbogunowicz
Copy link
Contributor

@SageMoore now, since we have a perplexity metric working for LLMs, let's use it to validate the correctness of the internally managed kv cache. Let's huddle today.

@dbogunowicz dbogunowicz changed the title WIP: Fix the kv cache plumbing to the engine [Text Generation] KV Cache internal Deepsparse support Jul 25, 2023
@dbogunowicz dbogunowicz marked this pull request as draft July 25, 2023 07:22
@dbogunowicz dbogunowicz marked this pull request as ready for review August 3, 2023 15:58
@dbogunowicz dbogunowicz changed the base branch from main to tests/damian/llms August 7, 2023 08:40
Base automatically changed from tests/damian/llms to tests/feature/nl_dec_engine August 8, 2023 06:09
Base automatically changed from tests/feature/nl_dec_engine to feature/damian/fb_testing August 8, 2023 06:09
@dbogunowicz dbogunowicz marked this pull request as draft August 8, 2023 15:57
Copy link
Contributor

@bfineran bfineran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

@dbogunowicz dbogunowicz marked this pull request as ready for review August 9, 2023 08:16
@natuan natuan merged commit e1e10b3 into feature/damian/fb_testing Aug 9, 2023
@natuan natuan deleted the kv-cache-fixes branch August 9, 2023 19:41
@natuan
Copy link
Contributor

natuan commented Aug 9, 2023

Merge after confirming with Damian

bfineran added a commit that referenced this pull request Aug 10, 2023
…rk (#1163)

* Create test_nl_decoder_engine.py

* [Text Generation][Tests] DecoderKVCache (#1154)

* [Text Generation][Tests] NLDecoderEngine (#1155)

* initial commit

* initial commit

* [Text Generation][Tests] Text Generation Pipeline (#1162)

* initial implementation

* problems with multitoken prefill

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* Make tests work with stub (as much as possible), cleanup test names,  disable heavy tests, include patch for running without causal mask

* use patch from unittest library - remove additional dependency

* Update tests/deepsparse/transformers/pipelines/test_text_generation.py

* clarify todo comment

* [Text Generation]  KV Cache internal Deepsparse support (#1135)

* fix kv cache

* refactor

* add validation pathway

* avx2 support

* initial commit

* initial commit

* initial implementation

* problems with multitoken prefill

* its working

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* added some tests

* ready for review

* full support

---------

Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Damian <[email protected]>

* incomplete string in parametrize

* few nits before the merge

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Sage Moore <[email protected]>
bfineran added a commit that referenced this pull request Aug 10, 2023
…itial testing framework (#1172)

* initial commit

* improved logic

* additional improvements

* Update src/deepsparse/transformers/pipelines/text_generation.py

* Update src/deepsparse/utils/onnx.py

Co-authored-by: Benjamin Fineran <[email protected]>

* Update src/deepsparse/utils/onnx.py

Co-authored-by: Benjamin Fineran <[email protected]>

* response to Ben's comments

* finish rebasing

* update user messages + add assertion for safety

* minor improvements before landing

* Fix the helper function that has been broken after a merge

* [Text Generation] Internal KV Cache Support + Initial Testing Framework (#1163)

* Create test_nl_decoder_engine.py

* [Text Generation][Tests] DecoderKVCache (#1154)

* [Text Generation][Tests] NLDecoderEngine (#1155)

* initial commit

* initial commit

* [Text Generation][Tests] Text Generation Pipeline (#1162)

* initial implementation

* problems with multitoken prefill

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* Make tests work with stub (as much as possible), cleanup test names,  disable heavy tests, include patch for running without causal mask

* use patch from unittest library - remove additional dependency

* Update tests/deepsparse/transformers/pipelines/test_text_generation.py

* clarify todo comment

* [Text Generation]  KV Cache internal Deepsparse support (#1135)

* fix kv cache

* refactor

* add validation pathway

* avx2 support

* initial commit

* initial commit

* initial implementation

* problems with multitoken prefill

* its working

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* added some tests

* ready for review

* full support

---------

Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Damian <[email protected]>

* incomplete string in parametrize

* few nits before the merge

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Sage Moore <[email protected]>

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Sage Moore <[email protected]>
dbogunowicz added a commit that referenced this pull request Aug 28, 2023
…che logic when internal KV cache management enabled (#1175)

* fix kv cache

* refactor

* add validation pathway

* avx2 support

* initial commit

* initial commit

* initial implementation

* problems with multitoken prefill

* its working

* Create test_nl_decoder_engine.py

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* added some tests

* ready for review

* [Text Generation][Tests] DecoderKVCache (#1154)

* [Text Generation][Tests] NLDecoderEngine (#1155)

* initial commit

* initial commit

* [Text Generation][Tests] Text Generation Pipeline (#1162)

* initial implementation

* problems with multitoken prefill

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* Make tests work with stub (as much as possible), cleanup test names,  disable heavy tests, include patch for running without causal mask

* initial commit

* use patch from unittest library - remove additional dependency

* improved logic

* additional improvements

* Update src/deepsparse/transformers/pipelines/text_generation.py

* Update src/deepsparse/utils/onnx.py

Co-authored-by: Benjamin Fineran <[email protected]>

* Update src/deepsparse/utils/onnx.py

Co-authored-by: Benjamin Fineran <[email protected]>

* response to Ben's comments

* finish rebasing

* full support

* Update tests/deepsparse/transformers/pipelines/test_text_generation.py

* initial commit

* clarify todo comment

* update user messages + add assertion for safety

* [Text Generation]  KV Cache internal Deepsparse support (#1135)

* fix kv cache

* refactor

* add validation pathway

* avx2 support

* initial commit

* initial commit

* initial implementation

* problems with multitoken prefill

* its working

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* added some tests

* ready for review

* full support

---------

Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Damian <[email protected]>

* minor improvements before landing

* Fix the helper function that has been broken after a merge

* incomplete string in parametrize

* few nits before the merge

* pass dummy cache if internal cache management supported

* Apply suggestions from code review

* add missing property

* cleaner func

* PR ready

* add timing for KV cache update

* initial commit

* code review comments

* Nit: docstring typo

* nit: docstring style

* fix style

* fix broken test

* fixing bad rebase

---------

Co-authored-by: Sage Moore <[email protected]>
Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Benjamin <[email protected]>
bfineran added a commit that referenced this pull request Sep 13, 2023
…unning internally (#1195)

* fix kv cache

* refactor

* add validation pathway

* avx2 support

* initial commit

* initial commit

* initial implementation

* problems with multitoken prefill

* its working

* Create test_nl_decoder_engine.py

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* added some tests

* ready for review

* [Text Generation][Tests] DecoderKVCache (#1154)

* [Text Generation][Tests] NLDecoderEngine (#1155)

* initial commit

* initial commit

* [Text Generation][Tests] Text Generation Pipeline (#1162)

* initial implementation

* problems with multitoken prefill

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* Make tests work with stub (as much as possible), cleanup test names,  disable heavy tests, include patch for running without causal mask

* initial commit

* use patch from unittest library - remove additional dependency

* improved logic

* additional improvements

* Update src/deepsparse/transformers/pipelines/text_generation.py

* Update src/deepsparse/utils/onnx.py

Co-authored-by: Benjamin Fineran <[email protected]>

* Update src/deepsparse/utils/onnx.py

Co-authored-by: Benjamin Fineran <[email protected]>

* response to Ben's comments

* finish rebasing

* full support

* Update tests/deepsparse/transformers/pipelines/test_text_generation.py

* initial commit

* clarify todo comment

* update user messages + add assertion for safety

* [Text Generation]  KV Cache internal Deepsparse support (#1135)

* fix kv cache

* refactor

* add validation pathway

* avx2 support

* initial commit

* initial commit

* initial implementation

* problems with multitoken prefill

* its working

* almost there...

* finally all tests pass

* just need to change to stub

* fix bad merge

* added some tests

* ready for review

* full support

---------

Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Damian <[email protected]>

* minor improvements before landing

* Fix the helper function that has been broken after a merge

* incomplete string in parametrize

* few nits before the merge

* pass dummy cache if internal cache management supported

* Apply suggestions from code review

* add missing property

* cleaner func

* PR ready

* initial commit

* code review comments

* set kv cache inputs to empty arrays (size 0) when running internally

* TEMP: removing inputs filtering by name

* remove obsolete argument

* trying to find a solution

* this is working

* improve documentation

* review comments

* inline comment instead of warning

---------

Co-authored-by: Sage Moore <[email protected]>
Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: Damian <[email protected]>
Co-authored-by: Luka Govedic <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants