-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Text Generation] KV Cache internal Deepsparse support #1135
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dbogunowicz
reviewed
Jul 24, 2023
@SageMoore now, since we have a perplexity metric working for LLMs, let's use it to validate the correctness of the internally managed kv cache. Let's huddle today. |
dbogunowicz
changed the title
WIP: Fix the kv cache plumbing to the engine
[Text Generation] KV Cache internal Deepsparse support
Jul 25, 2023
…parse into kv-cache-fixes
…nto tests/feature/nl_dec_engine
…o tests/damian/llms
bfineran
approved these changes
Aug 7, 2023
Base automatically changed from
tests/damian/llms
to
tests/feature/nl_dec_engine
August 8, 2023 06:09
Base automatically changed from
tests/feature/nl_dec_engine
to
feature/damian/fb_testing
August 8, 2023 06:09
bfineran
approved these changes
Aug 8, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall
dbogunowicz
approved these changes
Aug 9, 2023
mgoin
approved these changes
Aug 9, 2023
Merge after confirming with Damian |
4 tasks
bfineran
added a commit
that referenced
this pull request
Aug 10, 2023
…rk (#1163) * Create test_nl_decoder_engine.py * [Text Generation][Tests] DecoderKVCache (#1154) * [Text Generation][Tests] NLDecoderEngine (#1155) * initial commit * initial commit * [Text Generation][Tests] Text Generation Pipeline (#1162) * initial implementation * problems with multitoken prefill * almost there... * finally all tests pass * just need to change to stub * fix bad merge * Make tests work with stub (as much as possible), cleanup test names, disable heavy tests, include patch for running without causal mask * use patch from unittest library - remove additional dependency * Update tests/deepsparse/transformers/pipelines/test_text_generation.py * clarify todo comment * [Text Generation] KV Cache internal Deepsparse support (#1135) * fix kv cache * refactor * add validation pathway * avx2 support * initial commit * initial commit * initial implementation * problems with multitoken prefill * its working * almost there... * finally all tests pass * just need to change to stub * fix bad merge * added some tests * ready for review * full support --------- Co-authored-by: dbogunowicz <[email protected]> Co-authored-by: Damian <[email protected]> * incomplete string in parametrize * few nits before the merge --------- Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: Sage Moore <[email protected]>
Merged
4 tasks
bfineran
added a commit
that referenced
this pull request
Aug 10, 2023
…itial testing framework (#1172) * initial commit * improved logic * additional improvements * Update src/deepsparse/transformers/pipelines/text_generation.py * Update src/deepsparse/utils/onnx.py Co-authored-by: Benjamin Fineran <[email protected]> * Update src/deepsparse/utils/onnx.py Co-authored-by: Benjamin Fineran <[email protected]> * response to Ben's comments * finish rebasing * update user messages + add assertion for safety * minor improvements before landing * Fix the helper function that has been broken after a merge * [Text Generation] Internal KV Cache Support + Initial Testing Framework (#1163) * Create test_nl_decoder_engine.py * [Text Generation][Tests] DecoderKVCache (#1154) * [Text Generation][Tests] NLDecoderEngine (#1155) * initial commit * initial commit * [Text Generation][Tests] Text Generation Pipeline (#1162) * initial implementation * problems with multitoken prefill * almost there... * finally all tests pass * just need to change to stub * fix bad merge * Make tests work with stub (as much as possible), cleanup test names, disable heavy tests, include patch for running without causal mask * use patch from unittest library - remove additional dependency * Update tests/deepsparse/transformers/pipelines/test_text_generation.py * clarify todo comment * [Text Generation] KV Cache internal Deepsparse support (#1135) * fix kv cache * refactor * add validation pathway * avx2 support * initial commit * initial commit * initial implementation * problems with multitoken prefill * its working * almost there... * finally all tests pass * just need to change to stub * fix bad merge * added some tests * ready for review * full support --------- Co-authored-by: dbogunowicz <[email protected]> Co-authored-by: Damian <[email protected]> * incomplete string in parametrize * few nits before the merge --------- Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: Sage Moore <[email protected]> --------- Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: Sage Moore <[email protected]>
dbogunowicz
added a commit
that referenced
this pull request
Aug 28, 2023
…che logic when internal KV cache management enabled (#1175) * fix kv cache * refactor * add validation pathway * avx2 support * initial commit * initial commit * initial implementation * problems with multitoken prefill * its working * Create test_nl_decoder_engine.py * almost there... * finally all tests pass * just need to change to stub * fix bad merge * added some tests * ready for review * [Text Generation][Tests] DecoderKVCache (#1154) * [Text Generation][Tests] NLDecoderEngine (#1155) * initial commit * initial commit * [Text Generation][Tests] Text Generation Pipeline (#1162) * initial implementation * problems with multitoken prefill * almost there... * finally all tests pass * just need to change to stub * fix bad merge * Make tests work with stub (as much as possible), cleanup test names, disable heavy tests, include patch for running without causal mask * initial commit * use patch from unittest library - remove additional dependency * improved logic * additional improvements * Update src/deepsparse/transformers/pipelines/text_generation.py * Update src/deepsparse/utils/onnx.py Co-authored-by: Benjamin Fineran <[email protected]> * Update src/deepsparse/utils/onnx.py Co-authored-by: Benjamin Fineran <[email protected]> * response to Ben's comments * finish rebasing * full support * Update tests/deepsparse/transformers/pipelines/test_text_generation.py * initial commit * clarify todo comment * update user messages + add assertion for safety * [Text Generation] KV Cache internal Deepsparse support (#1135) * fix kv cache * refactor * add validation pathway * avx2 support * initial commit * initial commit * initial implementation * problems with multitoken prefill * its working * almost there... * finally all tests pass * just need to change to stub * fix bad merge * added some tests * ready for review * full support --------- Co-authored-by: dbogunowicz <[email protected]> Co-authored-by: Damian <[email protected]> * minor improvements before landing * Fix the helper function that has been broken after a merge * incomplete string in parametrize * few nits before the merge * pass dummy cache if internal cache management supported * Apply suggestions from code review * add missing property * cleaner func * PR ready * add timing for KV cache update * initial commit * code review comments * Nit: docstring typo * nit: docstring style * fix style * fix broken test * fixing bad rebase --------- Co-authored-by: Sage Moore <[email protected]> Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: Benjamin <[email protected]>
bfineran
added a commit
that referenced
this pull request
Sep 13, 2023
…unning internally (#1195) * fix kv cache * refactor * add validation pathway * avx2 support * initial commit * initial commit * initial implementation * problems with multitoken prefill * its working * Create test_nl_decoder_engine.py * almost there... * finally all tests pass * just need to change to stub * fix bad merge * added some tests * ready for review * [Text Generation][Tests] DecoderKVCache (#1154) * [Text Generation][Tests] NLDecoderEngine (#1155) * initial commit * initial commit * [Text Generation][Tests] Text Generation Pipeline (#1162) * initial implementation * problems with multitoken prefill * almost there... * finally all tests pass * just need to change to stub * fix bad merge * Make tests work with stub (as much as possible), cleanup test names, disable heavy tests, include patch for running without causal mask * initial commit * use patch from unittest library - remove additional dependency * improved logic * additional improvements * Update src/deepsparse/transformers/pipelines/text_generation.py * Update src/deepsparse/utils/onnx.py Co-authored-by: Benjamin Fineran <[email protected]> * Update src/deepsparse/utils/onnx.py Co-authored-by: Benjamin Fineran <[email protected]> * response to Ben's comments * finish rebasing * full support * Update tests/deepsparse/transformers/pipelines/test_text_generation.py * initial commit * clarify todo comment * update user messages + add assertion for safety * [Text Generation] KV Cache internal Deepsparse support (#1135) * fix kv cache * refactor * add validation pathway * avx2 support * initial commit * initial commit * initial implementation * problems with multitoken prefill * its working * almost there... * finally all tests pass * just need to change to stub * fix bad merge * added some tests * ready for review * full support --------- Co-authored-by: dbogunowicz <[email protected]> Co-authored-by: Damian <[email protected]> * minor improvements before landing * Fix the helper function that has been broken after a merge * incomplete string in parametrize * few nits before the merge * pass dummy cache if internal cache management supported * Apply suggestions from code review * add missing property * cleaner func * PR ready * initial commit * code review comments * set kv cache inputs to empty arrays (size 0) when running internally * TEMP: removing inputs filtering by name * remove obsolete argument * trying to find a solution * this is working * improve documentation * review comments * inline comment instead of warning --------- Co-authored-by: Sage Moore <[email protected]> Co-authored-by: dbogunowicz <[email protected]> Co-authored-by: Damian <[email protected]> Co-authored-by: Luka Govedic <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Enable internal kv cache support for the engine.