
[Text Generation] Turn off the (currently) inefficient external KV cache logic when internal KV cache management enabled #1175

Merged: 99 commits merged into main from feature/damian/optimize_update_kv_cache on Aug 28, 2023

Conversation

@dbogunowicz (Contributor) commented on Aug 9, 2023

Running the external KV cache loop in parallel with the internal KV cache logic (when the latter is enabled) does not make sense (a sketch of the resulting gating logic follows the list below):

  1. It slows down our most efficient pipeline mode.
  2. In principle, the external cache could be kept around for testing, but as of now the cache values returned by the internal KV cache are nonsensical, so there is no ground truth to compare the external cache against.
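
For illustration only, here is a minimal sketch of the kind of gating this PR introduces. The names used here (internal_kv_cache_enabled, update_external_kv_cache, kv_cache_state) are hypothetical stand-ins, not the actual deepsparse internals:

# Hypothetical sketch: skip the external KV cache bookkeeping when the
# engine manages the cache internally. All names below are illustrative.
def process_engine_outputs(self, logits, kv_cache_state):
    if not self.internal_kv_cache_enabled:
        # Only maintain the numpy-side (external) cache when the engine
        # is NOT managing the KV cache internally; otherwise the extra
        # copy is pure overhead.
        self.update_external_kv_cache(kv_cache_state)
    return logits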

This PR has been tested using our internal (modest) LLM testing harness.

Overall, this PR accelerates inference by roughly 25%:

import time

from deepsparse import Pipeline

prompts = ["..."]  # placeholder; the actual prompts are not shown in this PR

opt = Pipeline.create(
    task="opt",
    model_path="/home/ubuntu/damian/sparseml/deployment_opt",
    max_generated_tokens=512,
    prompt_processing_sequence_length=3,
    use_deepsparse_cache=False,  # or True
)

num_runs = 100
counter = 0.0
for _ in range(num_runs):
    start = time.time()
    output = opt(sequences=prompts[0])
    counter += time.time() - start

print(prompts[0] + output.sequences[0])
# divide by the number of runs, not the loop variable (which ends at 99)
print(f"Time taken: {counter / num_runs} seconds")
Time taken: 21.29 seconds  # use_deepsparse_cache = False

Time taken: 15.99 seconds  # use_deepsparse_cache = True
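
As a quick check of the ~25% figure: (21.29 - 15.99) / 21.29 ≈ 0.249, i.e. about a 24.9% reduction in wall-clock time per inference.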

Edit: confirmed by a more exhaustive investigation

[Figure: benchmark results from the more exhaustive investigation]

SageMoore and others added 30 commits July 20, 2023 16:48
rahul-tuli previously approved these changes Aug 23, 2023

@rahul-tuli (Member) left a comment:

LGTM!

bfineran previously approved these changes Aug 24, 2023
Base automatically changed from feature/damian/optimize_decoder to main August 24, 2023 13:17
@dbogunowicz dbogunowicz dismissed stale reviews from bfineran and rahul-tuli August 24, 2023 13:17

The base branch was changed.

rahul-tuli previously approved these changes Aug 24, 2023

@rahul-tuli (Member) left a comment:

(n+1)st time is the charm 🚀

bfineran previously approved these changes Aug 24, 2023
@dbogunowicz dbogunowicz dismissed stale reviews from bfineran and rahul-tuli via 5bf6cf4 August 25, 2023 07:55
@dbogunowicz dbogunowicz force-pushed the feature/damian/optimize_update_kv_cache branch 2 times, most recently from 227ebe0 to 519bf1b on August 25, 2023 07:58
@dbogunowicz dbogunowicz merged commit a6d46be into main Aug 28, 2023
@dbogunowicz dbogunowicz deleted the feature/damian/optimize_update_kv_cache branch August 28, 2023 14:56