[KVCache] Support passing in attn_score_scaling_factor into KV cache #16606

rickzx · 2024-02-19T01:43:03Z

GPT-2's attention calculation uses an additional option, scale_attn_by_inverse_layer_idx. It applies a per-layer scaling factor to the attention scores before the softmax function.

This PR adds support for passing this additional parameter, attn_score_scaling_factor, into the KV cache.
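For readers unfamiliar with the option, here is a minimal pure-Python sketch of what a per-layer attention score scaling factor does. It is not the code in this PR: the helper names `attention_weights` and `inverse_layer_idx_factor` are hypothetical, and the factor `1 / (layer_idx + 1)` is an assumption based on how the Hugging Face GPT-2 implementation handles scale_attn_by_inverse_layer_idx.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_layer_idx_factor(layer_idx):
    # Assumed GPT-2 convention: layer 0 -> 1.0, layer 1 -> 0.5, layer 3 -> 0.25.
    return 1.0 / (layer_idx + 1)


def attention_weights(q, keys, attn_score_scaling_factor=1.0):
    # Single-query, single-head attention weights.
    # The extra factor multiplies the usual 1/sqrt(head_dim) scale,
    # so it is applied to the scores before the softmax.
    head_dim = len(q)
    scale = attn_score_scaling_factor / math.sqrt(head_dim)
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(scores)


q = [1.0, 0.0, 0.0, 0.0]
keys = [[4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

# Deeper layers get a smaller factor, which flattens the attention distribution.
w_layer0 = attention_weights(q, keys, inverse_layer_idx_factor(0))
w_layer3 = attention_weights(q, keys, inverse_layer_idx_factor(3))
```

Folding the factor into the softmax scale, rather than rescaling the queries or keys beforehand, keeps the cached keys and values independent of the layer-specific factor.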

rickzx · 2024-02-19T01:45:39Z

cc: @MasterJH5574 Will need flashinfer-ai/flashinfer#126 to be merged first:

In GPT-2, attention calculation requires an additional feature scale_attn_by_inverse_layer_idx. It provides a scaling factor per attention layer when calculating the attention score, before applying the softmax function. This PR supports this additional parameter in tvm_wrapper. See: apache/tvm#16606

MasterJH5574 · 2024-02-19T16:21:04Z

Given that flashinfer-ai/flashinfer#126 has been merged, let's bump 3rdparty/flashinfer to the latest FlashInfer.

MasterJH5574 · 2024-02-19T16:21:38Z

Also, there is a format issue: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-lint/detail/PR-16606/3/pipeline

rickzx marked this pull request as draft February 19, 2024 01:43

rickzx mentioned this pull request Feb 19, 2024

Passing in attn_score_scaling_factor into tvm_wrapper flashinfer-ai/flashinfer#126

Merged

[KVCache] Support passing in attn_score_scaling_factor into paged KV cache

04e0512

rickzx force-pushed the mlc branch from d0d4b41 to 04e0512 Compare February 19, 2024 01:57

fix lint errors

fffb7c9

MasterJH5574 mentioned this pull request Feb 19, 2024

[KVCache] Migrate GPT-2 model to PagedKVCache, add support for attention score scaling in PagedKVCache mlc-ai/mlc-llm#1784

Merged

Fix lint issue and update flashinfer

e3c606f

rickzx marked this pull request as ready for review February 19, 2024 17:52

MasterJH5574 approved these changes Feb 20, 2024

MasterJH5574 merged commit 4600002 into apache:main Feb 20, 2024
16 checks passed

ysh329 mentioned this pull request Apr 21, 2024

[Release] v0.16.0 Release Candidate Notes #16911

Closed
