
[KVCache] Support passing in attn_score_scaling_factor into KV cache #16606

Merged
3 commits merged into apache:main on Feb 20, 2024

Conversation

rickzx (Contributor) commented on Feb 19, 2024

In GPT-2, the attention calculation uses an additional feature, scale_attn_by_inverse_layer_idx: it applies a per-layer scaling factor to the attention scores before the softmax is taken.

This PR adds support for passing this scaling factor into the KV cache.
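For reference, here is a minimal NumPy sketch of where the factor enters the computation. This is illustrative math only, not the TVM or FlashInfer kernel; the helper name and shapes are made up.

```python
import numpy as np

def attention_weights(q, k, layer_idx, scale_attn_by_inverse_layer_idx=True):
    """q: (seq_q, head_dim), k: (seq_k, head_dim) for a single head (illustrative only)."""
    head_dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(head_dim)          # standard scaled dot-product scores
    if scale_attn_by_inverse_layer_idx:
        # GPT-2 style: scale scores by 1 / (layer_idx + 1); this is the kind of
        # per-layer attn_score_scaling_factor the PR threads into the KV cache.
        scores = scores * (1.0 / (layer_idx + 1))
    # softmax over keys, applied after the extra scaling
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```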

rickzx (Contributor, Author) commented on Feb 19, 2024

cc @MasterJH5574. This will need flashinfer-ai/flashinfer#126 to be merged first:

yzh119 pushed a commit to flashinfer-ai/flashinfer that referenced this pull request Feb 19, 2024
In GPT-2, the attention calculation uses an additional feature, scale_attn_by_inverse_layer_idx: it applies a per-layer scaling factor to the attention scores before the softmax is taken.

This PR adds support for this additional parameter in tvm_wrapper.

See: apache/tvm#16606
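Because the extra factor is applied multiplicatively to the scores, one way to plumb it through is to fold it into the usual 1/sqrt(head_dim) softmax scale so the kernel still receives a single scalar. The sketch below illustrates that arithmetic only; the helper is hypothetical and does not reflect the actual tvm_wrapper or FlashInfer signatures.

```python
import math

# Hypothetical helper (not the tvm_wrapper/FlashInfer API): fold the per-layer
# attn_score_scaling_factor into the 1/sqrt(head_dim) multiplier applied to
# q @ k.T before softmax, yielding one combined scalar.
def combined_softmax_scale(head_dim: int, attn_score_scaling_factor: float = 1.0) -> float:
    return attn_score_scaling_factor / math.sqrt(head_dim)

# GPT-2's scale_attn_by_inverse_layer_idx corresponds to a factor of 1 / (layer_idx + 1):
layer_idx, head_dim = 3, 64
scale = combined_softmax_scale(head_dim, attn_score_scaling_factor=1.0 / (layer_idx + 1))
```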
MasterJH5574 (Contributor) commented

Given that flashinfer-ai/flashinfer#126 has been merged, let's bump 3rdparty/flashinfer to the latest FlashInfer.

MasterJH5574 (Contributor) commented

Also, there is a format issue: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-lint/detail/PR-16606/3/pipeline

rickzx marked this pull request as ready for review on February 19, 2024, 17:52
MasterJH5574 merged commit 4600002 into apache:main on Feb 20, 2024
16 checks passed