Question: Does paged attention demonstrate prefix sharing? #2354
Comments
Same question. Is there any update?

Is it related to the PR?

Thanks @franklyd, but is there any detailed document/API regarding this mechanism? For example, how exactly are the prefixes stored, how long do they last, how is matching done, etc.? New to vllm here :)

I believe issue #2614 can resolve your question! (It was also merged yesterday.)
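For context, a minimal sketch of what turning this on looks like, assuming a vLLM release that ships automatic prefix caching via the `enable_prefix_caching` flag (the model name and prompts below are placeholders):

```python
from vllm import LLM, SamplingParams

# Assumes a vLLM release that supports automatic prefix caching;
# older versions exposed prefix reuse differently.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_prefix_caching=True)

# A large shared prefix, e.g. a system prompt plus a document.
shared_prefix = "You are a helpful assistant.\n" + "<document text>\n" * 100

prompts = [
    shared_prefix + "Summarize the document.",
    shared_prefix + "List the key dates mentioned.",
]

# With prefix caching on, the KV blocks for shared_prefix are computed
# once; later requests reuse them instead of re-running attention over
# the shared tokens.
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```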
Reading https://arxiv.org/abs/2311.04934 and wondering if I would gain anything from a prompt cache.
My use case involves prompts with overlapping prefixes (mostly a few big ones), and I already use vLLM's paged attention.
Assume I would only want to cache KV states for prefixes (not for segments positioned anywhere, as in the paper).
Would there be any gain from caching attention prefix states, or are paged attention and vLLM indeed already doing this?
So, with paged attention, do we already skip attention over the shared inputs, or is there anything to be gained from additionally caching prefix KVs?
If it already caches across requests, what mechanism keeps kv-cache entries from being evicted?
Wondering whether there are still tweaks to be made to ensure that certain prefixes stay in the kv-cache.
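To make the matching and eviction questions concrete, here is a toy, from-scratch sketch (not vLLM's actual code) of the block-level idea behind automatic prefix caching: each fixed-size KV block is keyed by a hash of all tokens up to and including that block, so identical prefixes map to the same physical blocks, and least-recently-used blocks are evicted when the pool fills, which is what keeps hot prefixes resident:

```python
from __future__ import annotations
from collections import OrderedDict

BLOCK_SIZE = 16  # tokens per KV block, as in paged attention


class ToyPrefixCache:
    """Toy model of hash-based prefix block reuse with LRU eviction.

    Purely illustrative; a real block manager also needs ref-counting,
    copy-on-write, GPU block allocation, and care not to evict a block
    that a longer cached prefix still depends on.
    """

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        # hash(all prefix tokens up to block end) -> block id, in LRU order.
        self.blocks: OrderedDict[int, int] = OrderedDict()
        self.next_block_id = 0

    def lookup_or_allocate(self, tokens: list[int]) -> list[int]:
        """Return block ids for a prompt, reusing cached prefix blocks.

        Only full blocks are considered; a trailing partial block is
        never cached.
        """
        block_ids = []
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            # The key covers *all* tokens up to here, so a block only
            # matches when the entire prefix before it matches too.
            key = hash(tuple(tokens[:end]))
            if key in self.blocks:
                self.blocks.move_to_end(key)         # cache hit: refresh LRU
            else:
                if len(self.blocks) >= self.num_blocks:
                    self.blocks.popitem(last=False)  # evict LRU block
                self.blocks[key] = self.next_block_id
                self.next_block_id += 1              # cache miss: "compute" KV
            block_ids.append(self.blocks[key])
        return block_ids


cache = ToyPrefixCache(num_blocks=8)
shared = list(range(32))                     # two blocks of shared prefix
a = cache.lookup_or_allocate(shared + [100] * 16)
b = cache.lookup_or_allocate(shared + [200] * 16)
assert a[:2] == b[:2]                        # shared prefix blocks are reused
```

Under this design, a prefix that is hit frequently keeps getting moved to the back of the LRU queue, so it stays resident as long as the block pool is not overwhelmed by other traffic; pinning specific prefixes would require an extra mechanism on top.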