#2762 provides a great way to improve efficiency through KV-cache reuse when multiple requests share the same prefix. However, users probably do not want KV-cache blocks to be shared across two different LoRA adapters, since the cached values would most likely differ between adapters.
As the test cases in PR #3263 suggest, the changes in #2762 may need additional work to distinguish between blocks produced under different LoRA adapters. #1804 previously avoided this conflict by including adapter_id in the tuple used to generate prefix hashes (source). The fix proposed in #3263 follows the same approach.
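A minimal sketch of the idea, assuming a simplified chained block-hashing scheme rather than vLLM's actual implementation: each full block's hash covers the previous block's hash plus its own tokens, and an adapter identifier (called `lora_id` here) is mixed into the hashed tuple. The names `prefix_block_hashes`, `BLOCK_SIZE`, and `lora_id` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block


def prefix_block_hashes(
    token_ids: Sequence[int],
    lora_id: Optional[int] = None,
    block_size: int = BLOCK_SIZE,
) -> List[int]:
    """Return one hash per full block; each hash identifies the whole prefix up to that block.

    Illustrative sketch only, not the vLLM code.
    """
    hashes: List[int] = []
    prev_hash: Optional[int] = None
    num_full_blocks = len(token_ids) // block_size
    for i in range(num_full_blocks):
        block = tuple(token_ids[i * block_size:(i + 1) * block_size])
        # Chaining on prev_hash means equal hashes imply equal prefixes (barring collisions).
        # Including lora_id (hypothetical name) keeps blocks from different adapters apart,
        # so a cache lookup for adapter A can never hit a block computed under adapter B.
        prev_hash = hash((prev_hash, block, lora_id))
        hashes.append(prev_hash)
    return hashes


if __name__ == "__main__":
    prompt = list(range(10))
    # Same prompt, same adapter: hashes match, so cached blocks can be reused.
    print(prefix_block_hashes(prompt, lora_id=1) == prefix_block_hashes(prompt, lora_id=1))
    # Same prompt, different adapters: hashes differ, so no cross-adapter reuse.
    print(prefix_block_hashes(prompt, lora_id=1) == prefix_block_hashes(prompt, lora_id=2))
```

Without the adapter identifier in the tuple, the two requests in the second comparison would produce identical hashes and share blocks, which is the conflict the #3263 test cases point to.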