Automatic Prefix Caching (#2792) might conflict with multi-LoRA (#1804) #3264

Closed
jacobthebanana opened this issue Mar 7, 2024 · 1 comment

Comments

@jacobthebanana
Contributor

#2762 provides a great way to improve efficiency when multiple requests share the same prefix, by reusing the KV-cache. However, users probably do not want to share KV-cache across two different LoRA adapters, since the cached key/value tensors would most likely differ between adapters.

As the test cases in PR #3263 suggest, the code changes in #2762 may need additional work to distinguish blocks belonging to different LoRA adapters. Previously, #1804 avoided this conflict by including adapter_id in the tuple when generating prefix hashes (source). The fix proposed in #3263 draws inspiration from that approach.
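To illustrate the idea (this is a hypothetical sketch, not vLLM's actual implementation — the function and variable names here are made up), the key change is that the LoRA adapter ID becomes part of the tuple that identifies a cached prefix block, so two requests with identical token prefixes but different adapters can never hit each other's cache entries:

```python
# Hypothetical sketch of adapter-aware prefix-block hashing.
# Names (block_hash, lora_id) are illustrative, not vLLM's real API.

from typing import Optional, Tuple


def block_hash(prefix_token_ids: Tuple[int, ...],
               lora_id: Optional[int]) -> int:
    """Hash a prefix block, keyed by both the tokens and the adapter.

    Including lora_id in the hashed tuple keeps blocks produced under
    different LoRA adapters distinct, even for identical prefixes;
    lora_id=None denotes the base model with no adapter.
    """
    return hash((prefix_token_ids, lora_id))


same_prefix = (101, 2023, 318)

h_base = block_hash(same_prefix, None)  # base model, no adapter
h_a = block_hash(same_prefix, 1)        # LoRA adapter 1
h_b = block_hash(same_prefix, 2)        # LoRA adapter 2

# Identical prefixes under different adapters get different keys,
# so their KV-cache blocks are never shared.
assert h_a != h_b and h_a != h_base

# The same prefix under the same adapter still hashes identically,
# so prefix caching keeps working within one adapter.
assert block_hash(same_prefix, 1) == h_a
```

Had lora_id been omitted from the tuple, the two adapters would map the same prefix to the same cache block and one adapter could silently reuse KV values computed by the other.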

@jacobthebanana
Contributor Author

Resolved in #3263.
