Automatic Prefix Caching (#2792) might conflict with multi-LoRA (#1804) #3264

jacobthebanana · 2024-03-07T21:55:55Z

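#2762 provides a great way to improve efficiency when multiple requests share the same prefix, by reusing their KV cache. However, users probably do not want KV-cache blocks shared across two different LoRA adapters. The adapter changes the hidden states (and, when it targets the attention projections, the key/value projections directly), so the cached keys and values for the same prefix will most likely differ between adapters.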
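To make the point concrete, here is a toy sketch of why the same prefix yields different keys under different adapters. It assumes the standard LoRA form, where the adapter adds a low-rank update `B @ A` to a base weight (here a hypothetical key projection `w_k`). The matrices are made-up toy values, not anything from vLLM.

```python
# Toy sketch: a LoRA update on the key projection changes the keys for the SAME tokens.
# Assumption: the adapter is applied as W_k + B @ A (standard LoRA form); all values
# below are illustrative and hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def keys_for(hidden, w_k, lora_b, lora_a):
    """Keys for the given hidden states under the adapted weight W_k + B @ A."""
    return matmul(hidden, add(w_k, matmul(lora_b, lora_a)))

hidden = [[1.0, 2.0], [0.5, -1.0]]   # two prefix tokens, hidden size 2
w_k = [[1.0, 0.0], [0.0, 1.0]]       # shared base-model key weight (hypothetical)

# Two hypothetical rank-1 adapters: (B with shape 2x1, A with shape 1x2).
adapter_1 = ([[1.0], [0.0]], [[0.0, 1.0]])
adapter_2 = ([[0.0], [1.0]], [[1.0, 0.0]])

k1 = keys_for(hidden, w_k, *adapter_1)
k2 = keys_for(hidden, w_k, *adapter_2)
print(k1)  # [[1.0, 3.0], [0.5, -0.5]]
print(k2)  # [[3.0, 2.0], [-0.5, -1.0]]
```

Because `k1 != k2`, a cache that served adapter 1's blocks to an adapter-2 request would silently feed the wrong keys into attention.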

As the test cases in PR #3263 suggest, the code changes in #2762 might need a bit more work to distinguish between blocks computed under different LoRA adapters. Previously, #1804 avoided this conflict by including adapter_id in the tuple that was hashed to identify a prefix. The fix proposed in #3263 draws on the same approach.
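A minimal sketch of that idea, under stated assumptions: each full block is identified by hashing all tokens up to the end of that block together with the adapter id, so identical prompts only share blocks when they also use the same adapter. The names `block_hash`, `PrefixCache`, `lora_id`, and the block size of 4 are hypothetical and chosen for illustration; this is not vLLM's actual code.

```python
# Sketch of prefix-block hashing keyed on the LoRA adapter.
# Assumptions: hypothetical names and a block size of 4; not vLLM's implementation.
# Like any hash-keyed cache, this toy assumes hash() does not collide.
from typing import Dict, List, Optional, Sequence, Tuple

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block

def block_hash(token_ids: Sequence[int], block_idx: int, lora_id: Optional[int]) -> int:
    """Hash all tokens up to the end of block `block_idx`, plus the adapter id.

    Hashing the whole prefix means a block matches only if everything before it
    matches too; including lora_id keeps blocks from different adapters apart.
    """
    num_tokens = (block_idx + 1) * BLOCK_SIZE
    return hash((tuple(token_ids[:num_tokens]), lora_id))

class PrefixCache:
    """Toy map from block hash to a physical block id."""

    def __init__(self) -> None:
        self._blocks: Dict[int, int] = {}
        self._next_id = 0

    def allocate(self, token_ids: Sequence[int], lora_id: Optional[int]) -> Tuple[List[int], int]:
        """Return physical block ids for every full block, and how many were cache hits."""
        ids: List[int] = []
        hits = 0
        for idx in range(len(token_ids) // BLOCK_SIZE):
            h = block_hash(token_ids, idx, lora_id)
            if h in self._blocks:
                hits += 1  # reuse the block computed earlier for this prefix + adapter
            else:
                self._blocks[h] = self._next_id  # new block for this prefix + adapter
                self._next_id += 1
            ids.append(self._blocks[h])
        return ids, hits

cache = PrefixCache()
prompt = list(range(8))  # two full blocks of 4 tokens
print(cache.allocate(prompt, lora_id=None))  # ([0, 1], 0): base model, first time
print(cache.allocate(prompt, lora_id=None))  # ([0, 1], 2): same prompt and model, both reused
print(cache.allocate(prompt, lora_id=1))     # ([2, 3], 0): same prompt, adapter 1, no reuse
```

Dropping `lora_id` from the hashed tuple would make the third call return `([0, 1], 2)`, which is exactly the cross-adapter sharing the issue warns about.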

jacobthebanana · 2024-03-08T01:58:11Z

Resolved in #3263.

jacobthebanana mentioned this issue on Mar 7, 2024 in #3263, "Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804)" (merged).

jacobthebanana closed this as completed Mar 8, 2024
