@p-christ From my understanding, StaticCache is an attempt to get something like the pre-allocated KV cache that vLLM already has with PagedAttention, so I wouldn't expect it to offer improvements to this project.
There is an item on the roadmap for exploring torch.compile support though, which could be useful for fusing the small operations in models. See #2681.
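To illustrate what "pre-allocated" means here, below is a rough sketch in plain Python. `StaticKVCache` and its methods are hypothetical names made up for this example; this is not the transformers `StaticCache` API and not how vLLM's PagedAttention is implemented. It only shows the general idea of allocating the key/value buffers once at a fixed maximum length and writing into them in place, rather than growing the cache on every decoded token.

```python
# Illustrative sketch only: StaticKVCache is a hypothetical class,
# not the transformers StaticCache or anything from vLLM.
class StaticKVCache:
    def __init__(self, max_len, head_dim):
        # Allocate the full buffers once, up front; the outer shape never changes.
        self.keys = [[0.0] * head_dim for _ in range(max_len)]
        self.values = [[0.0] * head_dim for _ in range(max_len)]
        self.max_len = max_len
        self.head_dim = head_dim
        self.length = 0  # number of positions filled so far

    def append(self, key, value):
        """Write one token's key/value vectors into the next free slot."""
        if self.length >= self.max_len:
            raise IndexError("cache is full")
        if len(key) != self.head_dim or len(value) != self.head_dim:
            raise ValueError("key/value must have head_dim entries")
        # Overwrite the pre-allocated slot in place; no new rows are created.
        self.keys[self.length][:] = key
        self.values[self.length][:] = value
        self.length += 1

    def filled(self):
        """Return only the slots that have been written so far."""
        return self.keys[:self.length], self.values[:self.length]


cache = StaticKVCache(max_len=4, head_dim=2)
cache.append([1.0, 2.0], [3.0, 4.0])
keys, values = cache.filled()
```

A paged design, as I understand it, would instead hand out fixed-size blocks from a shared pool as a sequence grows, but either way the memory is reserved ahead of time rather than reallocated per token.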
Does vLLM make use of the new speedup in transformers from using a static cache and torch.compile?
https://x.com/art_zucker/status/1758510984631845278?s=46