Static cache & torch.compile #2969

p-christ · 2024-02-21T20:31:07Z

Does vLLM make use of the recent speedup in transformers that comes from using a static cache with torch.compile?

https://x.com/art_zucker/status/1758510984631845278?s=46

mgoin · 2024-02-22T17:16:14Z

@p-christ From my understanding, StaticCache is an attempt to get something like the pre-allocated KV cache that vLLM already has with PagedAttention, so I wouldn't expect it to offer improvements to this project.
There is an item on the roadmap for exploring torch.compile support, though, which could be useful for fusing the small operations in models: #2681
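
For readers unfamiliar with the idea, here is a rough sketch of what "pre-allocated" means in a paged design: a fixed pool of blocks is created up front, and each sequence keeps a table that maps its tokens to blocks in that pool. This is not vLLM's actual code. The class name `PagedKVCacheSketch`, its methods, and the block sizes are all made up for illustration, and it only tracks slot bookkeeping, not the key/value tensors themselves.

```python
class PagedKVCacheSketch:
    """Toy block allocator; every name here is hypothetical, not a vLLM API."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size                # tokens per block
        # The whole pool exists up front; decoding never allocates new memory.
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}                      # seq_id -> physical block ids
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (block_id, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:
            # Current block is full (or this is the first token): take a new one.
            if not self.free_blocks:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCacheSketch(num_blocks=4, block_size=2)
slots = [cache.append_token("seq-a") for _ in range(3)]
# The first two tokens share one block; the third spills into a second block.
```

Because the pool size is fixed, memory use is bounded before any request arrives, which is the same property a static cache aims to provide.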

hmellor closed this as completed Aug 27, 2024
