
[Usage]: Question about the meaning of --enable-prefix-caching #4390

Closed
chenchunhui97 opened this issue Apr 26, 2024 · 4 comments
Labels
usage How to use vllm

Comments

@chenchunhui97

Your current environment

  1. Which version first supports --enable-prefix-caching? (It seems v0.3.0 does not support it yet, while v0.4.0 already does.)
  2. Does --enable-prefix-caching cache the system-prompt prefix only, or is it an implementation of RadixAttention (https://arxiv.org/abs/2312.07104)? And what is the difference between prefix caching and prefix sharing (the snippet below, with a sketch of the newer flag-based API after it)?
  if prefix_len is not None:
      # Legacy prefix-sharing API: prefix_pos marks, per prompt, how many
      # leading tokens form the shared prefix whose KV cache can be reused.
      if prompt_token_ids is not None:
          outputs = llm.generate(prompt_token_ids=prompt_token_ids,
                                 sampling_params=sampling_params,
                                 prefix_pos=prefix_len * (len(prompts) // len(prefix_len)))
      else:
          outputs = llm.generate(prompts=prompts,
                                 sampling_params=sampling_params,
                                 prefix_pos=prefix_len * (len(prompts) // len(prefix_len)))
  else:
      # No shared prefix: plain generation.
      outputs = llm.generate(prompts, sampling_params=sampling_params)
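
For comparison, here is a minimal sketch of how I understand the newer flag-based API (assuming vLLM v0.4.0+, where enable_prefix_caching is an engine argument and per-request prefix_pos is no longer needed; the model name and prompts are just placeholders):

  from vllm import LLM, SamplingParams

  # Enable automatic prefix caching engine-wide instead of marking the
  # shared prefix per request with prefix_pos.
  llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_prefix_caching=True)
  sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

  # Every prompt starts with the same system prompt; with prefix caching
  # enabled, the KV cache for that shared prefix should be computed once
  # and reused across requests.
  system_prompt = "You are a helpful assistant.\n"
  questions = ["What is PagedAttention?", "What does --enable-prefix-caching do?"]
  outputs = llm.generate([system_prompt + q for q in questions],
                         sampling_params=sampling_params)
  for out in outputs:
      print(out.outputs[0].text)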

How would you like to use vllm

I want to know more details about --enable-prefix-caching and the related paper.

@chenchunhui97 chenchunhui97 added the usage How to use vllm label Apr 26, 2024
@zhuohan123
Member

Please refer to #2614 for the details for now. We will publish a blog post explaining our design soon. Stay tuned!

@timothylimyl

@zhuohan123 Looking forward to the blog.

@samos123
Contributor

samos123 commented Jul 4, 2024

Was the blog post ever published? Please also update the flag documentation. I'm trying to understand what it actually does.

@Playerrrrr

Where is the blog? @zhuohan123
