[Bugfix] Fix hard-coded value of x in context_attention_fwd #6373

tdoublep · 2024-07-12T14:05:15Z

The prefix prefill code assumes that x=8, but this holds only for fp16 (e.g., see this line). In some scenarios it is useful to run the prefix prefill code with fp32 (e.g., to compare logprobs generated by chunked prefill against HF generate in CI tests). The correct value of x can be read directly from the key_cache.
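To illustrate why x depends on the dtype, here is a minimal sketch in plain Python. It assumes the key cache is laid out as `(num_blocks, num_kv_heads, head_size // x, block_size, x)` with `x = 16 // element_size`, i.e. keys packed into 16-byte vectors. The helper names `key_cache_shape` and `x_from_key_cache_shape` are hypothetical and exist only for this example; they are not the functions changed in this PR.

```python
# Assumption: keys are packed into 16-byte vectors, so x = 16 // element_size.
KEY_CACHE_VECTOR_BYTES = 16


def key_cache_shape(num_blocks, num_kv_heads, head_size, block_size, element_size):
    """Return the assumed 5-D key cache shape for a given element size in bytes."""
    x = KEY_CACHE_VECTOR_BYTES // element_size
    if x == 0 or head_size % x != 0:
        raise ValueError("head_size must be divisible by x")
    return (num_blocks, num_kv_heads, head_size // x, block_size, x)


def x_from_key_cache_shape(shape):
    """Read x from the last dimension instead of hard-coding 8."""
    return shape[-1]


# fp16 elements are 2 bytes wide, fp32 elements are 4 bytes wide.
fp16_shape = key_cache_shape(4, 8, 128, 16, element_size=2)
fp32_shape = key_cache_shape(4, 8, 128, 16, element_size=4)

print(x_from_key_cache_shape(fp16_shape))  # x for fp16
print(x_from_key_cache_shape(fp32_shape))  # x for fp32
```

Under this assumed layout, a hard-coded x=8 matches the fp16 cache only; reading the last dimension of the key cache gives the right value for either dtype.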

cc @caoshiyi @zhuohan123 @DouHappy

Signed-off-by: Thomas Parnell <[email protected]>

WoosukKwon

LGTM! Thanks for the fix!

Fix hard-coded value of x in context_attention_fwd

598bc98

Signed-off-by: Thomas Parnell <[email protected]>

tdoublep mentioned this pull request Jul 12, 2024

[Model] RowParallelLinear: pass bias to quant_method.apply #6327

Merged

WoosukKwon approved these changes Jul 13, 2024

WoosukKwon merged commit e1684a7 into vllm-project:main Jul 13, 2024
72 checks passed

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024

[Bugfix] Fix hard-coded value of x in context_attention_fwd (vllm-pro…

8c989e3

…ject#6373) Signed-off-by: Thomas Parnell <[email protected]>

xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024

[Bugfix] Fix hard-coded value of x in context_attention_fwd (vllm-pro…

e4693dc

…ject#6373) Signed-off-by: Thomas Parnell <[email protected]>

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[Bugfix] Fix hard-coded value of x in context_attention_fwd (vllm-pro…

5176598

…ject#6373) Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: Alvant <[email protected]>
