Fix vllm:prompt_tokens_total metric calculation #2869
Conversation
LGTM
tests/conftest.py (outdated)
@@ -14,11 +14,9 @@
def _read_prompts(filename: str) -> str:
Looks like this function has the wrong signature. The output seems to be List[str].
Good catch! Fixed.
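For reference, a minimal sketch of what the corrected helper might look like, assuming one prompt per line in the file (the actual fixture in tests/conftest.py may differ):

from typing import List

def _read_prompts(filename: str) -> List[str]:
    # Return every prompt in the file, not just the first one.
    with open(filename, "r") as f:
        return [line.strip() for line in f if line.strip()]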
@simon-mo This is a small fix for stats. Safe to merge.
LGTM! Thank you for your contribution!
I noticed that the 2nd prompt of […]
Possibly related to: #975
I have identified an issue with the vllm:prompt_tokens_total counter metric when there are multiple prompts in a batch with different token lengths. The root cause is that the metric counts the token length of the longest prompt in the batch multiplied by the number of prompts in the batch, as if the shorter prompts were padded to match the longest.
Code ref: vllm/vllm/core/scheduler.py, lines 262 to 263 in 7e45107
This PR resolves this issue by accurately counting the tokens of all the prompts in the batch. A simple unit test has been added to validate the correctness of the counter.
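For illustration, here is a simplified sketch of the counting change; the function names are hypothetical and the real logic lives in vLLM's scheduler and metrics code:

from typing import List

def prompt_tokens_padded(seq_lens: List[int]) -> int:
    # Buggy behavior: every prompt is counted as if it were padded to the
    # length of the longest prompt in the batch.
    return len(seq_lens) * max(seq_lens) if seq_lens else 0

def prompt_tokens_actual(seq_lens: List[int]) -> int:
    # Fixed behavior: count the real token length of each prompt.
    return sum(seq_lens)

# A batch with prompt lengths 3, 5, and 10:
lens = [3, 5, 10]
assert prompt_tokens_padded(lens) == 30  # over-counts by 12
assert prompt_tokens_actual(lens) == 18  # correct total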
Additionally, I have fixed _read_prompts() to read all prompts from a file, rather than just the first prompt.
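A quick illustration of the corrected behavior, reusing the _read_prompts sketch above (the file path and contents here are hypothetical):

# Suppose tests/prompts/example.txt contains:
#   Hello, my name is
#   The capital of France is
prompts = _read_prompts("tests/prompts/example.txt")
assert len(prompts) == 2  # both prompts are returned, not just the first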