[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 #12428

yanyc428 · 2025-01-25T18:28:34Z

This issue occurs when SamplingParams.n > 1. In recent changes to vLLM, the total of the progress bar was set to self.llm_engine.get_num_unfinished_requests(), which seems to be the number of prompts multiplied by SamplingParams.n. However, the progress bar still advances by only 1 step at a time, so it displays incorrectly. For example, if you run vLLM on a dataset of 1000 examples with n = 5, tqdm shows a range of 0 to 5000, but generation actually finishes when the bar reaches 1000. (From #10949)

This PR resolves the issue by updating the progress bar by the number of outputs for each step, ensuring that the progress bar reaches its end when the model finishes generating.
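The mismatch and the proposed update rule can be illustrated with a minimal, self-contained sketch. It does not use vLLM or tqdm: `FakeBar`, `FakeRequestOutput`, and `run` are hypothetical stand-ins introduced only for illustration, and the assumption that one finished request carries n outputs follows the description above rather than the actual diff.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBar:
    """Hypothetical stand-in for a tqdm bar: tracks only `total` and the count `n`."""
    total: int
    n: int = 0

    def update(self, step: int = 1) -> None:
        self.n += step


@dataclass
class FakeRequestOutput:
    """Hypothetical finished request holding one completion per sample."""
    outputs: list = field(default_factory=list)


def run(num_prompts: int, n: int, per_output: bool) -> FakeBar:
    # Total as described in the PR: number of prompts multiplied by n.
    bar = FakeBar(total=num_prompts * n)
    for _ in range(num_prompts):
        finished = FakeRequestOutput(outputs=["text"] * n)
        # per_output=False: one step per finished request (the reported bug).
        # per_output=True: one step per generated output (the proposed fix).
        bar.update(len(finished.outputs) if per_output else 1)
    return bar


before = run(num_prompts=1000, n=5, per_output=False)
after = run(num_prompts=1000, n=5, per_output=True)
print(f"before: {before.n}/{before.total}")  # before: 1000/5000
print(f"after:  {after.n}/{after.total}")    # after:  5000/5000
```

With the per-output update, the counter and the total are expressed in the same unit, so the bar ends exactly at its total.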

We would like to mention that perhaps changing the total variable to reflect the actual number of prompts would be a better solution. However, we are concerned about causing unintended side effects, so we implemented the current modification instead.
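For comparison, the alternative mentioned here (counting prompts in the total and keeping the one-step update) might look like the sketch below. The function name `run_prompt_total` is hypothetical and the loop only simulates finished prompts; it is not taken from the vLLM code base.

```python
def run_prompt_total(num_prompts: int, n: int) -> tuple:
    """Simulate a bar whose total counts prompts rather than prompts * n."""
    total = num_prompts  # assumption: the total is expressed in prompts
    done = 0
    for _ in range(num_prompts):
        completions = ["text"] * n  # n samples generated for one prompt
        assert len(completions) == n
        done += 1  # one step per finished prompt, regardless of n
    return done, total


done, total = run_prompt_total(num_prompts=1000, n=5)
print(f"{done}/{total}")  # 1000/1000
```

Both variants make the bar finish at its total; they differ only in whether the unit shown to the user is a prompt or a generated output.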

FIX #11519
FIX #10949

Signed-off-by: Yuchen Yan <[email protected]>

github-actions · 2025-01-25T18:28:45Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR.
- Enable auto-merge.

🚀

FrederickVu · 2025-01-31T00:24:30Z

The proposed fix here will cause issues with the planned implementation of parallel sampling in the V1 engine in PR #10980. Changing the total variable to accurately reflect the number of requests instead of SequenceGroup objects would remedy this, but it will be a more involved bug fix.

The bug was originally caused by PR #9302, namely by the introduction of the ParallelSampleSequenceGroup class and the modification of _add_processed_request() in vllm/engine/llm_engine.py. It should be possible to modify this code to make it functionally similar to the proposed code in #10980 in order to fix the bug without introducing incompatibilities with the tqdm bar between the V0 and V1 engines.

[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1

a274453

Signed-off-by: Yuchen Yan <[email protected]>

mergify bot added the frontend label Jan 25, 2025

DarkLight1337 requested a review from WoosukKwon January 27, 2025 05:54

FrederickVu mentioned this pull request Jan 30, 2025

[V1][WIP] V1 sampler implements parallel sampling (PR 1/N for parallel sampling support) #10980

Draft
