Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 #12428

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yanyc428
Copy link

@yanyc428 yanyc428 commented Jan 25, 2025

This issue occurs when SamplingParams.n > 1. In the recent changes to vLLM, the total of the progress bar was modified to self.llm_engine.get_num_unfinished_requests(), which seems to be the number of prompts multiplied by SamplingParams.n. However, the progress bar still only updates by 1 step at a time, causing it to display incorrectly. For example, if run vllm on a dataset consisting of 1000 examples and set n = 5, tqdm displays as 0-5000, while it actually finishes when tqdm is loaded to 1000. (From #10949)

This PR resolves the issue by updating the progress bar by the number of outputs for each step, ensuring that the progress bar reaches its end when the model finishes generating.

We would like to mention that perhaps changing the total variable to reflect the actual number of prompts would be a better solution. However, we are concerned about causing unintended side effects, so we implemented the current modification instead.

FIX #11519
FIX #10949

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@FrederickVu
Copy link

The proposed fix here will cause issues with the planned implementation of parallel sampling in the V1 engine in PR #10980. Changing the total variable to accurately reflect the number of requests instead of SequenceGroup objects would remedy this, but it will be a more involved bug fix.

The bug was originally caused by PR #9302, namely by the introduction of the ParallelSampleSequenceGroup class and the modification of _add_processed_request() in vllm/engine/llm_engine.py. It should be possible to modify this code to make it functionally similar to the proposed code in #10980 in order to fix the bug without introducing incompatibilities with the tqdm bar between the V0 and V1 engines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: tqdm progress bar seems to be wrong. [Bug]: Mismatch of tqdm when n > 1
2 participants