
[Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. #4173

Merged (1 commit, May 1, 2024)
3 changes: 2 additions & 1 deletion vllm/config.py
Expand Up @@ -325,7 +325,8 @@ def _verify_cache_dtype(self) -> None:
elif self.cache_dtype == "fp8":
if not is_hip():
nvcc_cuda_version = get_nvcc_cuda_version()
if nvcc_cuda_version < Version("11.8"):
if nvcc_cuda_version is not None \
and nvcc_cuda_version < Version("11.8"):
Comment on lines 327 to +329

@chiragjn (Contributor), May 1, 2024:
Thanks for this!

Just curious, why is this check on nvcc and not on libcudart version in the first place?
E.g. torch.version.cuda or some other way?


I was wondering: if I have CUDA runtime 11.8 but nvcc is not installed, this condition evaluates to False and no error would be raised.

For the docker image it would not matter because it has cuda 12.x
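A runtime-based check along the lines suggested here could be sketched as follows. This is a minimal sketch, not the vLLM implementation: `parse_cuda_version` and `check_fp8_cuda_support` are hypothetical helpers, and `torch.version.cuda` is a string like "12.1" (or None on CPU-only builds).

```python
def parse_cuda_version(version_str):
    """Parse a CUDA version string such as "11.8" into a comparable tuple."""
    if not version_str:
        return None
    return tuple(int(part) for part in version_str.split("."))


def cuda_runtime_version():
    """Return the CUDA runtime version torch was built against, or None."""
    try:
        import torch
        return parse_cuda_version(torch.version.cuda)
    except ImportError:
        return None


def check_fp8_cuda_support(version):
    """Raise only when the CUDA version is known *and* older than 11.8."""
    if version is not None and version < (11, 8):
        raise ValueError(
            "FP8 is not supported when cuda version is lower than 11.8.")
```

Note that, like the nvcc-based check, this still silently skips validation when no version can be determined; the difference is that it works in images that ship only the CUDA runtime.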

Contributor:
It is better to check the CUDA version via libcudart, as the vllm-openai images only have the CUDA runtime.

Contributor:
+1. I submitted a fix that removes the nvcc dependency: #4666

raise ValueError(
"FP8 is not supported when cuda version is "
"lower than 11.8.")
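The failure mode this patch fixes: when `get_nvcc_cuda_version()` returns None (nvcc absent), the unguarded comparison itself raises a TypeError before any meaningful error message can be produced. A minimal sketch, using a tuple as a stand-in for the `Version("11.8")` object in the real code:

```python
nvcc_cuda_version = None  # what get_nvcc_cuda_version() returns when nvcc is absent

# The pre-patch check compared directly and crashed:
try:
    nvcc_cuda_version < (11, 8)
except TypeError as exc:
    print(f"unguarded comparison fails: {exc}")

# The patched check short-circuits on None, so nothing is raised:
if nvcc_cuda_version is not None and nvcc_cuda_version < (11, 8):
    raise ValueError(
        "FP8 is not supported when cuda version is lower than 11.8.")
print("guarded check passes when the CUDA version is unknown")
```

The short-circuiting `and` is what makes the one-line change sufficient: the `<` comparison is never evaluated when the version is None.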