
Remove obsolete HIP workaround #11080

Open · sARY77 wants to merge 3 commits into master

Conversation


@sARY77 commented on Jan 5, 2025

I have two identical AMD GPUs and noticed a discrepancy in the free memory reported for them.
I traced it down to the rocblas_initialize call, after which the VRAM usage on one of the GPUs jumps up by 498 MiB.
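For readers without the code at hand, the workaround being removed boils down to an eager rocblas_initialize() call during device initialization. Below is a minimal sketch of that pattern; the guard macro, header path, and wrapper function are illustrative, not the exact llama.cpp code:

```cpp
// Sketch of the kind of workaround this PR removes: force rocBLAS to set up
// its device state up front instead of lazily on first use.
#if defined(GGML_USE_HIP) // illustrative guard; the actual macro in the tree may differ
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>

static void eager_rocblas_init() {
    // rocblas_initialize() allocates workspaces and loads kernels on the current
    // device, which is why VRAM usage can jump by ~500 MiB right after this call.
    rocblas_initialize();
    // Make sure the allocations are finished before any memory measurements.
    (void) hipDeviceSynchronize();
}
#endif
```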

PR that introduced the workaround:
ROCm Port

According to the discussion on the ROCm issue that made the workaround necessary, it was resolved a long time ago:
[Bug]: Incorrect results when using GPUs with different architectures

I tested my change on a model that offloads about 20 GiB on each of the GPUs and did not notice any differences.
I also ran the CI and there were no failures reported.

Before:
llama_load_model_from_file: using device ROCm0 (Radeon RX 7900 XTX) - 24026 MiB free
llama_load_model_from_file: using device ROCm1 (Radeon RX 7900 XTX) - 24524 MiB free
After:
llama_load_model_from_file: using device ROCm0 (Radeon RX 7900 XTX) - 24524 MiB free
llama_load_model_from_file: using device ROCm1 (Radeon RX 7900 XTX) - 24524 MiB free
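As an aside on where these "MiB free" figures come from: they are a per-device free/total memory query, so anything rocBLAS allocates at init time shows up as a smaller "free" number on that device. A hedged sketch using the HIP runtime (the actual llama.cpp logging path may differ):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Print the free VRAM per device, the quantity the log lines above report.
static void print_free_vram(int device_count) {
    for (int dev = 0; dev < device_count; ++dev) {
        size_t free_bytes = 0, total_bytes = 0;
        hipSetDevice(dev);
        hipMemGetInfo(&free_bytes, &total_bytes);
        printf("device %d: %zu MiB free of %zu MiB\n",
               dev, free_bytes / (1024 * 1024), total_bytes / (1024 * 1024));
    }
}
```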

@github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jan 5, 2025
@JohannesGaessler (Collaborator)

Is there a downside to having this call though? My understanding is that rocBLAS would be initialized anyways once it's being used so you wouldn't be saving any memory.

@sARY77 requested a review from ngxson as a code owner on January 6, 2025
@github-actions bot added the labels "build" (Compilation issues), "nix" (Issues specific to consuming flake.nix, or generally concerned with ❄ Nix-based llama.cpp deployment), and "devops" (improvements to build systems and GitHub Actions) on Jan 6, 2025
@sARY77 (Author) commented on Jan 6, 2025

> Is there a downside to having this call though? My understanding is that rocBLAS would be initialized anyways once it's being used so you wouldn't be saving any memory.

I was able to remove all references to rocBLAS from the code and the build files, and llama-cli can still use both of my GPUs. Does this mean it's no longer needed?

@JohannesGaessler (Collaborator)

To my knowledge rocBLAS is still used internally by HIP even if it is not referenced directly. More generally, while the bug seems to have been fixed for v6.0 I have not seen confirmation that it was fixed for v5.7. And I don't think this PR would provide any benefits other than cosmetic ones. So my stance is that the workaround should be kept.
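Purely as an illustration of the version concern raised above (not something proposed in this PR), the fixed-in-6.0-but-unconfirmed-for-5.7 distinction could be expressed as a compile-time guard, assuming the HIP_VERSION_MAJOR macro from <hip/hip_version.h>:

```cpp
// Hypothetical guard (illustration only): keep the eager rocblas_initialize()
// workaround when building against a HIP runtime older than 6.0, where the
// multi-GPU rocBLAS bug is not confirmed to be fixed.
#include <hip/hip_version.h>

#if HIP_VERSION_MAJOR < 6
    #define NEED_ROCBLAS_INIT_WORKAROUND 1
#else
    #define NEED_ROCBLAS_INIT_WORKAROUND 0
#endif
```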
