
Remove obsolete HIP workaround #11080

Open · sARY77 wants to merge 3 commits into master

Conversation


@sARY77 commented on Jan 5, 2025

I have two identical AMD GPUs and noticed a discrepancy in the free memory reported for them.
I traced it down to the rocblas_initialize call, after which the VRAM usage on one of the GPUs jumps up by 498 MiB.
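For readers without the code at hand, the workaround being removed boils down to an eager rocblas_initialize() call during device initialization. Below is a minimal sketch of that pattern; the guard macro, header path, and wrapper function are illustrative, not the exact llama.cpp code:

```cpp
// Sketch of the kind of workaround this PR removes: force rocBLAS to set up
// its device state up front instead of lazily on first use.
#if defined(GGML_USE_HIP) // illustrative guard; the actual macro in the tree may differ
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>

static void eager_rocblas_init() {
    // rocblas_initialize() allocates workspaces and loads kernels on the current
    // device, which is why VRAM usage can jump by ~500 MiB right after this call.
    rocblas_initialize();
    // Make sure the allocations are finished before any memory measurements.
    (void) hipDeviceSynchronize();
}
#endif
```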

PR that introduced the workaround:
ROCm Port

According to the discussion on the ROCm issue that made the workaround necessary, it was resolved a long time ago:
[Bug]: Incorrect results when using GPUs with different architectures

I tested my change on a model that offloads about 20 GiB on each of the GPUs and did not notice any differences.
I also ran the CI and there were no failures reported.

Before:
llama_load_model_from_file: using device ROCm0 (Radeon RX 7900 XTX) - 24026 MiB free
llama_load_model_from_file: using device ROCm1 (Radeon RX 7900 XTX) - 24524 MiB free
After:
llama_load_model_from_file: using device ROCm0 (Radeon RX 7900 XTX) - 24524 MiB free
llama_load_model_from_file: using device ROCm1 (Radeon RX 7900 XTX) - 24524 MiB free
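As an aside on where these "MiB free" figures come from: they are a per-device free/total memory query, so anything rocBLAS allocates at init time shows up as a smaller "free" number on that device. A hedged sketch using the HIP runtime (the actual llama.cpp logging path may differ):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Print the free VRAM per device, the quantity the log lines above report.
static void print_free_vram(int device_count) {
    for (int dev = 0; dev < device_count; ++dev) {
        size_t free_bytes = 0, total_bytes = 0;
        hipSetDevice(dev);
        hipMemGetInfo(&free_bytes, &total_bytes);
        printf("device %d: %zu MiB free of %zu MiB\n",
               dev, free_bytes / (1024 * 1024), total_bytes / (1024 * 1024));
    }
}
```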

@github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jan 5, 2025
@JohannesGaessler (Collaborator)

Is there a downside to having this call though? My understanding is that rocBLAS would be initialized anyways once it's being used so you wouldn't be saving any memory.

@sARY77 requested a review from ngxson as a code owner on January 6, 2025
@github-actions bot added the labels "build" (Compilation issues), "nix" (Issues specific to consuming flake.nix, or generally concerned with ❄ Nix-based llama.cpp deployment), and "devops" (improvements to build systems and GitHub Actions) on Jan 6, 2025
@sARY77 (Author) commented on Jan 6, 2025

> Is there a downside to having this call though? My understanding is that rocBLAS would be initialized anyways once it's being used so you wouldn't be saving any memory.

I was able to remove all references to rocBLAS from the code and the build files, and llama-cli can still use both of my GPUs. Does this mean it's no longer needed?

@JohannesGaessler (Collaborator)

To my knowledge rocBLAS is still used internally by HIP even if it is not referenced directly. More generally, while the bug seems to have been fixed for v6.0 I have not seen confirmation that it was fixed for v5.7. And I don't think this PR would provide any benefits other than cosmetic ones. So my stance is that the workaround should be kept.
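Purely as an illustration of the version concern raised above (not something proposed in this PR), the fixed-in-6.0-but-unconfirmed-for-5.7 distinction could be expressed as a compile-time guard, assuming the HIP_VERSION_MAJOR macro from <hip/hip_version.h>:

```cpp
// Hypothetical guard (illustration only): keep the eager rocblas_initialize()
// workaround when building against a HIP runtime older than 6.0, where the
// multi-GPU rocBLAS bug is not confirmed to be fixed.
#include <hip/hip_version.h>

#if HIP_VERSION_MAJOR < 6
    #define NEED_ROCBLAS_INIT_WORKAROUND 1
#else
    #define NEED_ROCBLAS_INIT_WORKAROUND 0
#endif
```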
