
CUDA error (again! :-)) #554

Open
olivbrau opened this issue Jan 7, 2025 · 5 comments

Comments


olivbrau commented Jan 7, 2025

Hi,
I've tried SD 1.4 with the CUDA backend on two configurations:

  1. On my personal computer with RTX 4070, everything works well, thanks to ag2s20150909 and the build https://github.com/ag2s20150909/stable-diffusion.cpp/releases/tag/master-74a21a7

  2. On my working computer, a laptop with an RTX A1000, I still get errors that I don't understand:

ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: no kernel image is available for execution on the device
current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2174

Here is the full log :

D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_CUDA_2025_01_01>"sd.exe" -m "..\StableDiffusion 1.4 F32\sd-v1-4.ckpt" -p "a cute cat" --sampling-method euler --steps 10 -W 512 -H 512 -s 42 -t 20
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:195 - loading model from '..\StableDiffusion 1.4 F32\sd-v1-4.ckpt'
[INFO ] model.cpp:891 - load ..\StableDiffusion 1.4 F32\sd-v1-4.ckpt using checkpoint format
ZIP 0, name = archive/data.pkl, dir = archive/
[INFO ] stable-diffusion.cpp:242 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:275 - Weight type: f32
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
|==================================================| 1131/1131 - 0.00it/s
[INFO ] stable-diffusion.cpp:516 - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:520 - loading model from '..\StableDiffusion 1.4 F32\sd-v1-4.ckpt' completed, taking 9.06s
[INFO ] stable-diffusion.cpp:550 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:682 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1235 - apply_loras completed, taking 0.00s
ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: no kernel image is available for execution on the device
current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2174
err
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:70: CUDA error

Does anybody have an idea?

Thanks a lot in advance

Olivier


ClarkChin08 commented Jan 8, 2025

Issue with Prebuilt CUDA SD Binary Size Discrepancy

I've noticed a potential issue with the prebuilt CUDA version of stable-diffusion.cpp:

Current release after Nov 30:
Download URL: https://github.com/leejet/stable-diffusion.cpp/releases/download/master-dcf91f9/sd-master-dcf91f9-bin-win-cuda12-x64.zip
File size: 20.2MB
Status: Appears incomplete/incorrect

Previous release (Nov 23):
File size: 137MB
Status: Functioned correctly

This significant size reduction (approximately 85% smaller) suggests that the latest prebuilt binary might be missing essential components or was incorrectly packaged. The properly functioning version should be closer to the 137MB size of the November 23rd release.

Recommendation:
Consider using the November 23rd release until this issue is investigated and resolved, or build from source if possible.


olivbrau commented Jan 8, 2025

Indeed, the November 23 release is much bigger than the December releases. But I've tried it and still get an error, a different one this time:

CUDA error: the provided PTX was compiled with an unsupported toolchain.
current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:2326
err
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:102: CUDA error

Moreover, it doesn't explain why the December version works well on my other computer (with the RTX 4070).

(Actually, it is Flux1Dev that works on that other computer, and the error I mentioned here concerns SD 1.4, since I don't have enough VRAM to run FluxDev, so my comparison is not perfect.)
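For reference, one way to compare the two machines is to query each GPU's compute capability directly, a quick sketch (the `compute_cap` query field assumes a reasonably recent NVIDIA driver):

```shell
# Print each GPU's name and CUDA compute capability (needs a recent NVIDIA driver).
nvidia-smi --query-gpu=name,compute_cap --format=csv
# An RTX 4070 (Ada) reports 8.9, while an RTX A1000 Laptop GPU (Ampere) reports 8.6,
# so a binary compiled only for sm_89 can run on the former but not on the latter.
```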

@ag2s20150909 (Contributor)

The RTX A1000 is an Ampere-architecture GPU (compute capability 8.6, as your log shows). The prebuilt binary targets a newer architecture, so you may need to change -DCMAKE_CUDA_ARCHITECTURES=89-real to -DCMAKE_CUDA_ARCHITECTURES=86-real, or build it on your local machine.

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-feature-list
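As a sketch, a local build targeting compute capability 8.6 could look like this (the `SD_CUDA` option name is an assumption and may differ between versions; older trees used `SD_CUBLAS`):

```shell
# Build stable-diffusion.cpp from source for an sm_86 (Ampere) GPU.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
cmake -B build -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86-real
cmake --build build --config Release
```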

@ag2s20150909 (Contributor)

It seems to be caused by ggml upstream: ggml-org/ggml@77d37f5. Commit c3eeb66 updated ggml but forgot to set CMAKE_CUDA_ARCHITECTURES in the GitHub Actions workflow.

@icebearlala

> It seems to be caused by ggml upstream: ggerganov/ggml@77d37f5. This commit c3eeb66 updated ggml but forgot to set CMAKE_CUDA_ARCHITECTURES on GitHub Actions.

So how should we solve this issue?
