
CUDA error (again! :-)) #554

Open
olivbrau opened this issue Jan 7, 2025 · 5 comments

Comments


olivbrau commented Jan 7, 2025

Hi,
I've tried SD 1.4 with the CUDA backend on two configurations:

  1. On my personal computer with RTX 4070, everything works well, thanks to ag2s20150909 and the build https://github.com/ag2s20150909/stable-diffusion.cpp/releases/tag/master-74a21a7

  2. On my working computer, a laptop with an RTX A1000, I still get errors that I don't understand:

ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: no kernel image is available for execution on the device
current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2174

Here is the full log :

D:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\inference_tool_CUDA_2025_01_01>"sd.exe" -m "..\StableDiffusion 1.4 F32\sd-v1-4.ckpt" -p "a cute cat" --sampling-method euler --steps 10 -W 512 -H 512 -s 42 -t 20
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:195 - loading model from '..\StableDiffusion 1.4 F32\sd-v1-4.ckpt'
[INFO ] model.cpp:891 - load ..\StableDiffusion 1.4 F32\sd-v1-4.ckpt using checkpoint format
ZIP 0, name = archive/data.pkl, dir = archive/
[INFO ] stable-diffusion.cpp:242 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:275 - Weight type: f32
[INFO ] stable-diffusion.cpp:276 - Conditioner weight type: f32
[INFO ] stable-diffusion.cpp:277 - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:278 - VAE weight type: f32
|==================================================| 1131/1131 - 0.00it/s
[INFO ] stable-diffusion.cpp:516 - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:520 - loading model from '..\StableDiffusion 1.4 F32\sd-v1-4.ckpt' completed, taking 9.06s
[INFO ] stable-diffusion.cpp:550 - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:682 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1235 - apply_loras completed, taking 0.00s
ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: no kernel image is available for execution on the device
current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2174
err
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:70: CUDA error

Does anybody have an idea?

Thanks a lot in advance

Olivier


ClarkChin08 commented Jan 8, 2025

Issue with Prebuilt CUDA SD Binary Size Discrepancy

I've noticed a potential issue with the prebuilt CUDA version of stable-diffusion.cpp:

Current release after Nov 30:
Download URL: https://github.com/leejet/stable-diffusion.cpp/releases/download/master-dcf91f9/sd-master-dcf91f9-bin-win-cuda12-x64.zip
File size: 20.2MB
Status: Appears incomplete/incorrect

Previous release (Nov 23):
File size: 137MB
Status: Functioned correctly

This significant size reduction (approximately 85% smaller) suggests that the latest prebuilt binary might be missing essential components or was incorrectly packaged. The properly functioning version should be closer to the 137MB size of the November 23rd release.

Recommendation:
Consider using the November 23rd release until this issue is investigated and resolved, or build from source if possible.


olivbrau commented Jan 8, 2025

Indeed, the November 23 release is much bigger than the December releases. But I've tried it and still get an error, a different one this time:

CUDA error: the provided PTX was compiled with an unsupported toolchain.
current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:2326
err
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:102: CUDA error

Moreover, it doesn't explain why the December version works well on my other computer (with the RTX 4070).

(Actually, it is Flux1Dev that works on that other computer, and the error I mentioned here concerns SD 1.4, since I don't have enough VRAM to run FluxDev, so my comparison is not perfect.)
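For reference, one way to compare the two machines is to query each GPU's compute capability directly, a quick sketch (the `compute_cap` query field assumes a reasonably recent NVIDIA driver):

```shell
# Print each GPU's name and CUDA compute capability (needs a recent NVIDIA driver).
nvidia-smi --query-gpu=name,compute_cap --format=csv
# An RTX 4070 (Ada) reports 8.9, while an RTX A1000 Laptop GPU (Ampere) reports 8.6,
# so a binary compiled only for sm_89 can run on the former but not on the latter.
```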

@ag2s20150909 (Contributor)

The RTX A1000 is an Ampere-architecture GPU (compute capability 8.6, as your log shows). The prebuilt binary targets a newer architecture, so you may need to change -DCMAKE_CUDA_ARCHITECTURES=89-real to -DCMAKE_CUDA_ARCHITECTURES=86-real, or build it on your local machine.

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-feature-list
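As a sketch, a local build targeting compute capability 8.6 could look like this (the `SD_CUDA` option name is an assumption and may differ between versions; older trees used `SD_CUBLAS`):

```shell
# Build stable-diffusion.cpp from source for an sm_86 (Ampere) GPU.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
cmake -B build -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86-real
cmake --build build --config Release
```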

@ag2s20150909 (Contributor)

It seems to be caused by ggml upstream: ggml-org/ggml@77d37f5. Commit c3eeb66 updated ggml but forgot to set CMAKE_CUDA_ARCHITECTURES in the GitHub Actions workflow.

@icebearlala

> It seems to be caused by ggml upstream: ggerganov/ggml@77d37f5. This commit c3eeb66 updated ggml but forgot to set CMAKE_CUDA_ARCHITECTURES on GitHub Actions.

So how should we solve this issue?
