[Bug]: Torch is not able to use GPU | Ubuntu | Nvidia GeForce GTX 960M #4950

Open
1 task done
Night3890 opened this issue Nov 22, 2022 · 16 comments
Labels
asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance

Comments


Night3890 commented Nov 22, 2022

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

Originally posted in community support but I did not get a response. Given how pervasive this is I think it belongs here anyway, and I haven't seen a good solution for Linux, specifically:

Torch is not able to use GPU

  • Ubuntu Version: "22.04.1 LTS (Jammy Jellyfish)"

  • 3d controller: "NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)"

  • VGA compatible controller: "Intel Corporation HD Graphics 530 (rev 06)"

  • Driver: nouveau display driver (changed to nvidia-driver-510)

  • cuda toolkit (11.8.0-1)

Please note: I don't have much experience with Python, so please tell me if there is any more information I should post regarding the software versions I'm currently running, and I will add it.

I get the error "Torch is not able to use GPU" when running the command bash <(wget -qO- https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh).

Steps to reproduce the problem

I ran bash <(wget -qO- https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh) as described in the README.

What should have happened?

I should have been able to run Stable Diffusion without any problems, or at least been able to open the WebUI.

Commit where the problem happens

Commit hash: 98947d1

What platforms do you use to access the UI?

Linux

What browsers do you use to access the UI?

No response

Command Line Arguments

No additional arguments were passed.

Additional information, context and logs

Here is my terminal output

$ bash <(wget -qO- https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh)

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on administrator user
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0]
Commit hash: 98947d173e3f1667eba29c904f681047dea9de90
Traceback (most recent call last):
  File "/home/administrator/stable-diffusion-webui/launch.py", line 255, in <module>
    prepare_enviroment()
  File "/home/administrator/stable-diffusion-webui/launch.py", line 176, in prepare_enviroment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "/home/administrator/stable-diffusion-webui/launch.py", line 58, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "/home/administrator/stable-diffusion-webui/launch.py", line 34, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/home/administrator/stable-diffusion-webui/venv/bin/python3" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 1
stdout: <empty>
stderr: Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
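
For reference, the assertion above can be bypassed by putting the flag the message mentions into webui-user.sh (which webui.sh sources if present). A sketch only, and a workaround rather than a fix, since it makes everything run on the CPU:

# in stable-diffusion-webui/webui-user.sh -- skips the CUDA check and falls back to CPU
export COMMANDLINE_ARGS="--skip-torch-cuda-test"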
Night3890 added the bug-report (Report of a bug, yet to be confirmed) label on Nov 22, 2022

SoftwareGuy commented Nov 22, 2022

It's most likely due to the fact that the Intel GPU is GPU 0 and the NVIDIA GPU is GPU 1, while Torch is looking at GPU 0 instead of GPU 1.

You may need to pass a parameter in the command line arguments so Torch uses the mobile discrete GPU rather than the integrated GPU.
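
A quick way to see what the bundled Torch actually enumerates (a rough sketch, assuming the venv that webui.sh created at /home/administrator/stable-diffusion-webui/venv):

$ cd /home/administrator/stable-diffusion-webui
$ source venv/bin/activate
$ python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
$ python -c "import torch; print(torch.cuda.get_device_name(0))"   # only if the count above is non-zero

Note that torch.cuda only enumerates NVIDIA devices, so the Intel iGPU never shows up there; if the count comes back 0, the driver side is worth checking too.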


Night3890 commented Nov 22, 2022

It's most likely due to the fact that the Intel GPU is GPU 0 and the NVIDIA GPU is GPU 1, while Torch is looking at GPU 0 instead of GPU 1.

You may need to pass a parameter in the command line arguments so Torch uses the mobile discrete GPU rather than the integrated GPU.

Thank you, I will give this a try. What argument would I use to do this?

I read on Stack Exchange that this kind of setup would not make a practical difference.

@SoftwareGuy
Copy link

I'm not sure on that one, but it looks like you'll need to either modify the script or set some environment variables. There's something here that might help: https://discuss.pytorch.org/t/how-to-specify-gpu-usage/945

AUTOMATIC1111 might have to do a patch that adds a command line argument to specify which GPU to use. Or maybe the launcher could poll the GPUs, find one with CUDA support, and use that when loading the AI drawing model.
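
In the meantime, CUDA_VISIBLE_DEVICES is the usual way to pin Torch to one card, and the launcher inherits it from the shell. A sketch, assuming the NVIDIA card is CUDA device 0:

$ CUDA_VISIBLE_DEVICES=0 ./webui.sh

The same setting can also go into webui-user.sh as an export so it sticks between runs.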


SoftwareGuy commented Nov 22, 2022

Wait, I just noticed something: you're running nouveau for your GPU instead of the NVIDIA driver.

This is a big no-no, as CUDA is only supported on NVIDIA's proprietary binary driver, not on nouveau. Install the NVIDIA driver via Ubuntu's driver manager. Then I think you'll also need to install the CUDA toolkit (a quick Google search should point you in the right direction for that on Ubuntu).

I use Arch myself, so to work around this issue I would just do pacman -Sy linux-headers nvidia-dkms cuda and call it a day. Of course, Ubuntu is a little different.
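
On Ubuntu 22.04 the rough equivalent is something like this (a sketch; the exact driver package depends on what ubuntu-drivers recommends, and a reboot is needed so the nvidia module replaces nouveau):

$ sudo ubuntu-drivers devices          # lists the recommended proprietary driver
$ sudo apt install nvidia-driver-510   # or whatever the previous command recommends
$ sudo reboot
$ nvidia-smi                           # should now list the GTX 960M and a driver/CUDA version

The CUDA toolkit itself is less critical here: the torch wheels that launch.py installs bundle their own CUDA runtime, so the proprietary driver is the part that actually matters.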


Night3890 commented Nov 22, 2022

@SoftwareGuy

Wait, I just noticed something: you're running nouveau for your GPU instead of the NVIDIA driver.

This is a big no-no, as CUDA is only supported on NVIDIA's proprietary binary driver, not on nouveau. Install the NVIDIA driver via Ubuntu's driver manager. Then I think you'll also need to install the CUDA toolkit (a quick Google search should point you in the right direction for that on Ubuntu).

  • Changed driver to nvidia-driver-510

  • Installed cuda toolkit (11.8.0-1)

Still the same problem...
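
A few checks that help confirm the proprietary driver really took over after a switch like this (a sketch; assumes a reboot already happened so nouveau could be unloaded):

$ lsmod | grep -E 'nouveau|nvidia'   # nvidia modules should be listed, nouveau should not
$ nvidia-smi                         # the driver itself should see the GTX 960M
$ dmesg | grep -i nvrm               # NVIDIA kernel module messages, in case it failed to load

If nvidia-smi works but the venv's Torch still reports no GPU, the Torch build inside the venv is the next thing to look at.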

@DylanLoader

@SoftwareGuy

Wait, I just noticed something: you're running nouveau for your GPU instead of the NVIDIA driver.
This is a big no-no, as CUDA is only supported on NVIDIA's proprietary binary driver, not on nouveau. Install the NVIDIA driver via Ubuntu's driver manager. Then I think you'll also need to install the CUDA toolkit (a quick Google search should point you in the right direction for that on Ubuntu).

* Changed driver to nvidia-driver-510

* Installed [cuda toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) (11.8.0-1)

Still the same problem...

Did you ever find a solution for this issue? I have a single 3090 and am hitting this error as well

@Night3890
Author

@SoftwareGuy

Wait, I just noticed something: you're running nouveau for your GPU instead of the NVIDIA driver.
This is a big no-no, as CUDA is only supported on NVIDIA's proprietary binary driver, not on nouveau. Install the NVIDIA driver via Ubuntu's driver manager. Then I think you'll also need to install the CUDA toolkit (a quick Google search should point you in the right direction for that on Ubuntu).

* Changed driver to nvidia-driver-510

* Installed [cuda toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) (11.8.0-1)

Still the same problem...

Did you ever find a solution for this issue? I have a single 3090 and am hitting this error as well

No, but I got the Sigil WebUI working. Its installation process has gotten much easier since I tried it last.

In the end, it didn't matter, though. My GeForce GTX 960M GPU isn't strong enough for stable diffusion.

@DylanLoader

@DylanLoader

@SoftwareGuy

Wait, I just noticed something: you're running nouveau for your GPU instead of the NVIDIA driver.
This is a big no-no, as CUDA is only supported on NVIDIA's proprietary binary driver, not on nouveau. Install the NVIDIA driver via Ubuntu's driver manager. Then I think you'll also need to install the CUDA toolkit (a quick Google search should point you in the right direction for that on Ubuntu).

* Changed driver to nvidia-driver-510

* Installed [cuda toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) (11.8.0-1)

Still the same problem...

Did you ever find a solution for this issue? I have a single 3090 and am hitting this error as well

No, but I got the Sigil WebUI working. Its installation process has gotten much easier since I tried it last.
In the end, it didn't matter, though. My GeForce GTX 960M GPU isn't strong enough for stable diffusion.

Thanks for following up even though you couldn't get it working.

For anyone else reading this: I fixed my issue installing on WSL2 on Windows 11 by deactivating my conda base env, then activating the venv, and force-installing PyTorch and the requirements (e.g. pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116). I then got a new error that was caused by my 768-v-ema.yaml file being saved as a .txt (oops).

Once I got that sorted it started up. Kind of a pain, but worth it once it's running.
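
Roughly, as commands (a sketch; paths assume the default clone location, and cu116 just matches the index URL quoted above):

$ conda deactivate                     # drop out of the conda base env
$ cd ~/stable-diffusion-webui
$ source venv/bin/activate
$ pip3 install --force-reinstall torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
$ ./webui.sh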

@SoftwareGuy

I personally do not recommend running GPU AI stuffs under WSL, but you do you I guess.

@DylanLoader

I personally do not recommend running GPU AI stuffs under WSL, but you do you I guess.

Feel free to send me a 1TB+ NVMe drive and I will put whatever OS you want on it. WSL2 on W11 is nearly identical to a standalone parallel installation other than the compute overhead from running it within W11.

I'd rather have more storage space and not have to deal with dual-boot on my personal machine for tinkering with SD, since I send any serious research/work to Google VM instances, but you do you I guess.


wwboynton commented Dec 5, 2022

@DylanLoader
For anyone else reading this: I fixed my issue installing on WSL2 on Windows 11 by deactivating my conda base env, then activating the venv, and force-installing PyTorch and the requirements (e.g. pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116). I then got a new error that was caused by my 768-v-ema.yaml file being saved as a .txt (oops).

Once I got that sorted it started up. Kind of a pain, but worth it once it's running.

This got me up and running on Windows/WSL2 Ubuntu, thank you :)

mezotaken added the asking-for-help-with-local-system-issues label and removed the bug-report label on Jan 17, 2023
@mcelia2324

Having this same issue now on Ubuntu Server 22.

@JohnTesla

I have the same problem on Ubuntu 20 x64 with a GTX 1070. I have changed the Python version, Torch version, CUDA version, and driver version, but none of that solves the problem! I have run it on Python 3.8, 3.10, and 3.11 and the problem is the same: the Torch test cannot get an answer from the GPU (no device IDs).

@JohnTesla

This is a great guide for a Linux SD installation: https://hub.tcno.co/ai/stable-diffusion/automatic1111-fast/
But that does not work for me either...

@JohnTesla

....ubuntu-webui/env/lib/python3.8/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
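
That warning comes from the CUDA runtime before Torch gets to do anything, so the driver side is worth checking directly (a sketch). On desktop cards a kernel-module/user-space version mismatch after a driver update is a common cause and a reboot fixes it; on datacenter boards this particular Error 802 often points at the nvidia-fabricmanager service not running.

$ nvidia-smi                        # does the driver itself see the GPU at all?
$ cat /proc/driver/nvidia/version   # loaded kernel module version; should match the installed driver
$ python -c "import torch; print(torch.__version__, torch.version.cuda)"   # CUDA version this Torch build expects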
