Cifar-10 example - RuntimeError: Error building extension 'fused_adam' #694
Hello. For the record, I am currently having the same issue with CUDA 10.1 / Ubuntu 18.04 / torch 1.7.1. |
I used the trick of changing
|
@Axe-- and @TevenLeScao, sorry that you are having this issue. Unfortunately, I was unable to repro the problem on my side. I tried to recreate your environment as closely as possible; please review it further below in case I missed a setting. To debug further, I suggest building fused_adam during installation instead of JIT. To do this you will need to clone and build DeepSpeed. Specifically, you want to uninstall and build DeepSpeed with the following two commands.
Please let me know how it goes. Thanks! Below is my environment when I installed in JIT-mode in an attempt to repro the issue.
|
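The suggestion above, compiling fused_adam at install time rather than JIT, can be sketched as follows. The maintainer's exact commands are not preserved on this page, so this is only an assumption-based sketch: it relies on DeepSpeed's documented `DS_BUILD_FUSED_ADAM=1` install-time variable, and the helper name `fused_adam_prebuild_plan` and the `DeepSpeed` checkout directory are hypothetical. The function only assembles the commands; it does not run them.

```python
import sys


def fused_adam_prebuild_plan(source_dir="DeepSpeed"):
    """Assemble (commands, extra_env) for a source install with fused_adam prebuilt.

    Hypothetical helper: `source_dir` is assumed to be a local clone of the
    DeepSpeed repository. Nothing is executed here.
    """
    pip = [sys.executable, "-m", "pip"]
    commands = [
        # Remove the existing wheel so the JIT-built op is no longer used.
        pip + ["uninstall", "-y", "deepspeed"],
        # Install from the local checkout; the env var below asks DeepSpeed's
        # setup to compile fused_adam during installation.
        pip + ["install", source_dir],
    ]
    extra_env = {"DS_BUILD_FUSED_ADAM": "1"}
    return commands, extra_env
```

To actually run the plan, each command could be passed to `subprocess.run(cmd, check=True, env={**os.environ, **extra_env})`; a compile failure then shows up during installation instead of at the first training step.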
I had issues with installation and was following the idea in #629 (comment) to change CUDA from 10.1.105 to 10.1.243 and ended up installing 10.2 instead, which fixed this issue. Sorry, I won't have time to revert to 10.1 to look for the underlying cause, but in any case, that should be an easy fix in the meantime. |
@TevenLeScao, no worries about reverting to 10.1. I am glad you are unblocked, which is the most important thing. From your description it seems the underlying issue is a mismatch in the cuda versions of torch and another component, probably deepspeed. Can you please share the result of ds_report on your working setup? Thanks. |
There it is:
|
Hey, switching to Cuda 10.2 solves this indeed! |
In my case, the same issue persisted even after I updated CUDA to version 10.1.243, and I cannot upgrade to CUDA 10.2 because my Ubuntu is 14.04. |
Closing this issue, since it is resolved. Please reopen if needed. |
@tjruwase I am also running into something similar:
Here are additional details:
Could you share any pointers to resolve this? |
@sayakpaul, your ds_report shows a mismatch in cuda versions. Your deepspeed wheel is built with 10.2, while your cuda installation is 11.0. Can you try building deepspeed from source so that it is compiled with your installed 11.0? |
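A quick way to spot this kind of mismatch is to compare the CUDA version a package was built with against the one reported by the local toolkit. The sketch below is a minimal illustration, not part of DeepSpeed: the helper names are hypothetical, and the version strings are assumed to come from sources such as `ds_report`, `torch.version.cuda`, or the output of `nvcc --version`.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`.

    Returns None if no 'release X.Y' token is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def cuda_versions_match(built_with, installed):
    """Compare two CUDA version strings on major.minor only (patch is ignored)."""
    def major_minor(version):
        parts = version.strip().split(".")
        return tuple(int(p) for p in parts[:2])

    return major_minor(built_with) == major_minor(installed)


# Illustrative values mirroring the case described above:
# a wheel built with 10.2 on a machine whose toolkit is 11.0.
installed = parse_nvcc_release("Cuda compilation tools, release 11.0, V11.0.194")
mismatch = not cuda_versions_match("10.2", installed)
```

If `mismatch` is true, rebuilding from source against the installed toolkit, as suggested above, is the usual next step.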
Sure. Let me do that and get back. |
@tjruwase here's my
I see that it says |
Turned out |
I ran into this issue when using
|
I'm facing a similar issue. Here is the ds_report:
pip freeze:
NVCC version:
nvidia-smi details:
Any help, please? |
Closing this issue, as the original problem is resolved. New users who encounter problems here should open a new issue and link this one, and we will be happy to take a look. |
I'm facing a similar issue. Here is the log:
Here's the virtual environment:
Here is the ds_report:
|
Did you manage to resolve the issue? :) |
@Excelsiorl - your error is this:
This looks to be a GCC/build setup error. Can you try reinstalling GCC or resolving that error first if the file does exist on your system? |
Hi @Excelsiorl, I had a similar issue, and it was resolved after upgrading the CUDA version. Also, for cuDNN, I used the following instructions:
Once your setup is finished & cuda path is updated, you can run |
Hey, I was trying out the cifar-10 tutorial (link).
Could you help with this runtime error?
On executing (run_ds.sh):
Here's ds_report:
Running with CUDA 10.1 on Ubuntu 18.04.
Here's the virtual environment: