CUDA_HOME does not exist, unable to compile CUDA op(s) #48
Comments
Fixed by installing the CUDA toolkit.
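For anyone hitting the same "CUDA_HOME does not exist" message, here is a minimal sketch of a pre-flight check. The lookup order (environment variable, then the location of nvcc, then /usr/local/cuda) is an assumption about how build tooling commonly locates the toolkit, not DeepSpeed's actual code, and `find_cuda_home` is a hypothetical helper name.

```python
import os
import shutil
from typing import Optional


def find_cuda_home() -> Optional[str]:
    """Best-effort guess at the CUDA toolkit root; returns None if nothing is found."""
    # 1. An explicitly set environment variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            return value
    # 2. Otherwise derive the root from the nvcc binary (<root>/bin/nvcc).
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    # 3. Fall back to the conventional Linux install path (an assumption).
    default = "/usr/local/cuda"
    return default if os.path.isdir(default) else None


if __name__ == "__main__":
    home = find_cuda_home()
    if home:
        print(f"CUDA toolkit root: {home}")
    else:
        print("No CUDA toolkit found; install it or set CUDA_HOME.")
```

If this prints that no toolkit was found, installing the toolkit (as done above) or exporting CUDA_HOME by hand is the likely next step.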
|
Got new error
|
The issue is that the script does not recognize the --local_rank argument. The following line has been added in the bug-fix commit: parser.add_argument('--local_rank', type=int, default=-1, help='local rank for distributed training'). Please pull the latest code and try again. Thank you for identifying this potential bug; it was indeed an oversight. |
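To illustrate the fix described above: the launcher passes a --local_rank value to each training process, so a script whose parser does not declare that argument exits with an "unrecognized arguments" error. A minimal sketch follows; `build_parser` and the description string are illustrative names, and only the add_argument line comes from the comment above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pretraining entry point (illustrative)")
    # The line quoted in the comment above; the launcher supplies the value per process.
    parser.add_argument('--local_rank', type=int, default=-1,
                        help='local rank for distributed training')
    return parser


# Simulate what the launcher would pass to the first process.
args = build_parser().parse_args(["--local_rank=0"])
print(args.local_rank)
```

When the script is run directly with python, the argument is absent and local_rank keeps its default of -1.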
I'm not sure if you have a Chinese background. The format can be referenced as written in the |
(I can't type Chinese yet, since I only installed Ubuntu yesterday.) Thanks for the fix. After pulling the latest code, I got another issue:
|
After I set
Full logs:
|
Everything works fine on my side; I could not reproduce the issue. Are you sure there are two GPUs on the device? If not, set 😊 |
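The GPU-count question above can be checked before launching. The sketch below counts the devices listed in CUDA_VISIBLE_DEVICES and builds the matching launch command without running it. The helper names are hypothetical; the flags, port, and script name are taken from the command in the issue.

```python
import os
from typing import List, Optional


def visible_gpu_count(env_value: Optional[str]) -> Optional[int]:
    """Count devices listed in a CUDA_VISIBLE_DEVICES value; None means the variable is unset."""
    if env_value is None:
        return None
    return len([d for d in env_value.split(",") if d.strip()])


def deepspeed_command(num_gpus: int, script: str = "1-pretrain.py",
                      master_port: int = 29500) -> List[str]:
    """Build (but do not run) the launch command used in this issue."""
    return ["deepspeed", "--master_port", str(master_port),
            f"--num_gpus={num_gpus}", script]


if __name__ == "__main__":
    count = visible_gpu_count(os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("Devices listed in CUDA_VISIBLE_DEVICES:", count)
    # With a single GPU, the command from the issue becomes:
    print(" ".join(deepspeed_command(1)))
```

Note that an unset variable does not mean zero GPUs; it only means no restriction was applied, so the actual device count still has to be confirmed on the machine.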
Thanks. I followed #26 and did not realize the author has 2 GPUs. Now I got
|
You can try a batch size smaller than 64, such as 32, 16, 8, or even 4. Thank you. If you start it by running |
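The halving suggestion above can be sketched as a small search loop. This is only an illustration: `try_step` and `fake_step` are hypothetical stand-ins for one training step, and the built-in MemoryError is used so the example runs without a GPU. A real training loop would catch the framework's own out-of-memory exception instead.

```python
from typing import Callable, Optional


def largest_fitting_batch_size(try_step: Callable[[int], None],
                               start: int = 64, minimum: int = 4) -> Optional[int]:
    """Halve the batch size until one step succeeds; None if even `minimum` fails."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except MemoryError:
            # Out of memory at this size: retry with half as many samples.
            batch_size //= 2
    return None


def fake_step(batch_size: int) -> None:
    """Hypothetical step that pretends anything above 16 samples runs out of memory."""
    if batch_size > 16:
        raise MemoryError(f"batch size {batch_size} does not fit")


if __name__ == "__main__":
    print(largest_fitting_batch_size(fake_step))
```

With the fake step above, the search tries 64 and 32, then settles on 16.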
It is better after adjusting to
Later, I adjusted the swap to 100GB, but it still crashed in the middle |
@ozbillwang Another approach is to try setting |
Thanks, still the same issue. |
I got this issue when running the command:
deepspeed --master_port 29500 --num_gpus=2 1-pretrain.py
Here is the full log
I installed all Python packages via virtualenv.
Notes:
- requirements.txt doesn't support the latest Python 3.12.x, so I had to use pyenv to install Python 3.11.x
- nvidia-cuda-toolkit (CUDA_HOME does not exist, unable to compile CUDA op(s) #48 (comment))
- export CUDA_VISIBLE_DEVICES=0 (CUDA_HOME does not exist, unable to compile CUDA op(s) #48 (comment))
- --num_gpus=1, since I have only one GPU
- Out of Memory error: as recommended, I tried to pass --batch-size, but the deepspeed command doesn't support --batch-size yet, so I adjusted it and ran python directly
- /etc/fstab
- killed in the middle; it was then recommended to adjust max_seq_len to 200 in the file model/LMConfig.py
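On the max_seq_len note: one reason a shorter sequence length can reduce memory pressure is that, with naive attention, the score tensor in each layer grows with the square of the sequence length. The sketch below is a rough back-of-the-envelope estimate only. The batch size, head count, and the comparison length of 512 are made-up illustration values, not this project's settings, and the real footprint also includes weights, optimizer state, and other activations.

```python
def attention_scores_bytes(batch_size: int, n_heads: int, seq_len: int,
                           bytes_per_value: int = 4) -> int:
    """Size of one layer's (batch, heads, seq, seq) attention score tensor in bytes."""
    return batch_size * n_heads * seq_len * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical settings: batch of 8, 16 heads, float32 values.
    long_seq = attention_scores_bytes(8, 16, 512)
    short_seq = attention_scores_bytes(8, 16, 200)
    print(f"seq_len=512: {long_seq / 2**20:.1f} MiB per layer")
    print(f"seq_len=200: {short_seq / 2**20:.1f} MiB per layer")
    print(f"ratio: {long_seq / short_seq:.2f}x")
```

Under these assumptions, halving the sequence length cuts this particular tensor to a quarter, which is why reducing max_seq_len is often suggested alongside a smaller batch size.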