RuntimeError: CUDA error: an illegal memory access was encountered #32

Closed
xiao10ma opened this issue Nov 26, 2024 · 5 comments

Comments

@xiao10ma

xiao10ma commented Nov 26, 2024

  File "/data/duantong/mazipei/3DGStream/train_frames.py", line 91, in training_one_frame
    loss += (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
                                                                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/duantong/mazipei/3DGStream/utils/loss_utils.py", line 48, in ssim
    window = window.cuda(img1.get_device())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I tried the fix suggested in the 3DGS issue graphdeco-inria/gaussian-splatting#41 (comment), but it doesn't work for me.

My env:

OS: Ubuntu 18.04
GPU: RTX 3090
Driver: 520.61.05 
CUDA: 11.8
Python: 3.11.10
PyTorch: 2.0.1+cu118
tinycudann: 1.7

Do you have any idea how to deal with it? Thank you!

@SJoJoK
Owner

SJoJoK commented Nov 26, 2024

This error has many possible causes. Because CUDA kernels are launched asynchronously by default, the root cause may not be in this line of code but in an earlier step.
I recommend debugging with CUDA_LAUNCH_BLOCKING=1 to locate the root cause. In my experience, it is usually a NaN value or improperly allocated memory.
To be honest, I do hit this error occasionally in our 3DGStream codebase and in some other 3DGS-related repos, but it is not deterministically reproducible, so I have left it unfixed.
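
As a rough illustration of the two suggestions above (synchronous launches and checking for NaN), here is a minimal sketch using only the standard library. The helper names `blocking_cuda_env` and `find_non_finite` are hypothetical and are not part of 3DGStream. The check works on flat lists of floats, so a tensor would first need to be converted, for example with `t.detach().flatten().tolist()`.

```python
import math
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # The variable must be set before the process initializes CUDA, so pass
    # it to the training process at launch (or set it before `import torch`).
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def find_non_finite(named_values):
    """Return the labels whose values contain NaN or +/-inf.

    `named_values` maps a label to a flat iterable of floats
    (hypothetical interface, chosen for illustration only).
    """
    bad = []
    for name, values in named_values.items():
        if any(not math.isfinite(v) for v in values):
            bad.append(name)
    return bad
```

The environment can be handed to the training process at launch, for example by passing `env=blocking_cuda_env()` to `subprocess.run` when starting `train_frames.py` with its usual arguments. With blocking launches the traceback should point at the call that actually failed, and running `find_non_finite` on the inputs of that call is one way to see whether a NaN is involved.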

@xiao10ma
Author

Okay, thank you for your advice, I'll try it :)

@xiao10ma
Author

I'm wondering what bwd_depth is used for. It seems to be initialized to False and never used.

If it's unused, I may try to pip install another version of diff_gaussian_rasterization. However, I noticed it might be related to tinycudann. Can I safely pip install another version?

@SJoJoK
Owner

SJoJoK commented Nov 27, 2024

Yes, bwd_depth is a legacy setting in our codebase and is not used in our experiments, so you can safely install a different version of diff_gaussian_rasterization. You will need to modify the related Python code to match the new rasterizer's inputs and outputs.
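
One possible way to handle the input side of such a mismatch is to drop settings that the installed rasterizer no longer accepts. This is only a sketch under assumptions: `keep_supported_fields` and `StandInSettings` are hypothetical names, and it assumes bwd_depth is passed as a keyword field of the rasterizer's settings class, which this thread does not confirm.

```python
import inspect
from typing import NamedTuple


def keep_supported_fields(settings_cls, candidate_kwargs):
    """Split kwargs into those `settings_cls` accepts and the names dropped."""
    accepted = inspect.signature(settings_cls).parameters
    kept = {k: v for k, v in candidate_kwargs.items() if k in accepted}
    dropped = sorted(set(candidate_kwargs) - set(kept))
    return kept, dropped


class StandInSettings(NamedTuple):
    # Hypothetical stand-in for the installed rasterizer's settings class;
    # the real class and its fields depend on the version you install.
    image_height: int
    image_width: int
    debug: bool
```

Calling `keep_supported_fields(StandInSettings, {..., "bwd_depth": False})` returns the kwargs without `bwd_depth` plus the list of dropped names, so the mismatch is reported rather than silently ignored. The output side (how many values the rasterizer returns) still has to be adapted by hand.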

@xiao10ma
Author

Thank you!
