Update docs to recommend torch 2.1 #2748

Merged
merged 1 commit into from
Jan 11, 2024

brentyi
Collaborator

@brentyi brentyi commented Jan 11, 2024

I couldn't get gsplat to build automatically from a PyPI install on any of my servers with torch 2.0.1. gsplat worked fine when installing from the git repo.

The root cause was that my CUDA libraries, when installed via conda, needed to be linked from $CUDA_HOME/lib and not $CUDA_HOME/lib64. This is fixed in 2.1: pytorch/pytorch#101285
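A quick way to check which layout an environment has is to look for the files the linker would accept for -lcudart under both candidate directories. The snippet below is a minimal sketch, not part of this PR: the helper name find_cudart_dirs is hypothetical, and it assumes CUDA_HOME points at the toolkit root (for a conda install, typically the environment prefix).

```python
import os
from pathlib import Path
from typing import List


def find_cudart_dirs(cuda_home: str) -> List[str]:
    """Return which of 'lib' and 'lib64' under cuda_home can satisfy -lcudart.

    Hypothetical helper for illustration. GNU ld resolves -lcudart to
    libcudart.so or libcudart.a, so only those exact names are checked.
    """
    names = ("libcudart.so", "libcudart.a")
    found = []
    for sub in ("lib", "lib64"):
        candidate = Path(cuda_home) / sub
        # is_file() follows symlinks, so the usual libcudart.so symlink counts.
        if any((candidate / name).is_file() for name in names):
            found.append(sub)
    return found


if __name__ == "__main__":
    cuda_home = os.environ.get("CUDA_HOME")
    if not cuda_home:
        print("CUDA_HOME is not set")
    else:
        print(f"{cuda_home}: libcudart found in {find_cudart_dirs(cuda_home)}")
```

If only lib is reported, the environment has the layout described above, where a hardcoded lib64 search path will miss libcudart.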

@brentyi
Collaborator Author

brentyi commented Jan 11, 2024

For posterity, the compile error I was running into was:

[5/5] c++ bindings.cuda.o forward.cuda.o backward.cuda.o ext.o -shared -L/home/brent/miniconda/envs/nerfstudio/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/brent/miniconda/envs/nerfstudio/lib64 -lcudart -o gsplat_cuda.so
FAILED: gsplat_cuda.so
c++ bindings.cuda.o forward.cuda.o backward.cuda.o ext.o -shared -L/home/brent/miniconda/envs/nerfstudio/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/brent/miniconda/envs/nerfstudio/lib64 -lcudart -o gsplat_cuda.so
/usr/bin/ld: cannot find -lcudart: No such file or directory
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

This only happened during the JIT compile triggered by running ns-train gaussian-splatting, not during the install-time setup.py / CppExtension compile (which follows a different code path in torch).
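To see why a link line like the one in the log fails, it can help to list every -L directory it passes and check whether any of them can satisfy the requested library. The following is a rough sketch under stated assumptions: library_search_report is a hypothetical helper, it only handles the attached -L<dir> form, and it ignores the linker's default system search paths.

```python
import shlex
from pathlib import Path
from typing import Dict


def library_search_report(link_command: str, lib_name: str) -> Dict[str, bool]:
    """Map each -L<dir> in link_command to whether it can satisfy -l<lib_name>.

    Hypothetical helper for illustration. Checks only lib<name>.so and
    lib<name>.a, the names GNU ld looks for when given -l<name>.
    """
    names = (f"lib{lib_name}.so", f"lib{lib_name}.a")
    report = {}
    for token in shlex.split(link_command):
        # Only the attached form (-L/path) is handled, as in the log.
        if token.startswith("-L") and len(token) > 2:
            directory = Path(token[2:])
            report[str(directory)] = any(
                (directory / name).is_file() for name in names
            )
    return report


if __name__ == "__main__":
    # Tail of the failing link line from the log.
    command = (
        "c++ ext.o -shared "
        "-L/home/brent/miniconda/envs/nerfstudio/lib64 -lcudart "
        "-o gsplat_cuda.so"
    )
    for directory, ok in library_search_report(command, "cudart").items():
        print(f"{directory}: {'found' if ok else 'missing'} libcudart")
```

If every directory is reported as missing, the linker can only succeed when libcudart is on a default system path, which matches the "cannot find -lcudart" failure.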

@brentyi brentyi requested a review from tancik January 11, 2024 21:58
Contributor

@tancik tancik left a comment

Seems reasonable enough

@brentyi brentyi merged commit 55b7a2c into main Jan 11, 2024
4 checks passed
@brentyi brentyi deleted the brent/recommend-torch-2.1 branch January 11, 2024 23:12
@Guangyun-Xu
Contributor

Does the Dockerfile also need to be updated?

@ichsan2895

Strangely, FYI, my Runpod environment runs Gsplat successfully with Torch==2.0.1. I will stay on this version despite the version bump in pyproject.toml.
