Unable to install nvdiffrast on GeForce RTX 3090. #56

Closed
zoe2718 opened this issue Nov 29, 2021 · 7 comments

Comments


zoe2718 commented Nov 29, 2021

Environment:
cuda 11.2

I have tried pytorch:1.8.0-cuda11.1-cudnn8 and pytorch:1.7.1-cuda11.0-cudnn8, but both failed.
I used the provided Dockerfile with only the PyTorch and CUDA versions changed.
The command bash ./run_sample.sh --build-container (or docker build -f docker/Dockerfile -t name:tagname .) runs successfully, but afterwards nvdiffrast is still not installed: import nvdiffrast.torch raises ModuleNotFoundError: No module named 'nvdiffrast.torch'.

I have successfully installed nvdiffrast with the same steps on a 2080 Ti GPU with CUDA 10.2, but it fails on a 3090 GPU with CUDA 11.2.
Does anyone know how to install nvdiffrast on a 3090 GPU? Thanks.
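
A quick way to see whether the package is visible to the interpreter that later imports it is to ask Python's import machinery directly. The following is only a sketch: the helper name is_importable is hypothetical, and the check is meaningful only when run with the same Python, inside the same container, that raises the ModuleNotFoundError.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if this interpreter can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError (an ImportError subclass)
        # when a parent package such as `nvdiffrast` is missing entirely.
        return False


if __name__ == "__main__":
    for name in ("nvdiffrast", "nvdiffrast.torch"):
        print(name, "->", "found" if is_importable(name) else "missing")
```

If both names are reported as missing inside the container but found on the host, the image being run is probably not the one that was just built.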

Author

zoe2718 commented Nov 29, 2021

Here is the command line output when running ./run_sample.sh --build-container:

Sending build context to Docker daemon 11.36MB
Step 1/14 : ARG BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
Step 2/14 : FROM $BASE_IMAGE
---> 7554ac65eba5
Step 3/14 : RUN apt-get update && apt-get install -y --no-install-recommends pkg-config libglvnd0 libgl1 libglx0 libegl1 libgles2 libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev cmake curl
---> Using cache
---> 2021ade4a5c8
Step 4/14 : ENV PYTHONDONTWRITEBYTECODE=1
---> Using cache
---> ca9221b6d071
Step 5/14 : ENV PYTHONUNBUFFERED=1
---> Using cache
---> ec3e675141ce
Step 6/14 : ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH
---> Using cache
---> 54956580fb6d
Step 7/14 : ENV NVIDIA_VISIBLE_DEVICES all
---> Using cache
---> 400b43470c33
Step 8/14 : ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
---> Using cache
---> 1bead4e2f9e5
Step 9/14 : ENV PYOPENGL_PLATFORM egl
---> Using cache
---> 2ac6364927ab
Step 10/14 : COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
---> Using cache
---> f7bcbcf17535
Step 11/14 : RUN pip install ninja imageio imageio-ffmpeg
---> Running in 40f85408f32e
Collecting ninja
Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
Collecting imageio
Downloading imageio-2.13.0-py3-none-any.whl (3.3 MB)
Collecting imageio-ffmpeg
Downloading imageio_ffmpeg-0.4.5-py3-none-manylinux2010_x86_64.whl (26.9 MB)
Collecting pillow>=8.3.2
Downloading Pillow-8.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from imageio) (1.19.2)
Installing collected packages: ninja, pillow, imageio, imageio-ffmpeg
Attempting uninstall: pillow
Found existing installation: Pillow 8.1.0
Uninstalling Pillow-8.1.0:
Successfully uninstalled Pillow-8.1.0
Successfully installed imageio-2.13.0 imageio-ffmpeg-0.4.5 ninja-1.10.2.3 pillow-8.4.0
Removing intermediate container 40f85408f32e
---> 7d9a9bf8ff95
Step 12/14 : COPY nvdiffrast /tmp/pip/nvdiffrast/
---> e07b7ff78278
Step 13/14 : COPY README.md setup.py /tmp/pip/
---> a985488b664c
Step 14/14 : RUN cd /tmp/pip && pip install .
---> Running in bafb10e7d7e5
Processing /tmp/pip
Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from nvdiffrast==0.2.7) (1.19.2)
Building wheels for collected packages: nvdiffrast
Building wheel for nvdiffrast (setup.py): started
Building wheel for nvdiffrast (setup.py): finished with status 'done'
Created wheel for nvdiffrast: filename=nvdiffrast-0.2.7-py3-none-any.whl size=92264 sha256=bcb73fab8d4628893c608442ab57cde8fc5cddd963469b2b545bba14b5533e71
Stored in directory: /tmp/pip-ephem-wheel-cache-vkr9plma/wheels/3c/e6/6c/927e643f0816c802008017bea0b43743b6e13629535e616820
Successfully built nvdiffrast
Installing collected packages: nvdiffrast
Successfully installed nvdiffrast-0.2.7
Removing intermediate container bafb10e7d7e5
---> e5fd4265b280
Successfully built e5fd4265b280
Successfully tagged name:tagname

No python sample given or file '' not found. Exiting.

Author

zoe2718 commented Nov 29, 2021

If I run pip install . instead, nvdiffrast installs, but executing glctx = dr.RasterizeGLContext(device=device) reports an error:

Traceback (most recent call last):
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1673, in _run_ninja_build
env=env)
File "/home/wsj/.conda/envs/cu111/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "uv_test_nvdiffrast.py", line 66, in <module>
uv_ops = UVOperation(uv_size, facemodel, device, batch_size=1)
File "/home/wsj/code/Deep3DFaceRecon/uvuv.py", line 108, in __init__
self.glctx = dr.RasterizeGLContext(device=device)
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 160, in __init__
self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 84, in _get_plugin
torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts, extra_ldflags=ldflags, with_cuda=True, verbose=False)
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1091, in load
keep_intermediates=keep_intermediates)
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1302, in _jit_compile
is_standalone=is_standalone)
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1407, in _write_ninja_file_and_build_library
error_prefix=f"Error building extension '{name}'")
File "/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin': [1/4] c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o
FAILED: torch_rasterize.o
c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o
In file included from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/../common/rasterize.h:42:0,
from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp:12:
/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/torch/../common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
#include <EGL/egl.h>
^~~~~~~~~~~
compilation terminated.
[2/4] c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
FAILED: glutil.o
c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
In file included from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp:14:0:
/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
#include <EGL/egl.h>
^~~~~~~~~~~
compilation terminated.
[3/4] c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o
FAILED: rasterize.o
c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-11.2/include -isystem /home/wsj/.conda/envs/cu111/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o
In file included from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.h:42:0,
from /home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp:9:
/home/wsj/.conda/envs/cu111/lib/python3.7/site-packages/nvdiffrast/common/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
#include <EGL/egl.h>
^~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

Collaborator

s-laine commented Nov 29, 2021

Target architecture compute_86 that is native to RTX 3090 is only supported in Cuda 11.2 and later, so that is probably why the modified Dockerfiles with Cuda 11.0/11.1 fail. This is because PyTorch's C++/Cuda plugin builder always targets the native architecture of the GPU installed in the system, no matter if the available Cuda toolkit supports it or not, and if it doesn't, the compilation fails. Cuda 11.2 in your host environment should work, but apparently you don't have EGL properly installed there, given that header file EGL/egl.h is not found by the compiler.

I think you have three options here:

1) The best option is to use our latest Dockerfile as-is. It uses a base image with up-to-date PyTorch and Cuda versions that support the latest GPUs.

2) The second option is to install EGL in your host environment. You should use our Dockerfile as a reference on how to do that, as it is not trivial to get a working setup.

3) If everything else fails, you can also modify the plugin compilation function _get_plugin() in nvdiffrast/torch/ops.py to force an older target architecture for NVCC. You can do this by setting, e.g., os.environ['TORCH_CUDA_ARCH_LIST'] = '8.0' on line 71. However, sticking to old versions of tools is generally not great: you may run into compatibility issues, and you may not get the best possible performance out of your hardware.
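
To make the effect of a value such as '8.0' more concrete, the sketch below shows roughly how an architecture list of this kind maps to nvcc -gencode flags like the ones visible in the build logs in this thread. It is a simplified illustration, not PyTorch's actual code; the function name arch_list_to_gencode is hypothetical, and the handling of the +PTX suffix is an assumption based on how the variable is commonly used.

```python
from typing import List


def arch_list_to_gencode(arch_list: str) -> List[str]:
    """Translate e.g. '8.0' or '7.5;8.6+PTX' into nvcc -gencode flags."""
    flags: List[str] = []
    # Entries may be separated by semicolons or spaces.
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue
        wants_ptx = entry.endswith("+PTX")
        version = entry[:-len("+PTX")] if wants_ptx else entry
        major, minor = version.split(".")
        num = f"{major}{minor}"
        # Binary code for the concrete GPU architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if wants_ptx:
            # Also embed PTX so that newer GPUs can JIT-compile it.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in arch_list_to_gencode("8.0"):
        print(flag)
```

With '8.0', the only flag produced targets compute_80/sm_80, which is the older architecture that option 3 forces in place of the GPU's native compute_86.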

Collaborator

s-laine commented Nov 29, 2021

On the other hand, it appears that since version 1.8.0, PyTorch attempts to clamp the architecture to what the installed Cuda toolkit supports (as seen here). Therefore PyTorch 1.8.0 with Cuda 11.1 should in theory work, compiling to architecture compute_80.
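
The clamping described here amounts to taking the lower of two compute capabilities. The sketch below only illustrates that idea and is not PyTorch's implementation; the name clamp_capability is hypothetical and the list of supported capabilities is made-up input for the example.

```python
from typing import Iterable, Tuple

Capability = Tuple[int, int]  # (major, minor), e.g. (8, 6) for compute_86


def clamp_capability(native: Capability,
                     supported: Iterable[Capability]) -> Capability:
    """Lower the GPU's native capability to the highest supported one."""
    highest_supported = max(supported)
    # Tuples compare element-wise, so (8, 0) < (8, 6).
    return min(native, highest_supported)


if __name__ == "__main__":
    # An RTX 3090 reports (8, 6); assume the toolchain tops out at (8, 0).
    print(clamp_capability((8, 6), [(7, 0), (7, 5), (8, 0)]))  # prints (8, 0)
```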

So either that clamping logic fails somehow, or there is some other issue preventing the compilation from succeeding. Setting verbose=True on line 84 of nvdiffrast/torch/ops.py should make compilation errors visible which may help in diagnosing the problem.
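
Verbose build output is long, so it can help to filter it down to the compiler's fatal errors. A minimal sketch, assuming the gcc-style "file:line:col: fatal error: message" format seen in the traceback earlier in this thread; the helper name fatal_errors is hypothetical.

```python
import re
from typing import List

# Matches lines such as:
#   /path/glutil.h:36:10: fatal error: EGL/egl.h: No such file or directory
FATAL_RE = re.compile(r"^\S+?:\d+:\d+: fatal error: (?P<msg>.+)$")


def fatal_errors(build_log: str) -> List[str]:
    """Return the distinct fatal error messages, in order of appearance."""
    seen = set()
    found: List[str] = []
    for raw_line in build_log.splitlines():
        match = FATAL_RE.match(raw_line.strip())
        if match and match["msg"] not in seen:
            seen.add(match["msg"])
            found.append(match["msg"])
    return found
```

One possible use is to catch the RuntimeError raised when the plugin build fails and pass str(error) to fatal_errors(), which for the log above would reduce three repeated failures to a single message about EGL/egl.h.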

Author

zoe2718 commented Dec 1, 2021

Thanks for your quick response.

I updated the CUDA driver and installed CUDA 11.3.

Then I used the provided Dockerfile to build an image.

Because access to the GPUs on our server is restricted by user groups and permissions, I cannot run ./run_sample.sh ./samples/torch/cube.py --resolution 32 directly (it raises RuntimeError: No CUDA GPUs are available). Instead, I start a Docker container with the command docker run --privileged -it --gpus all --pid=host -v /home/:/home/ "gltorch:latest" /bin/bash and run python cube.py --resolution 32 inside the container. However, there is an error:

No output directory specified, not saving log or images
Mesh has 12 triangles and 8 vertices.
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/nvdiffrast_plugin/build.ninja...
Building extension module nvdiffrast_plugin...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/14] c++ -MMD -MF common.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/common.cpp -o common.o
[2/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cu -o rasterize.cuda.o
[3/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/antialias.cu -o antialias.cuda.o
[4/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/interpolate.cu -o interpolate.cuda.o
[5/14] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++14 -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/texture.cu -o texture.cuda.o
[6/14] c++ -MMD -MF texture.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/texture.cpp -o texture.o
[7/14] c++ -MMD -MF rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/rasterize.cpp -o rasterize.o
[8/14] c++ -MMD -MF torch_rasterize.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_rasterize.cpp -o torch_rasterize.o
[9/14] c++ -MMD -MF torch_antialias.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_antialias.cpp -o torch_antialias.o
[10/14] c++ -MMD -MF glutil.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/common/glutil.cpp -o glutil.o
[11/14] c++ -MMD -MF torch_texture.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_texture.cpp -o torch_texture.o
[12/14] c++ -MMD -MF torch_interpolate.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_interpolate.cpp -o torch_interpolate.o
[13/14] c++ -MMD -MF torch_bindings.o.d -DTORCH_EXTENSION_NAME=nvdiffrast_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -DNVDR_TORCH -c /opt/conda/lib/python3.7/site-packages/nvdiffrast/torch/torch_bindings.cpp -o torch_bindings.o
[14/14] c++ common.o glutil.o rasterize.cuda.o rasterize.o interpolate.cuda.o texture.cuda.o texture.o antialias.cuda.o torch_bindings.o torch_rasterize.o torch_interpolate.o torch_texture.o torch_antialias.o -shared -lGL -lEGL -L/opt/conda/lib/python3.7/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o nvdiffrast_plugin.so
Loading extension module nvdiffrast_plugin...
[F glutil.cpp:338] eglInitialize() failed
Aborted (core dumped)

Collaborator

s-laine commented Dec 1, 2021

Nvdiffrast requires an OpenGL device for executing the rasterization op, and EGL is required to get an OpenGL context, i.e., to get access to the graphics pipeline of the GPU. The EGL initialization failure suggests that the OpenGL configuration is somehow not functional in your cluster environment. This could perhaps be an issue with permissions, but I don't think that should result in EGL initialization failure. Thus it's probably related to some other part of the cluster configuration, and likely not something you can fix without going through the cluster management. Maybe there are some OS-level Nvidia drivers missing on the cluster machine?

Author

zoe2718 commented Dec 3, 2021

This was indeed caused by the NVIDIA driver, which had previously been installed with the argument -no-opengl-files.
I reinstalled the NVIDIA driver without -no-opengl-files and all problems are gone.
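
For anyone hitting the same two symptoms (the missing EGL/egl.h header at build time and the eglInitialize() failure at run time), a rough file-level check might look like the sketch below. Several things in it are assumptions rather than facts from this thread: the helper name check_egl_setup is hypothetical, /usr/include is only the usual header location, and libEGL_nvidia is assumed to be the name of the EGL library shipped with the NVIDIA driver. The vendor JSON path is the one copied in the Dockerfile build log above. A missing entry is a hint, not proof, and a passing check does not guarantee that EGL initializes.

```python
import ctypes.util
import os
from typing import Dict, Optional

# Path taken from the COPY step in the Dockerfile build log in this thread.
EGL_VENDOR_JSON = "/usr/share/glvnd/egl_vendor.d/10_nvidia.json"


def check_egl_setup(include_dir: str = "/usr/include",
                    vendor_json: str = EGL_VENDOR_JSON) -> Dict[str, Optional[str]]:
    """Report which EGL pieces are present; None marks a missing item."""
    header = os.path.join(include_dir, "EGL", "egl.h")
    return {
        # Needed to compile the plugin (the 'fatal error' seen earlier).
        "egl_header": header if os.path.isfile(header) else None,
        # Generic EGL loader library that the plugin links against (-lEGL).
        "libEGL": ctypes.util.find_library("EGL"),
        # Assumed name of the driver-provided EGL library; likely absent
        # when the driver was installed without its OpenGL files.
        "libEGL_nvidia": ctypes.util.find_library("EGL_nvidia"),
        "vendor_json": vendor_json if os.path.isfile(vendor_json) else None,
    }


if __name__ == "__main__":
    for item, location in check_egl_setup().items():
        print(f"{item}: {location or 'MISSING'}")
```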
