This issue was moved to a discussion.

Installation in Docker: error in compilation of extlib/mat_mult.cu #265

Closed
fmagera opened this issue Aug 4, 2022 · 5 comments

Comments

@fmagera

fmagera commented Aug 4, 2022

Hi!
Would it be possible to provide a Dockerfile with the right configuration?
I've tried many compatible versions of PyTorch and CUDA, but I always get the same error when building theseus-ai.
In my last attempt, I started from the NVIDIA NGC PyTorch container nvcr.io/nvidia/pytorch:21.06-py3, which is Ubuntu 20.04 with CUDA 11.3 and Python 3.8, and I reinstalled torch==1.10.1+cu113.

Here's the full error:

    /home/dir/theseus/theseus/extlib/mat_mult.cu(74): error: no instance of overloaded function "atomicAdd" matches the argument list
                argument types are: (double *, double)
    /home/dir/theseus/theseus/extlib/mat_mult.cu(239): error: no instance of overloaded function "atomicAdd" matches the argument list
                argument types are: (double *, double)
    2 errors detected in the compilation of "/home/dir/theseus/theseus/extlib/mat_mult.cu".
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
        subprocess.run(
      File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    The above exception was the direct cause of the following exception:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/dir/theseus/setup.py", line 60, in <module>
        setuptools.setup(
      File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 163, in setup
        return distutils.core.setup(**attrs)
      File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.8/site-packages/setuptools/command/develop.py", line 38, in run
        self.install_for_development()
      File "/opt/conda/lib/python3.8/site-packages/setuptools/command/develop.py", line 140, in install_for_development
        self.run_command('build_ext')
      File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 87, in run
        _build_ext.run(self)
      File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
        build_ext.build_extensions(self)
      File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
        self.build_extension(ext)
      File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 208, in build_extension
        _build_ext.build_extension(self, ext)
      File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 556, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1399, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/dir/theseus/setup.py'"'"'; __file__='"'"'/home/dir/theseus/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
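
A note on the error above: native `atomicAdd(double *, double)` is only available when compiling for compute capability 6.0 or higher, so this message typically appears when the build targets an older architecture. Whether that was the cause in this report is not confirmed anywhere in the thread. PyTorch's extension builder reads the target architectures from the `TORCH_CUDA_ARCH_LIST` environment variable, so one thing to try is restricting that list. The sketch below is a hypothetical helper (the function names are not part of Theseus or PyTorch) that drops numeric entries below 6.0; it does not handle named architectures such as `Pascal`.

```python
import os

# Native atomicAdd(double*, double) requires compute capability 6.0+.
MIN_DOUBLE_ATOMIC_ADD = (6, 0)


def parse_arch(entry):
    """Turn an entry such as '7.5' or '8.0+PTX' into a (major, minor) tuple."""
    number = entry.split("+")[0]
    major, minor = number.split(".")
    return int(major), int(minor)


def filter_arch_list(arch_list, minimum=MIN_DOUBLE_ATOMIC_ADD):
    """Keep only the numeric architectures at or above `minimum`.

    Entries may be separated by ';' or whitespace. Named architectures
    are not supported by this sketch.
    """
    entries = arch_list.replace(";", " ").split()
    kept = [entry for entry in entries if parse_arch(entry) >= minimum]
    return ";".join(kept)


if __name__ == "__main__":
    # The default value here is only an illustration, not a recommendation.
    requested = os.environ.get("TORCH_CUDA_ARCH_LIST", "5.2;6.0;7.0+PTX")
    print(filter_arch_list(requested))
```

The printed value can be exported as `TORCH_CUDA_ARCH_LIST` before rerunning the install, provided the target GPU is actually covered by one of the remaining entries.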


@luisenp
Contributor

luisenp commented Aug 4, 2022

Hi @fmagera, sorry about the install issues. Can you take a look at #257 and see if it helps? It adds a bash script that generates a Dockerfile for compiling wheels using PyTorch's manylinux containers. Usage is ./build_scripts/build_wheel.sh <OUTPUT_DIR>. If the wheels don't work for you, maybe you can use the script as a starting point?

@fanyangr

fanyangr commented Aug 4, 2022

Hi @fmagera, I had a similar issue. What I did was reinstall ninja with conda first. Then I completely removed the theseus directory and cloned it again, because leftover build files from a previous installation could lead to additional errors. Hope that also works for you.
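
As a lighter alternative to deleting and re-cloning, the stale build outputs can be removed from the existing checkout. The sketch below is a hypothetical helper, not part of Theseus. It assumes the leftovers live in the default setuptools `build/` directory, in top-level `*.egg-info` directories, and in compiled `*.so` files; the thread does not say which files actually caused the problem, so a fresh clone remains the safer option.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(repo_root, dry_run=True):
    """Remove common build leftovers from a source checkout.

    Returns the list of paths that were (or, with dry_run, would be) removed.
    """
    root = Path(repo_root)
    # Collect all candidates before deleting anything.
    targets = [root / "build"]
    targets += sorted(root.glob("*.egg-info"))
    targets += sorted(root.rglob("*.so"))

    removed = []
    for path in targets:
        # A path may already be gone, e.g. a .so inside a deleted build/.
        if not path.exists():
            continue
        removed.append(path)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return removed


if __name__ == "__main__":
    # Dry run on the current directory: only lists what would be removed.
    for path in clean_build_artifacts(".", dry_run=True):
        print(path)
```

Running it with `dry_run=True` first shows what would be deleted before anything is touched.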

@fmagera
Author

fmagera commented Aug 5, 2022

The script worked, thanks 👍

@fmagera fmagera closed this as completed Aug 5, 2022
@MickShen7558

Hi @fmagera,

Thank you for your scripts. I had the same issue when installing Theseus in my container. I have now built the wheel and successfully installed it into the conda env in my container, but when I run pytest, it fails with the following summary:

    =========================== short test summary info ===========================
    FAILED theseus/extlib/tests/test_cusolver_lu_solver.py::test_lu_solver_1 - ModuleNotFoundError: No module named 'theseus.extlib.cusolver_lu_solver'
    FAILED theseus/extlib/tests/test_cusolver_lu_solver.py::test_lu_solver_2 - ModuleNotFoundError: No module named 'theseus.extlib.cusolver_lu_solver'
    FAILED theseus/extlib/tests/test_cusolver_lu_solver.py::test_lu_solver_3 - ModuleNotFoundError: No module named 'theseus.extlib.cusolver_lu_solver'
    FAILED theseus/extlib/tests/test_cusolver_lu_solver.py::test_lu_solver_4 - ModuleNotFoundError: No module named 'theseus.extlib.cusolver_lu_solver'
    FAILED theseus/extlib/tests/test_cusolver_lu_solver.py::test_lu_solver_5 - ModuleNotFoundError: No module named 'theseus.extlib.cusolver_lu_solver'
    FAILED theseus/extlib/tests/test_cusolver_lu_solver.py::test_lu_solver_6 - ModuleNotFoundError: No module named 'theseus.extlib.cusolver_lu_solver'
    FAILED theseus/extlib/tests/test_mat_mult.py::test_mat_mult_1 - ModuleNotFoundError: No module named 'theseus.extlib.mat_mult'
    FAILED theseus/extlib/tests/test_mat_mult.py::test_mat_mult_2 - ModuleNotFoundError: No module named 'theseus.extlib.mat_mult'
    FAILED theseus/extlib/tests/test_mat_mult.py::test_mat_mult_3 - ModuleNotFoundError: No module named 'theseus.extlib.mat_mult'
    FAILED theseus/extlib/tests/test_mat_mult.py::test_mat_mult_4 - ModuleNotFoundError: No module named 'theseus.extlib.mat_mult'
    FAILED theseus/extlib/tests/test_mat_mult.py::test_mat_mult_5 - ModuleNotFoundError: No module named 'theseus.extlib.mat_mult'
    FAILED theseus/extlib/tests/test_mat_mult.py::test_mat_mult_6 - ModuleNotFoundError: No module named 'theseus.extlib.mat_mult'
    FAILED theseus/optimizer/autograd/tests/test_lu_cuda_sparse_backward.py::test_sparse_backward_step - RuntimeError: Theseus C++/Cuda extension cannot be loaded
    FAILED theseus/optimizer/linear/tests/test_lu_cuda_sparse_solver.py::test_sparse_solver - RuntimeError: Theseus C++/Cuda extension cannot be loaded
    FAILED theseus/optimizer/linear/tests/test_lu_cuda_sparse_solver.py::test_sparse_solver_multistep_gradient - RuntimeError: Theseus C++/Cuda extension cannot be loaded
    FAILED theseus/optimizer/linear/tests/test_lu_cuda_sparse_solver.py::test_sparse_solver_multistep_exception - RuntimeError: Theseus C++/Cuda extension cannot be loaded
    FAILED theseus/optimizer/nonlinear/tests/test_levenberg_marquardt.py::test_ellipsoidal_damping_compatibility_cuda - RuntimeError: Theseus C++/Cuda extension cannot be loaded

Seems like cusolver_lu_solver and mat_mult are not correctly installed. Besides, there are issues when loading the Theseus C++/Cuda extension. Do you have any ideas on how to fix those?

@fmagera
Author

fmagera commented Aug 11, 2022

Hi @MickShen7558,

They're not my scripts, so I'm not the one to thank :)
Yes, I had the same issue when running the tests from my local copy of theseus. I guess it's just an import confusion problem: Python picks up the local source tree, which lacks the compiled extensions, instead of the installed package. I worked around it by copying the installed .so libraries into the local repo before running the tests, and they all passed.
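
To check whether this kind of import shadowing is happening, it helps to see which file Python would actually load for a module. The sketch below uses the standard library's `importlib.util.find_spec`; the helper names `module_origin` and `resolves_inside` are hypothetical, and the module name and directory to check are up to the reader (for example `theseus` and the checkout directory).

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    if spec is None or spec.origin is None:
        return None
    return spec.origin


def resolves_inside(name, directory):
    """True if the module would be loaded from somewhere under `directory`."""
    origin = module_origin(name)
    if origin is None:
        return False
    return Path(directory).resolve() in Path(origin).resolve().parents


if __name__ == "__main__":
    # Run from the directory where pytest is invoked.
    print(module_origin("theseus"))
    print(resolves_inside("theseus", "."))
```

If `resolves_inside` reports that the package comes from the current directory rather than from the environment's site-packages, running the tests from outside the checkout is another option besides copying the `.so` files.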

@facebookresearch facebookresearch locked and limited conversation to collaborators Aug 12, 2022
@mhmukadam mhmukadam converted this issue into discussion #269 Aug 12, 2022
