Installation failed with cmake error #355

RuiWang1998 · 2023-08-03T07:01:47Z

Hi,

We are testing our new Hopper machines (H800/H100) and trying to use fp8 for training for the first time, but we are having trouble installing TransformerEngine. The installation fails with RuntimeError: Error when running CMake: Command '['/usr/local/bin/cmake', '-S', '/tmp/pip-req-build-p6kjladj/transformer_engine', '-B', '/tmp/tmps08o01xi', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-p6kjladj/build/lib.linux-x86_64-cpython-310', '-GNinja']' returned non-zero exit status 1.

We tried to invoke the command outside of pip, and it just reports that there is no source directory.

We are trying Docker right now, but our internet configuration does not let us use Docker very conveniently, so we would usually prefer not to use it. Could you show us where we could find any clues on how to proceed? Much appreciated.

ptrendx · 2023-08-03T18:07:00Z

Hi @RuiWang1998, could you share the command you use for installation and a full error message that you are getting? Thank you!

RuiWang1998 · 2023-08-04T03:11:09Z

Hi @ptrendx, we tried both pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable and pip install git+https://github.com/NVIDIA/TransformerEngine.git@main, with Python versions 3.9 through 3.11. Each time we simply installed pytorch==2.0.1 and packaging and then ran the two commands. Both returned the same error.

RuiWang1998 · 2023-08-04T09:52:00Z

Hi @ptrendx, after a little digging, we think we have located the problem, but we are not sure what the solution is:

/usr/bin/c++ -Dtransformer_engine_EXPORTS -I/home/rui/TransformerEngine/transformer_engine -I/home/rui/TransformerEngine/transformer_engine/common/include -I/usr/local/cuda-11.8/targets/x86_64-linux/include -I/home/rui/TransformerEngine/transformer_engine/../3rdparty/cudnn-frontend/include -I/tmp/tmp9cj2vyni/common/string_headers -isystem /usr/local/cuda-11.8/include -O3 -DNDEBUG -std=gnu++17 -fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o.d -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o -c /home/rui/TransformerEngine/transformer_engine/common/fused_attn/fused_attn.cpp
In file included from /usr/local/cuda-11.8/include/cuda_fp8.h:350,
                 from /home/rui/TransformerEngine/transformer_engine/common/fused_attn/../common.h:14,
                 from /home/rui/TransformerEngine/transformer_engine/common/fused_attn/fused_attn.cpp:8:
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator short unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:735:16: error: ‘__half2ushort_rz’ was not declared in this scope
  735 |         return __half2ushort_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:744:16: error: ‘__half2uint_rz’ was not declared in this scope
  744 |         return __half2uint_rz(__half(*this));
      |                ^~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator long long unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:753:16: error: ‘__half2ull_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
  753 |         return __half2ull_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator short int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:791:16: error: ‘__half2short_rz’ was not declared in this scope
  791 |         return __half2short_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:800:16: error: ‘__half2int_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
  800 |         return __half2int_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator long long int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:809:16: error: ‘__half2ll_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
  809 |         return __half2ll_rz(__half(*this));
      |                ^~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator short unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1248:16: error: ‘__half2ushort_rz’ was not declared in this scope
 1248 |         return __half2ushort_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1257:16: error: ‘__half2uint_rz’ was not declared in this scope
 1257 |         return __half2uint_rz(__half(*this));
      |                ^~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator long long unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1266:16: error: ‘__half2ull_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
 1266 |         return __half2ull_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator short int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1303:16: error: ‘__half2short_rz’ was not declared in this scope
 1303 |         return __half2short_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1311:16: error: ‘__half2int_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
 1311 |         return __half2int_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator long long int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1319:16: error: ‘__half2ll_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
 1319 |         return __half2ll_rz(__half(*this));
      |                ^~~~~~~~~~~~
      |                __half2_raw
ninja: build stopped: subcommand failed.

It seems like we are missing some headers. Where can we include them?

We have machines with CUDA 11.8 and machines with CUDA 12, and we believe they fail for the same reason.

RuiWang1998 · 2023-08-04T11:31:55Z

Hi,

An update: our H800 machines can now install successfully, but the A100 machines still cannot. The H800 machines just needed cuDNN, but the A100 machines still hit the error above even after cuDNN was installed.

ptrendx · 2023-08-07T20:24:22Z

Hi, this is a pretty strange error. Functions like __half2ushort_rz are declared inside the cuda_fp16.hpp file, which should be in the include directory of your CUDA installation (in this case /usr/local/cuda-11.8/include or /usr/local/cuda-11.8/targets/x86_64-linux/include). Could you confirm that this file exists there?
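
One way to run this check is a short script that scans a CUDA include directory for headers that mention the missing symbol. This is only a sketch: the function name headers_declaring and the glob pattern cuda_fp16* are illustrative choices, not part of CUDA or TransformerEngine, and a plain text search cannot tell whether a declaration is hidden behind a preprocessor guard.

```python
from pathlib import Path


def headers_declaring(include_dir, symbol="__half2ushort_rz", pattern="cuda_fp16*"):
    """Return names of headers in include_dir whose text mentions symbol."""
    root = Path(include_dir)
    if not root.is_dir():
        return []  # directory missing: nothing to report
    hits = []
    for header in sorted(root.glob(pattern)):
        # errors="ignore" keeps the scan going if a header has odd bytes
        if header.is_file() and symbol in header.read_text(errors="ignore"):
            hits.append(header.name)
    return hits


if __name__ == "__main__":
    # Both directories are the ones named in this thread.
    for d in ("/usr/local/cuda-11.8/include",
              "/usr/local/cuda-11.8/targets/x86_64-linux/include"):
        print(d, "->", headers_declaring(d))
```

An empty list for both directories would suggest the toolkit headers are incomplete; a non-empty list means the symbol is present and the problem lies elsewhere, for example in include order or in a guard around the declaration.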

RuiWang1998 · 2023-08-08T02:33:37Z

Hi, yes it is in /usr/local/cuda-11.8/include and it seems that __half2ushort_rz is declared there.

MicPie · 2023-08-31T12:28:43Z

Any update on this issue?

RuiWang1998 · 2023-09-01T03:58:53Z

Hi, @MicPie ,

We have been able to install this with newer commits now. Were you trying on stable releases?

mahdip72 · 2023-11-21T09:05:30Z

I have the same problem on my workstation with an A6000 Ada.

raise RuntimeError(f"Error when running CMake: {e}")
      RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/tmp/pip-req-build-hnl1xnl7/transformer_engine', '-B', '/tmp/tmp6vkf06mc', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-hnl1xnl7/build/lib.linux-x86_64-cpython-311']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer-engine

@RuiWang1998 Could you help me figure out what to do? Should I install cuDNN?
CUDA 11.8
PyTorch 2.1.0
Python 3.11
Ubuntu 22.04

RuiWang1998 · 2023-11-21T10:28:10Z

Hi, you would have to modify setup.py so that it outputs the actual error message (or run the build commands manually in a terminal), so that we can know exactly what is going on. Best, Rui
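
A minimal sketch of surfacing the real error by hand: rerun the failing command yourself and keep the tail of its combined output. The helper name run_and_show is hypothetical, and the command in the usage section is a stand-in. Substitute the CMake command list from your own RuntimeError, pointed at a local clone, since the /tmp/pip-req-build-... directories that pip uses are likely gone once pip exits.

```python
import subprocess
import sys


def run_and_show(cmd, tail_lines=40):
    """Run cmd and return (exit_code, last tail_lines lines of its output)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so compiler errors are kept
        text=True,
    )
    return proc.returncode, proc.stdout.splitlines()[-tail_lines:]


if __name__ == "__main__":
    # Stand-in command that fails on purpose; replace it with your CMake call.
    demo = [sys.executable, "-c",
            "import sys; print('configure step failed'); sys.exit(1)"]
    code, tail = run_and_show(demo)
    print(f"exit status {code}")
    print("\n".join(tail))
```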

liuchangdm · 2024-02-19T07:40:57Z

@RuiWang1998 Could you tell us which release version you used? I have the same problem. Thanks.

hellangleZ · 2024-04-02T05:27:51Z

Same issue

File "/aml2/TransformerEngine/setup.py", line 338, in _build_cmake
raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/aml/conda/bin/cmake', '-S', '/aml2/TransformerEngine/transformer_engine', '-B', '/aml2/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/aml2/ds2/bin/python', '-DPython_INCLUDE_DIR=/aml2/ds2/include/python3.10', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/aml2/TransformerEngine/build/lib.linux-x86_64-cpython-310', '-GNinja', '-Dpybind11_DIR=/aml2/ds2/lib/python3.10/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
[end of output]

timmoon10 · 2024-04-02T23:53:54Z

The CMake error message should already be printed to stderr, although it is somewhat buried within the Python stacktrace from setup.py. It may be helpful to search for "Building CMake extension transformer_engine" within your build logs.

If the error is happening during CMake configuration, it's probably because CUDA or cuDNN are not properly installed. See CUDA instructions at #700 (comment). For cuDNN, make sure CUDNN_PATH is set in your environment.
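
As a rough illustration of the log search suggested above, the following helper returns the lines that follow the marker in a saved build log. The function name lines_after_marker and the context size are assumptions made for this sketch; only the marker string comes from this thread.

```python
def lines_after_marker(log_text,
                       marker="Building CMake extension transformer_engine",
                       context=30):
    """Return up to `context` log lines starting at the first marker line."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:i + context]
    return []  # marker not present in this log


if __name__ == "__main__":
    # Tiny made-up log, for demonstration only.
    sample = "\n".join([
        "running build_ext",
        "Building CMake extension transformer_engine",
        "CMake Error: example message",
    ])
    print("\n".join(lines_after_marker(sample)))
```

To use it on a real build, redirect the pip output to a file and pass the file contents as log_text.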

BrunoFANG1 · 2024-04-28T18:07:11Z

I solved this issue by running the following command in the TransformerEngine directory:

git submodule update --init --recursive

I hope this helps.
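
A quick way to tell whether the submodules were ever fetched is to check that a submodule directory is not empty. The sketch below assumes the 3rdparty/cudnn-frontend path that appears in the compiler command earlier in this thread; the function name submodule_populated is hypothetical.

```python
from pathlib import Path


def submodule_populated(repo_root, rel_path="3rdparty/cudnn-frontend"):
    """Return True if the submodule directory exists and contains any entry."""
    d = Path(repo_root) / rel_path
    return d.is_dir() and any(d.iterdir())


if __name__ == "__main__":
    # Run from the root of a TransformerEngine checkout.
    if submodule_populated("."):
        print("cudnn-frontend submodule looks populated")
    else:
        print("submodule missing or empty; "
              "try: git submodule update --init --recursive")
```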

sfdeggb · 2024-07-16T10:13:11Z

I am running into the same problem. The error details are:

raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/tmp/pip-req-build-yvwm9h7r/transformer_engine', '-B', '/tmp/pip-req-build-yvwm9h7r/build/cmake',
DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-yvwm9h7r/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.

My environment:
Ubuntu 22.04
CUDA: 11.7
Python: 3.11
torch: 2.3.1
NVIDIA driver: 535.183.06
Looking forward to a solution!

wplf · 2024-07-16T10:20:17Z

Hello, my friend!
You can check whether nvcc is on your PATH:

nvcc --version

If this reports an error, you may be able to fix it with export PATH=/usr/local/cuda/bin:$PATH or something similar.
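
The same check can be scripted. This sketch looks for nvcc on PATH and then in /usr/local/cuda/bin, the directory mentioned above; the function name find_nvcc and the fallback list are illustrative, and your CUDA toolkit may live elsewhere.

```python
import os
import shutil


def find_nvcc(extra_dirs=("/usr/local/cuda/bin",)):
    """Return a path to nvcc from PATH, else from extra_dirs, else None."""
    found = shutil.which("nvcc")
    if found:
        return found
    for d in extra_dirs:
        candidate = shutil.which("nvcc", path=d)  # search only this directory
        if candidate:
            return candidate
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; install the CUDA toolkit or fix PATH")
    elif shutil.which("nvcc") is None:
        print(f"nvcc is at {nvcc} but not on PATH; "
              f"try: export PATH={os.path.dirname(nvcc)}:$PATH")
    else:
        print(f"nvcc found at {nvcc}")
```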

sfdeggb · 2024-07-16T10:35:58Z

@wplf Yes, my nvcc seems to be fine. The output is below:

ubuntu@ip-172-31-38-93:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Are there any other solutions?

wplf · 2024-07-16T10:39:07Z

Can you check your CMake version?
You can install CMake with pip install cmake.

sfdeggb · 2024-07-16T10:47:21Z

@wplf
The CMake version is below:

(yuxunlian) ubuntu@ip-172-31-38-93:~$ cmake --version
cmake version 3.22.1
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Is this version appropriate?

wplf · 2024-07-16T10:49:19Z

Yes, this is OK.
Sorry, I can't help any further.

sfdeggb · 2024-07-16T10:52:00Z

@wplf
No problem! Thank you for your reply!

FidanVural · 2024-10-04T07:17:16Z

Any update on this issue? I'm still getting the same error.

timmoon10 · 2024-10-04T18:38:09Z

If you are experiencing an error that looks like RuntimeError: Error when running CMake, then something has failed in the build process (probably a CMake configuration error or a compilation error). Please look through the build logs to find more details or post enough of the build logs so we can figure out what's going on. To print the maximum amount of information during the build process:

cd transformer_engine
pip install -v -v -v .

Some common build errors and fixes:

Uninitialized Git submodules: Run git submodule update --init --recursive.
CMake can't find a C++ compiler: Set CXX in the environment.
CMake can't find CUDA: Set CUDA_PATH in the environment.
CMake can't find cuDNN: Set CUDNN_PATH in the environment.
Invalid dependency versions: Consult TE's requirements. As of TE 1.11, TE requires CUDA 12.0+ and cuDNN 8.1+.
Hang during compilation: Try disabling parallelism in the build process by setting MAX_JOBS=1 and NVTE_BUILD_THREADS_PER_JOB=1 in the environment. See the comment on issue #1077 ("stuck at building wheel") for more guidance.
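
Parts of the checklist above can be sketched as a small preflight script. It reports which of CXX, CUDA_PATH and CUDNN_PATH are unset and compares the CUDA release reported by nvcc --version against the CUDA 12.0 minimum quoted for TE 1.11. All function names here are hypothetical, and an unset variable is only a hint: CMake may still find the compiler, CUDA or cuDNN on its own.

```python
import os
import re


def missing_env_vars(env, names=("CXX", "CUDA_PATH", "CUDNN_PATH")):
    """Return the names from `names` that are unset or empty in env."""
    return [n for n in names if not env.get(n)]


def parse_nvcc_release(version_output):
    """Extract (major, minor) from `nvcc --version` text, or None."""
    m = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def meets_minimum(version, minimum=(12, 0)):
    """True if version is known and at least `minimum` (TE 1.11: CUDA 12.0)."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    print("unset hints:", missing_env_vars(os.environ))
    # Sample line in the format printed by nvcc --version.
    sample = "Cuda compilation tools, release 11.7, V11.7.64"
    print("CUDA new enough for TE 1.11:",
          meets_minimum(parse_nvcc_release(sample)))
```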

I'll lock this issue to make this comment easier for users to find, but please open a new issue if you are encountering a build error (with enough of the build log for us to help).

timmoon10 closed this as completed Oct 4, 2024

NVIDIA locked as resolved and limited conversation to collaborators Oct 4, 2024
