RuntimeError: Error building extension 'cpu_adam' #2682

Closed
arain60gb opened this issue Jan 9, 2023 · 3 comments

Labels: bug, training

Comments

@arain60gb

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:414] 2023-01-09 19:06:58,180 >> Using amp fp16 backend
[2023-01-09 19:06:58,187] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown
[2023-01-09 19:06:58,191] [WARNING] [config_utils.py:67:process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2023-01-09 19:07:05,242] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py38_cu117/cpu_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:11,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:1:
/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>
^~~~~~~~~
compilation terminated.
[2/3] /root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu:1:
/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>
^~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/root/anaconda3/envs/bitten/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "run_clm.py", line 478, in
main()
File "run_clm.py", line 441, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/trainer.py", line 1112, in train
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/deepspeed.py", line 355, in deepspeed_init
model, optimizer, _, lr_scheduler = deepspeed.initialize(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 330, in init
self._configure_optimizer(optimizer, model_parameters)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1195, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1266, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 460, in load
return self.jit_load(verbose)
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 495, in jit_load
op_module = load(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7efc99bbd1f0>
Traceback (most recent call last):
File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in del
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-01-09 19:07:12,824] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 7728
[2023-01-09 19:07:12,826] [ERROR] [launch.py:324:sigkill_handler]

@arain60gb added the bug and compression labels on Jan 9, 2023
@arain60gb
Author

Kindly guide me on how to resolve this issue. Thanks.

@tjruwase
Contributor

tjruwase commented Jan 9, 2023

@zainali60, from your log, this might be the same as #2622.
[screenshot attached]

@HeyangQin
Contributor

To follow up on this issue: the root cause is on the PyTorch side. They accidentally shipped nvcc with their conda package, which breaks the CUDA toolchain. The issue has been reported to the PyTorch team and should be fixed in the next release.
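
For anyone hitting the same failure, a quick way to confirm this is the cause is to check which nvcc the extension build picks up and whether the full toolkit headers are reachable. This is only a diagnostic sketch: the conda path is taken from the log above, and /usr/local/cuda is an assumed location for the full CUDA toolkit.

# Which nvcc is first on PATH? In the failing log it resolves to the
# stripped-down copy inside the conda environment:
which nvcc    # e.g. /root/anaconda3/envs/bitten/bin/nvcc

# A full toolkit install ships the header the build could not find
# (path assumed; adjust to your CUDA install location):
ls /usr/local/cuda/include/cuda_profiler_api.h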

For now, please use this temporary workaround: export PATH=/usr/local/cuda/bin:$PATH

Ref: https://discuss.pytorch.org/t/not-able-to-include-cusolverdn-h/169122
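
A minimal sketch of applying the workaround before relaunching training, assuming the full toolkit lives under /usr/local/cuda; the extension cache path is the one shown in the log above, and clearing it is optional but forces a clean rebuild:

# Put the full toolkit's nvcc ahead of the conda-shipped copy
export PATH=/usr/local/cuda/bin:$PATH

# Optional: drop the failed build so cpu_adam is rebuilt from scratch
rm -rf /root/.cache/torch_extensions/py38_cu117/cpu_adam

# Then rerun your original training command (the deepspeed launcher with run_clm.py)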

Please feel free to reopen the issue if the above solution doesn't work.
