Install from source error: compiler_compat/ld: cannot find -lcuda #2720

Closed
leonardxie opened this issue Feb 2, 2024 · 5 comments

Here are the errors. I failed to install vLLM 0.3.0 from source with CUDA 12.1.
The install log is shown below.
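
For context on the error in the title: `ld: cannot find -lcuda` generally means the linker did not find a `libcuda.so` in any directory it searches. That library normally comes from the NVIDIA driver, and the CUDA toolkit usually ships a link-time stub under `lib64/stubs`. The following is only a minimal diagnostic sketch, not a confirmed fix for this issue. The helper names `find_libcuda` and `candidate_dirs`, the list of candidate directories, and the `/usr/local/cuda` fallback are all assumptions made for illustration.

```python
import os
from pathlib import Path


def find_libcuda(search_dirs):
    """Return the directories in search_dirs that contain a file named libcuda.so."""
    hits = []
    for d in search_dirs:
        if (Path(d) / "libcuda.so").exists():
            hits.append(str(d))
    return hits


def candidate_dirs(cuda_home):
    """Build a list of places worth checking (an assumed, non-exhaustive list)."""
    root = Path(cuda_home)
    dirs = [
        root / "lib64" / "stubs",           # toolkit stub, usual location
        root / "lib64",
        Path("/usr/lib/x86_64-linux-gnu"),  # common driver location on Debian/Ubuntu
        Path("/usr/lib64"),                 # common driver location on RHEL-like systems
    ]
    # gcc also consults LIBRARY_PATH at link time, so include its entries.
    extra = [Path(p) for p in os.environ.get("LIBRARY_PATH", "").split(os.pathsep) if p]
    return dirs + extra


if __name__ == "__main__":
    # CUDA_HOME is assumed to point at the toolkit root, e.g. the cuda-12.1 directory.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    for d in find_libcuda(candidate_dirs(cuda_home)):
        print(d)
```

If the script prints a directory, one common workaround is to add that directory to `LIBRARY_PATH` before rebuilding. Whether that resolves this particular build is not established by the log.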

`(py310_cu121_vllm) [ vllm-0.3.0]$ proxychains4 python setup.py install
running install
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66:
running bdist_egg
running egg_info
creating vllm.egg-info
writing vllm.egg-info/PKG-INFO
writing dependency_links to vllm.egg-info/dependency_links.txt
writing requirements to vllm.egg-info/requires.txt
writing top-level names to vllm.egg-info/top_level.txt
writing manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'vllm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/vllm
copying vllm/block.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/config.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/test_utils.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/utils.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/logger.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/prefix.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm
creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints
copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
creating build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/block_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/policy.py -> build/lib.linux-x86_64-cpython-310/vllm/core
creating build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/punica.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
creating build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
creating build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/ray_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/model_loader.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/weight_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/input_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/aquila.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/yi.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
copying vllm/transformers_utils/tokenizers/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/internlm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/aquila.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/decilm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mistral.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/yi.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mixtral_quant.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/parallel_state.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/communication_op.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/fused_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/attention.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/rotary_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/rejection_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/triton_kernel
copying vllm/model_executor/layers/triton_kernel/prefix_prefill.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/triton_kernel
copying vllm/model_executor/layers/triton_kernel/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/triton_kernel
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/squeezellm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
creating build/lib.linux-x86_64-cpython-310/tests
creating build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_punica.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_lora.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_lora_manager.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_layers.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/conftest.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_llama.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_utils.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_tokenizer.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/utils.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_worker.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/__init__.py -> build/lib.linux-x86_64-cpython-310/tests/lora
creating build/lib.linux-x86_64-cpython-310/tests/worker
copying tests/worker/test_model_runner.py -> build/lib.linux-x86_64-cpython-310/tests/worker
copying tests/worker/__init__.py -> build/lib.linux-x86_64-cpython-310/tests/worker
creating build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying tests/worker/spec_decode/test_multi_step_worker.py -> build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying tests/worker/spec_decode/utils.py -> build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying tests/worker/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying vllm/py.typed -> build/lib.linux-x86_64-cpython-310/vllm
running build_ext
building 'vllm._C' extension
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/attention
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/awq
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm

Emitting ninja build file /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)

[1/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/cuda_utils_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cuda_utils_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[2/12] c++ -MMD -MF /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pybind.o.d -pthread -B /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include -fPIC -O2 -isystem /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include -fPIC -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/pybind.cpp -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pybind.o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[3/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/layernorm_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/layernorm_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[4/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/pos_encoding_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pos_encoding_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[5/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm/quant_cuda_kernel.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDATensorMethods.cuh: In member function 'T* at::Tensor::data() const [with T = __half]':
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDATensorMethods.cuh:13:60: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
return reinterpret_cast<__half*>(data());
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu: In function 'void squeezellm_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor)':
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu:206:137: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
vllm::squeezellm::NUQ4MatMulKernel<<<blocks, threads, 0, stream>>>(
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu:206:194: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
vllm::squeezellm::NUQ4MatMulKernel<<<blocks, threads, 0, stream>>>(
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu:206:238: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
vllm::squeezellm::NUQ4MatMulKernel<<<blocks, threads, 0, stream>>>(
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~

[6/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/activation_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/activation_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[7/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/moe_align_block_size_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/moe_align_block_size_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[8/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/custom_all_reduce.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/custom_all_reduce.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[9/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/awq/gemm_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(40): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[128];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(41): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[128];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(58): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 128;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(272): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(273): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(277): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(291): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 64;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(504): warning #177-D: variable "j_factors1" was declared but never referenced
int j_factors1 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(505): warning #177-D: variable "row_stride2" was declared but never referenced
int row_stride2 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(506): warning #177-D: variable "split_k_iters" was declared but never referenced
int split_k_iters = 1;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(512): warning #177-D: variable "B_shared_warp" was declared but never referenced
half B_shared_warp[32];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(513): warning #177-D: variable "OC" was declared but never referenced
int OC = 512;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(24): warning #177-D: function "vllm::awq::__pack_half2" was declared but never referenced
__pack_half2(const half x, const half y) {
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(40): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[128];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(41): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[128];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(58): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 128;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(272): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(273): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(277): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(291): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 64;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(504): warning #177-D: variable "j_factors1" was declared but never referenced
int j_factors1 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(505): warning #177-D: variable "row_stride2" was declared but never referenced
int row_stride2 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(506): warning #177-D: variable "split_k_iters" was declared but never referenced
int split_k_iters = 1;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(512): warning #177-D: variable "B_shared_warp" was declared but never referenced
half B_shared_warp[32];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(513): warning #177-D: variable "OC" was declared but never referenced
int OC = 512;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(24): warning #177-D: function "vllm::awq::__pack_half2" was declared but never referenced
__pack_half2(const half x, const half y) {
^

[10/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/cache_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
/vllm-0.3.0/csrc/cache_kernels.cu(337): warning #550-D: variable "src_key_indices" was set but never used
int src_key_indices[unroll_factor];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/cache_kernels.cu(338): warning #550-D: variable "src_value_indices" was set but never used
int src_value_indices[unroll_factor];
^
/vllm-0.3.0/csrc/cache_kernels.cu(337): warning #550-D: variable "src_key_indices" was set but never used
int src_key_indices[unroll_factor];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/cache_kernels.cu(338): warning #550-D: variable "src_value_indices" was set but never used
int src_value_indices[unroll_factor];
^
[11/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/quantization/gptq/q_gemm.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq/q_gemm.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[12/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/attention/attention_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0

g++ -pthread -B /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/compiler_compat -shared -Wl,-rpath,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -Wl,-rpath-link,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -L/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -Wl,-rpath,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -Wl,-rpath-link,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -L/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/activation_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cuda_utils_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/custom_all_reduce.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/layernorm_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/moe_align_block_size_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pos_encoding_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pybind.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/awq/gemm_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq/q_gemm.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm/quant_cuda_kernel.o -L/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/lib -L/data01/rhino_xsg/software/cuda-12.1/lib64 -lcuda -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/vllm/_C.cpython-310-x86_64-linux-gnu.so

/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/compiler_compat/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
error: command '/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/g++' failed with exit code 1

`

The g++ and gcc versions are as follows:

`(py310_cu121_vllm) [vllm-0.3.0]$ gcc -v
Reading specs from /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../lib/gcc/x86_64-conda-linux-gnu/8.5.0/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../libexec/gcc/x86_64-conda-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-conda-linux-gnu
Configured with: ../configure --prefix=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --with-slibdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --libdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --build=x86_64-conda-linux-gnu --host=x86_64-conda-linux-gnu --target=x86_64-conda-linux-gnu --enable-default-pie --enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --enable-libquadmath --enable-libquadmath-support --enable-libsanitizer --enable-lto --enable-threads=posix --enable-target-optspace --enable-plugin --enable-gold --disable-nls --disable-bootstrap --disable-multilib --enable-long-long --enable-default-pie --with-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/sysroot --with-build-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_build_env/x86_64-conda-linux-gnu/sysroot 
--with-gxx-include-dir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/include/c++/8.5.0
Thread model: posix
gcc version 8.5.0 (GCC)

(py310_cu121_vllm) [ vllm-0.3.0]$ g++ -v
Reading specs from /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../lib/gcc/x86_64-conda-linux-gnu/8.5.0/specs
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../libexec/gcc/x86_64-conda-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-conda-linux-gnu
Configured with: ../configure --prefix=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --with-slibdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --libdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --build=x86_64-conda-linux-gnu --host=x86_64-conda-linux-gnu --target=x86_64-conda-linux-gnu --enable-default-pie --enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --enable-libquadmath --enable-libquadmath-support --enable-libsanitizer --enable-lto --enable-threads=posix --enable-target-optspace --enable-plugin --enable-gold --disable-nls --disable-bootstrap --disable-multilib --enable-long-long --enable-default-pie --with-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/sysroot --with-build-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_build_env/x86_64-conda-linux-gnu/sysroot 
--with-gxx-include-dir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/include/c++/8.5.0
Thread model: posix
gcc version 8.5.0 (GCC) `

@leonardxie
Author

@simon-mo

@simon-mo
Collaborator

simon-mo commented Feb 2, 2024

Sorry, I don't have specific expertise in debugging this environment issue. It looks like the CUDA toolchain is somehow missing the shared object libraries. Here is a list of tips from ChatGPT that I think may be useful: https://chat.openai.com/share/ea2e4e07-fb33-45f1-9b66-0ac976c3bb96


The error message you're encountering indicates that the linker (ld) cannot find the CUDA library (-lcuda). This usually means that the CUDA installation is not correctly recognized by your environment, or the necessary CUDA library files are not located in the directories where the linker expects them. Here are a few steps you can take to troubleshoot and possibly resolve this issue:

  1. Verify CUDA Installation: First, ensure that CUDA is correctly installed on your system. Run nvcc --version in the terminal to check that the CUDA toolkit is recognized and to report the installed version, and run nvidia-smi to check that the NVIDIA driver is installed.

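  2. Check CUDA Library Path: The linker error specifically mentions that it cannot find -lcuda, which means it is looking for libcuda.so and cannot find it. This library is installed by the NVIDIA driver rather than by the CUDA toolkit, so it usually lives in a system library directory such as /usr/lib/x86_64-linux-gnu or /usr/lib64; the toolkit ships only a link-time stub, typically under /usr/local/cuda/lib64/stubs. Verify that the file exists in one of these locations.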

  3. Set Library Path Environment Variables: If libcuda.so is not in a standard location where the linker automatically searches, you may need to explicitly tell the linker where to find it. GCC searches the directories listed in LIBRARY_PATH at link time, while LD_LIBRARY_PATH only affects the dynamic loader at run time, so a link error such as "cannot find -lcuda" calls for the former. For example (adjust the paths to your installation):

    export LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LIBRARY_PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

    Add these lines to your .bashrc or .bash_profile (or the equivalent file for your shell) to make the change permanent.

  4. Update the Compiler Flags: If setting the environment variable doesn't solve the issue, you may need to manually specify the path to the CUDA libraries in your build command or makefile. This can often be done by adding -L/path/to/cuda/lib64 to the compiler flags to specify the directory and -lcuda to link against the CUDA library.

  5. Ensure CUDA Compatibility with Your Compiler: The error message mentions that it's using a specific version of g++ from your Anaconda environment. Make sure that this compiler version is compatible with the version of CUDA you are using. CUDA documentation typically lists compatible GCC versions.

  6. Check for Multiple CUDA Versions: If you have multiple versions of CUDA installed, there could be conflicts. Ensure that your environment variables (PATH, LD_LIBRARY_PATH, CUDA_HOME, etc.) are all pointing to the same CUDA version that you intend to use.

  7. Reconfigure and Rebuild: After making changes to your environment or installation, it's often a good idea to clean any existing build files and reconfigure your build system before attempting to compile again.

If you continue to encounter issues after trying these steps, please provide additional details about your environment (e.g., the version of CUDA installed, the operating system you're using, and the specific commands you're running that lead to this error) for more tailored advice.
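
The path checks in steps 2 and 6 can be scripted. The following is a minimal sketch, not part of vLLM or CUDA: the function names `candidate_dirs` and `find_libcuda` are hypothetical, and the default `/usr/local/cuda` prefix and the two system directories are assumptions that may not match your machine. It only looks for the unversioned `libcuda.so`, since that is the name the linker resolves for `-lcuda`.

```python
import os
from pathlib import Path


def candidate_dirs(env=None):
    """List directories worth checking for libcuda.so (assumed defaults, adjust as needed)."""
    env = os.environ if env is None else env
    # Directories GCC already searches at link time via LIBRARY_PATH.
    dirs = [d for d in env.get("LIBRARY_PATH", "").split(os.pathsep) if d]
    # Toolkit directories; /usr/local/cuda is only a common default, not a guarantee.
    cuda_home = env.get("CUDA_HOME", "/usr/local/cuda")
    dirs.append(os.path.join(cuda_home, "lib64"))
    dirs.append(os.path.join(cuda_home, "lib64", "stubs"))
    # Typical driver install locations on Debian-like and RHEL-like systems.
    dirs += ["/usr/lib/x86_64-linux-gnu", "/usr/lib64"]
    return dirs


def find_libcuda(search_dirs):
    """Return the directories from search_dirs that contain an unversioned libcuda.so."""
    hits = []
    for d in search_dirs:
        path = Path(d)
        # Skip entries that do not exist instead of raising.
        if path.is_dir() and (path / "libcuda.so").exists():
            hits.append(str(path))
    return hits


if __name__ == "__main__":
    found = find_libcuda(candidate_dirs())
    if found:
        print("libcuda.so found in:", ", ".join(found))
    else:
        print("libcuda.so not found in any candidate directory")
```

If the only hit is the `stubs` directory, adding that directory to LIBRARY_PATH is usually enough for the link step; the real driver library is still needed at run time.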

@leonardxie
Author

thanks, it works

@ronensc
Contributor

ronensc commented Feb 6, 2024

I encountered the same error. In my case, libcuda.so is located at /usr/local/cuda/lib64/stubs/, though I'm not sure why.
Exporting the following environment variable resolved the issue for me:

export LIBRARY_PATH="/usr/local/cuda/lib64/stubs:$LIBRARY_PATH"
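
For context on the stubs directory: the CUDA toolkit ships a stub libcuda.so there so that programs can be linked on machines where the driver's own library is not on the linker's search path, and GCC consults LIBRARY_PATH at link time. If you would rather scope the change to a single build than export it in your shell, something along these lines may work. This is only a sketch: `prepend_library_path` is a hypothetical helper, and the stub path is the one from my machine.

```python
import os


def prepend_library_path(directory, env=None):
    """Return a copy of env with directory placed first in LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    # Drop empty entries and any earlier copy of the directory, so the result
    # has no leading/trailing colon and no duplicates.
    existing = [
        d
        for d in new_env.get("LIBRARY_PATH", "").split(os.pathsep)
        if d and d != directory
    ]
    new_env["LIBRARY_PATH"] = os.pathsep.join([directory] + existing)
    return new_env


# Example usage (not executed here): run the source build with the modified
# environment only, leaving the current shell untouched.
#
#   import subprocess, sys
#   env = prepend_library_path("/usr/local/cuda/lib64/stubs")
#   subprocess.run([sys.executable, "setup.py", "install"], env=env, check=True)
```

Unlike the plain export, the helper does not leave an empty entry in LIBRARY_PATH when the variable was previously unset.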

@HuiyuanYan

I encountered the same error. In my case, libcuda.so is located at /usr/local/cuda/lib64/stubs/, though I'm not sure why. Exporting the following environment variable resolved the issue for me:

export LIBRARY_PATH="/usr/local/cuda/lib64/stubs:$LIBRARY_PATH"

Thx a lot. It also works for me. :)
