Install from source error: compiler_compat/ld: cannot find -lcuda #2720
Comments
Sorry, I don't have specific expertise in debugging this environment issue. It looks like the CUDA toolchain is somehow missing its shared object libraries. Here's a list of tips from ChatGPT that I think may be useful: https://chat.openai.com/share/ea2e4e07-fb33-45f1-9b66-0ac976c3bb96 The error message you're encountering indicates that the linker (
If you continue to encounter issues after trying these steps, please provide additional details about your environment (e.g., the version of CUDA installed, the operating system you're using, and the specific commands you're running that lead to this error) for more tailored advice.
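For anyone hitting the same message, here is a minimal sketch of one way to check whether the linker has anything to resolve `-lcuda` against. It relies on the general rule that `-lcuda` makes `ld` look for a file named exactly `libcuda.so` (or `libcuda.a`), so a machine that only has the versioned `libcuda.so.1` can still fail. The helper names (`find_libcuda`, `has_linkable_libcuda`, `candidate_dirs`) and the list of directories are assumptions for illustration, not something taken from vLLM or from this thread; adjust the directories to your own system.

```python
import os
from pathlib import Path


def find_libcuda(search_dirs):
    """Return every libcuda.so* file found directly inside the given directories."""
    hits = []
    for d in search_dirs:
        p = Path(d)
        if p.is_dir():
            hits.extend(sorted(str(f) for f in p.glob("libcuda.so*")))
    return hits


def has_linkable_libcuda(search_dirs):
    """True if some directory holds an unversioned libcuda.so, the name -lcuda looks for."""
    return any(Path(hit).name == "libcuda.so" for hit in find_libcuda(search_dirs))


def candidate_dirs(env=None):
    """Collect directories worth checking (an illustrative list, not an exhaustive one)."""
    env = os.environ if env is None else env
    dirs = []
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        dirs.extend(x for x in env.get(var, "").split(os.pathsep) if x)
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        # The CUDA toolkit normally ships a stub libcuda.so under lib64/stubs.
        dirs.append(os.path.join(cuda_home, "lib64"))
        dirs.append(os.path.join(cuda_home, "lib64", "stubs"))
    # Typical driver library locations on Linux (assumed; varies by distribution).
    dirs.extend(["/usr/lib/x86_64-linux-gnu", "/usr/lib64"])
    return dirs


if __name__ == "__main__":
    dirs = candidate_dirs()
    for hit in find_libcuda(dirs):
        print("found:", hit)
    print("unversioned libcuda.so present:", has_linkable_libcuda(dirs))
```

If the script finds only `libcuda.so.1`, or finds `libcuda.so` in a directory the build does not search, that would be consistent with the `cannot find -lcuda` error, though it does not prove it is the only cause.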
Thanks, it works.
I encountered the same error. In my case,
Thx a lot. It also works for me. :)
Here are the errors. I failed to install vLLM 0.3.0 from source. CUDA==12.1
The install log is shown below.
`(py310_cu121_vllm) [ vllm-0.3.0]$ proxychains4 python setup.py install
running install
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66:
running bdist_egg
running egg_info
creating vllm.egg-info
writing vllm.egg-info/PKG-INFO
writing dependency_links to vllm.egg-info/dependency_links.txt
writing requirements to vllm.egg-info/requires.txt
writing top-level names to vllm.egg-info/top_level.txt
writing manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'vllm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/vllm
copying vllm/block.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/config.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/test_utils.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/utils.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/logger.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/prefix.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-310/vllm
copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm
creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils
creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints
copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints
creating build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/block_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/core
copying vllm/core/policy.py -> build/lib.linux-x86_64-cpython-310/vllm/core
creating build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/punica.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/lora
creating build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/worker
creating build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/ray_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/engine
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/model_loader.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/weight_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
copying vllm/model_executor/input_metadata.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor
creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/aquila.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/yi.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
copying vllm/transformers_utils/configs/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs
creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
copying vllm/transformers_utils/tokenizers/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers
creating build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/entrypoints/openai
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/internlm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/aquila.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/decilm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mistral.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/yi.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/mixtral_quant.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/parallel_state.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/communication_op.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
copying vllm/model_executor/parallel_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/parallel_utils
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/fused_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/attention.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/rotary_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
copying vllm/model_executor/layers/rejection_sampler.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/triton_kernel
copying vllm/model_executor/layers/triton_kernel/prefix_prefill.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/triton_kernel
copying vllm/model_executor/layers/triton_kernel/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/triton_kernel
creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/squeezellm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization
creating build/lib.linux-x86_64-cpython-310/tests
creating build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_punica.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_lora.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_lora_manager.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_layers.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/conftest.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_llama.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_utils.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_tokenizer.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/utils.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/test_worker.py -> build/lib.linux-x86_64-cpython-310/tests/lora
copying tests/lora/__init__.py -> build/lib.linux-x86_64-cpython-310/tests/lora
creating build/lib.linux-x86_64-cpython-310/tests/worker
copying tests/worker/test_model_runner.py -> build/lib.linux-x86_64-cpython-310/tests/worker
copying tests/worker/__init__.py -> build/lib.linux-x86_64-cpython-310/tests/worker
creating build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying tests/worker/spec_decode/test_multi_step_worker.py -> build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying tests/worker/spec_decode/utils.py -> build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying tests/worker/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-310/tests/worker/spec_decode
copying vllm/py.typed -> build/lib.linux-x86_64-cpython-310/vllm
running build_ext
building 'vllm._C' extension
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/attention
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/awq
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq
creating /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm
Emitting ninja build file /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/cuda_utils_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cuda_utils_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[2/12] c++ -MMD -MF /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pybind.o.d -pthread -B /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include -fPIC -O2 -isystem /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include -fPIC -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/pybind.cpp -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pybind.o -g -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[3/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/layernorm_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/layernorm_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[4/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/pos_encoding_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pos_encoding_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[5/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm/quant_cuda_kernel.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDATensorMethods.cuh: In member function 'T* at::Tensor::data() const [with T = __half]':
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDATensorMethods.cuh:13:60: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
return reinterpret_cast<__half*>(data());
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu: In function 'void squeezellm_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor)':
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu:206:137: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
vllm::squeezellm::NUQ4MatMulKernel<<<blocks, threads, 0, stream>>>(
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu:206:194: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
vllm::squeezellm::NUQ4MatMulKernel<<<blocks, threads, 0, stream>>>(
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
/vllm-0.3.0/csrc/quantization/squeezellm/quant_cuda_kernel.cu:206:238: warning: 'T* at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data() is deprecated. Please use Tensor.data_ptr() instead. [-Wdeprecated-declarations]
vllm::squeezellm::NUQ4MatMulKernel<<<blocks, threads, 0, stream>>>(
^
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
T * data() const {
^ ~~
[6/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/activation_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/activation_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[7/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/moe_align_block_size_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/moe_align_block_size_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[8/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/custom_all_reduce.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/custom_all_reduce.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=C -D_GLIBCXX_USE_CXX11_ABI=0
[9/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/awq/gemm_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(40): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[128];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(41): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[128];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(58): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 128;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(272): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(273): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(277): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(291): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 64;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(504): warning #177-D: variable "j_factors1" was declared but never referenced
int j_factors1 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(505): warning #177-D: variable "row_stride2" was declared but never referenced
int row_stride2 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(506): warning #177-D: variable "split_k_iters" was declared but never referenced
int split_k_iters = 1;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(512): warning #177-D: variable "B_shared_warp" was declared but never referenced
half B_shared_warp[32];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(513): warning #177-D: variable "OC" was declared but never referenced
int OC = 512;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(24): warning #177-D: function "vllm::awq::__pack_half2" was declared but never referenced
__pack_half2(const half x, const half y) {
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(40): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[128];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(41): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[128];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(58): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 128;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(272): warning #177-D: variable "scaling_factors_shared" was declared but never referenced
__attribute__((shared)) half scaling_factors_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(273): warning #177-D: variable "zeros_shared" was declared but never referenced
__attribute__((shared)) half zeros_shared[64];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(277): warning #177-D: variable "blockIdx_x" was declared but never referenced
int blockIdx_x = 0;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(291): warning #177-D: variable "ld_zero_flag" was declared but never referenced
bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 64;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(504): warning #177-D: variable "j_factors1" was declared but never referenced
int j_factors1 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(505): warning #177-D: variable "row_stride2" was declared but never referenced
int row_stride2 = 4;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(506): warning #177-D: variable "split_k_iters" was declared but never referenced
int split_k_iters = 1;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(512): warning #177-D: variable "B_shared_warp" was declared but never referenced
half B_shared_warp[32];
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(513): warning #177-D: variable "OC" was declared but never referenced
int OC = 512;
^
/vllm-0.3.0/csrc/quantization/awq/gemm_kernels.cu(24): warning #177-D: function "vllm::awq::__pack_half2" was declared but never referenced
__pack_half2(const half x, const half y) {
^
[10/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/cache_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
/vllm-0.3.0/csrc/cache_kernels.cu(337): warning #550-D: variable "src_key_indices" was set but never used
int src_key_indices[unroll_factor];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/cache_kernels.cu(338): warning #550-D: variable "src_value_indices" was set but never used
int src_value_indices[unroll_factor];
^
/vllm-0.3.0/csrc/cache_kernels.cu(337): warning #550-D: variable "src_key_indices" was set but never used
int src_key_indices[unroll_factor];
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/vllm-0.3.0/csrc/cache_kernels.cu(338): warning #550-D: variable "src_value_indices" was set but never used
int src_value_indices[unroll_factor];
^
[11/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/quantization/gptq/q_gemm.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq/q_gemm.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
[12/12] /data01/rhino_xsg/software/cuda-12.1/bin/nvcc -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/TH -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/include/THC -I/data01/rhino_xsg/software/cuda-12.1/include -I/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/include/python3.10 -c -c /vllm-0.3.0/csrc/attention/attention_kernels.cu -o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O2 -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 --threads 8 -DENABLE_FP8_E5M2 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
g++ -pthread -B /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/compiler_compat -shared -Wl,-rpath,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -Wl,-rpath-link,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -L/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -Wl,-rpath,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -Wl,-rpath-link,/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib -L/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/activation_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/cuda_utils_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/custom_all_reduce.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/layernorm_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/moe_align_block_size_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pos_encoding_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/pybind.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/awq/gemm_kernels.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq/q_gemm.o /vllm-0.3.0/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm/quant_cuda_kernel.o -L/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/lib/python3.10/site-packages/torch/lib -L/data01/rhino_xsg/software/cuda-12.1/lib64 -lcuda -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/vllm/_C.cpython-310-x86_64-linux-gnu.so
/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/compiler_compat/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
error: command '/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/g++' failed with exit code 1
`
The g++ and gcc versions are shown below:
`(py310_cu121_vllm) [vllm-0.3.0]$ gcc -v
Reading specs from /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../lib/gcc/x86_64-conda-linux-gnu/8.5.0/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../libexec/gcc/x86_64-conda-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-conda-linux-gnu
Configured with: ../configure --prefix=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --with-slibdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --libdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --build=x86_64-conda-linux-gnu --host=x86_64-conda-linux-gnu --target=x86_64-conda-linux-gnu --enable-default-pie --enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --enable-libquadmath --enable-libquadmath-support --enable-libsanitizer --enable-lto --enable-threads=posix --enable-target-optspace --enable-plugin --enable-gold --disable-nls --disable-bootstrap --disable-multilib --enable-long-long --enable-default-pie --with-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/sysroot --with-build-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_build_env/x86_64-conda-linux-gnu/sysroot 
--with-gxx-include-dir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/include/c++/8.5.0
Thread model: posix
gcc version 8.5.0 (GCC)
(py310_cu121_vllm) [ vllm-0.3.0]$ g++ -v
Reading specs from /data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../lib/gcc/x86_64-conda-linux-gnu/8.5.0/specs
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/data01/rhino_xsg/software/anaconda3/envs/py310_cu121_vllm/bin/../libexec/gcc/x86_64-conda-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-conda-linux-gnu
Configured with: ../configure --prefix=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --with-slibdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --libdir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib --build=x86_64-conda-linux-gnu --host=x86_64-conda-linux-gnu --target=x86_64-conda-linux-gnu --enable-default-pie --enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --enable-libquadmath --enable-libquadmath-support --enable-libsanitizer --enable-lto --enable-threads=posix --enable-target-optspace --enable-plugin --enable-gold --disable-nls --disable-bootstrap --disable-multilib --enable-long-long --enable-default-pie --with-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/sysroot --with-build-sysroot=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_build_env/x86_64-conda-linux-gnu/sysroot 
--with-gxx-include-dir=/home/conda/feedstock_root/build_artifacts/gcc_compilers_1634095555540/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/x86_64-conda-linux-gnu/include/c++/8.5.0
Thread model: posix
gcc version 8.5.0 (GCC) `
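
The failing step in the install log above is the final `g++ ... -lcuda` link. For `-lcuda`, `ld` looks for a file named exactly `libcuda.so` (or `libcuda.a`) in its search directories, and a versioned name such as `libcuda.so.1` does not satisfy it. Below is a rough sketch of how one might check this by hand. The helper names `lib_search_dirs` and `find_library` are made up for illustration, and the script only inspects the `-L` directories on the command line, not the linker's built-in system paths. It also assumes the common CUDA toolkit layout in which a stub `libcuda.so` sits in a `stubs` subdirectory of `lib64`; that may not hold on every install.

```python
import os
import shlex


def lib_search_dirs(link_command):
    """Return the directories passed via -L in a link command, in order."""
    dirs = []
    tokens = shlex.split(link_command)
    for i, tok in enumerate(tokens):
        if tok == "-L" and i + 1 < len(tokens):
            # Form: -L <dir>
            dirs.append(tokens[i + 1])
        elif tok.startswith("-L") and len(tok) > 2:
            # Form: -L<dir>
            dirs.append(tok[2:])
    return dirs


def find_library(name, dirs):
    """Return existing lib<name>.so / lib<name>.a paths found in dirs."""
    hits = []
    for d in dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(d, "lib" + name + suffix)
            if os.path.exists(candidate):
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Shortened version of the failing link line; paste the full one instead.
    cmd = (
        "g++ -shared pybind.o "
        "-L/data01/rhino_xsg/software/cuda-12.1/lib64 -lcuda -o _C.so"
    )
    dirs = lib_search_dirs(cmd)
    print("in -L dirs:", find_library("cuda", dirs))
    # Assumption: the toolkit keeps a link-time stub under <lib64>/stubs.
    stubs = [os.path.join(d, "stubs") for d in dirs]
    print("in stubs dirs:", find_library("cuda", stubs))
```

If the first list is empty and the second is not, the linker simply is not being pointed at the directory that holds `libcuda.so`. Adding that directory to the link search path (for example through the `LIBRARY_PATH` environment variable that gcc honors) is one thing to try.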