vLLM 0.6.3 raises TypeError: Unexpected keyword argument 'use_beam_search' #5966
Comments
fixed
Awesome, thanks!
I also see this log line: [INFO|tokenization_utils_base.py:2470] 2024-11-11 10:50:51,372 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. Is this an error, and what does it mean?
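For anyone pinned to a checkout that predates the fix: the crash comes from passing `use_beam_search` into `vllm.SamplingParams`, which newer vLLM releases no longer accept. Below is a minimal, version-aware guard sketch; the `build_sampling_params` helper and the 0.6.3 threshold are assumptions based on this report, not LLaMA-Factory's actual patch.

```python
# Sketch only: drop `use_beam_search` before constructing SamplingParams on
# vLLM builds that no longer define it. Helper name and threshold are illustrative.
from packaging.version import Version

import vllm
from vllm import SamplingParams


def build_sampling_params(**kwargs) -> SamplingParams:
    # vLLM removed `use_beam_search` from SamplingParams around 0.6.3, which is
    # what triggers "TypeError: Unexpected keyword argument 'use_beam_search'".
    if Version(vllm.__version__) >= Version("0.6.3"):
        kwargs.pop("use_beam_search", None)
    return SamplingParams(**kwargs)


params = build_sampling_params(
    temperature=0.95,
    top_p=0.7,
    max_tokens=512,
    use_beam_search=False,  # silently dropped on newer vLLM instead of crashing
)
```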
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml
Contents of the YAML file:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8
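For reference, the same vLLM inference path can be exercised without the CLI. This is a minimal sketch assuming LLaMA-Factory's `ChatModel` (the class shown in the traceback below) accepts a dict with the same keys as the YAML above; constructor details may differ between versions.

```python
# Sketch: programmatic equivalent of the `chat` command with the YAML above.
# Assumes ChatModel takes a dict of args, as in src/llamafactory/chat/chat_model.py.
from llamafactory.chat import ChatModel

chat_model = ChatModel({
    "model_name_or_path": "/root/bei/Models/qwen/Qwen2-0___5B-Instruct/",
    "template": "qwen",
    "infer_backend": "vllm",
    "vllm_enforce_eager": True,
    "vllm_gpu_util": 0.8,
})

messages = [{"role": "user", "content": "你好"}]
for new_text in chat_model.stream_chat(messages):
    print(new_text, end="", flush=True)
```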
The error output is as follows:
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.
User: 你好
Assistant:
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 124, in
[rank0]: main()
[rank0]: File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 81, in main
[rank0]: run_chat()
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 185, in run_chat
[rank0]: for new_text in chat_model.stream_chat(messages):
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 110, in stream_chat
[rank0]: yield task.result()
[rank0]: File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]: return self.__get_result()
[rank0]: File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]: raise self._exception
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 126, in astream_chat
[rank0]: async for new_token in self.engine.stream_chat(messages, system, tools, images, videos, **input_kwargs):
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 222, in stream_chat
[rank0]: generator = await self._generate(messages, system, tools, images, videos, **input_kwargs)
[rank0]: File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 143, in _generate
[rank0]: sampling_params = SamplingParams(
[rank0]: TypeError: Unexpected keyword argument 'use_beam_search'
[rank0]:[W1108 18:04:44.762380968 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
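The failure can be isolated from LLaMA-Factory entirely: on the vLLM build reported here (0.6.3), constructing `SamplingParams` with `use_beam_search` raises exactly this TypeError. A standalone sketch, assuming only that vLLM is installed:

```python
# Standalone check: SamplingParams on vLLM 0.6.3 rejects `use_beam_search`,
# reproducing the TypeError shown in the traceback above.
from vllm import SamplingParams

try:
    SamplingParams(temperature=0.95, use_beam_search=False)
except TypeError as err:
    print(err)  # e.g. Unexpected keyword argument 'use_beam_search'

# Constructing it without the removed keyword works as expected.
print(SamplingParams(temperature=0.95, top_p=0.7, max_tokens=128))
```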
Expected behavior
Inference should complete normally.
Others
No response