
vLLM version 0.6.3 raises TypeError: Unexpected keyword argument 'use_beam_search' #5966

Closed
1 task done
sunbeibei-hub opened this issue Nov 8, 2024 · 3 comments · Fixed by #5970
Labels
solved This problem has been already solved

Comments

@sunbeibei-hub

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.4.0+cu121 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A800-SXM4-80GB
  • vLLM version: dev

Reproduction

CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml

Contents of the YAML file:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8

The error output is as follows:
Welcome to the CLI application, use clear to remove the history, use exit to exit the application.

User: 你好
Assistant: [rank0]: Traceback (most recent call last):
[rank0]:   File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 124, in <module>
[rank0]:     main()
[rank0]:   File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 81, in main
[rank0]:     run_chat()
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 185, in run_chat
[rank0]:     for new_text in chat_model.stream_chat(messages):
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 110, in stream_chat
[rank0]:     yield task.result()
[rank0]:   File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]:     return self.__get_result()
[rank0]:   File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]:     raise self._exception
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 126, in astream_chat
[rank0]:     async for new_token in self.engine.stream_chat(messages, system, tools, images, videos, **input_kwargs):
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 222, in stream_chat
[rank0]:     generator = await self._generate(messages, system, tools, images, videos, **input_kwargs)
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 143, in _generate
[rank0]:     sampling_params = SamplingParams(
[rank0]: TypeError: Unexpected keyword argument 'use_beam_search'
[rank0]:[W1108 18:04:44.762380968 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
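
For reference, a minimal sketch of one way to tolerate both vLLM generations, assuming the only incompatibility is that newer vLLM builds (0.6.3 / dev) no longer accept use_beam_search in SamplingParams, as the traceback shows. The sampling values below are illustrative and this is not the actual change made in #5970:

```python
# Hedged sketch: pass use_beam_search only when the installed vLLM still accepts it.
from vllm import SamplingParams

common_kwargs = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 512}  # illustrative values

try:
    # Older vLLM releases still accept the keyword.
    sampling_params = SamplingParams(use_beam_search=False, **common_kwargs)
except TypeError:
    # Newer vLLM raises: TypeError: Unexpected keyword argument 'use_beam_search'
    sampling_params = SamplingParams(**common_kwargs)
```

The try/except keys off exactly the TypeError reported above, so no version-string parsing is needed.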

Expected behavior

Inference should complete normally.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Nov 8, 2024
hiyouga added a commit that referenced this issue Nov 8, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Nov 8, 2024
@hiyouga (Owner) commented Nov 8, 2024

fixed

@sunbeibei-hub (Author)

Awesome, thank you!

@sunbeibei-hub (Author)

[INFO|tokenization_utils_base.py:2470] 2024-11-11 10:50:51,372 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2024-11-11 10:50:51] llamafactory.data.template:157 >> Replace eos token: <|im_end|>
Model arguments:
```
ModelArguments(vllm_maxlen=4096, vllm_gpu_util=0.8, vllm_enforce_eager=True, vllm_max_lora_rank=32, vllm_config=None, export_dir=None, export_size=1, export_device='cpu', export_quantization_bit=None, export_quantization_dataset=None, export_quantization_nsamples=128, export_quantization_maxlen=1024, export_legacy_format=False, export_hub_model_id=None, image_resolution=512, video_resolution=128, video_fps=2.0, video_maxlen=64, quantization_method='bitsandbytes', quantization_bit=None, quantization_type='nf4', double_quantization=True, quantization_device_map=None, model_name_or_path='/root/bei/Models/qwen/Qwen2-0___5B-Instruct/', adapter_name_or_path=None, adapter_folder=None, cache_dir=None, use_fast_tokenizer=True, resize_vocab=False, split_special_tokens=False, new_special_tokens=None, model_revision='main', low_cpu_mem_usage=True, rope_scaling=None, flash_attn='auto', shift_attn=False, mixture_of_depths=None, use_unsloth=False, use_unsloth_gc=False, enable_liger_kernel=False, moe_aux_loss_coef=None, disable_gradient_checkpointing=False, upcast_layernorm=False, upcast_lmhead_output=False, train_from_scratch=False, infer_backend='vllm', offload_folder='offload', use_cache=True, infer_dtype='auto', hf_hub_token=None, ms_hub_token=None, om_hub_token=None, print_param_status=False, compute_dtype=None, device_map='auto', model_max_length=None, block_diag_attn=False)
```

Error:
Traceback (most recent call last):
  File "/root/miniconda/envs/bei_llamaFactory/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data/bei/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 158, in run_chat
    chat_model = ChatModel()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 55, in __init__
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 88, in __init__
    engine_args.update(model_args.vllm_config)
TypeError: 'NoneType' object is not iterable

Explanation:
model_args.vllm_config is None here (no vllm_config is set in the YAML), so engine_args.update(model_args.vllm_config) fails with the TypeError above.
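
A standalone sketch of this failure mode and a possible guard, assuming engine_args is a plain dict of engine keyword arguments; the variable names are illustrative and this is not necessarily how the repository resolves it:

```python
# Hypothetical stand-ins for the values seen in the ModelArguments dump above.
engine_args = {"model": "/root/bei/Models/qwen/Qwen2-0___5B-Instruct/"}
vllm_config = None  # ModelArguments reports vllm_config=None when no override is given

# engine_args.update(vllm_config) would raise:
#   TypeError: 'NoneType' object is not iterable
if isinstance(vllm_config, dict):
    # Merge user overrides only when a vllm_config dict is actually provided.
    engine_args.update(vllm_config)
```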
