
vLLM version 0.6.3 raises TypeError: Unexpected keyword argument 'use_beam_search' #5966

Closed
1 task done
sunbeibei-hub opened this issue Nov 8, 2024 · 3 comments · Fixed by #5970
Labels
solved This problem has been already solved

Comments

@sunbeibei-hub

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.4.0+cu121 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A800-SXM4-80GB
  • vLLM version: dev

Reproduction

CUDA_VISIBLE_DEVICES=0,1 python cli.py chat ../examples/inference/qwen2-0.5.yaml

Contents of the YAML file:
model_name_or_path: /root/bei/Models/qwen/Qwen2-0___5B-Instruct/
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
vllm_gpu_util: 0.8

The error output is as follows:
Welcome to the CLI application, use clear to remove the history, use exit to exit the application.

User: 你好
Assistant: [rank0]: Traceback (most recent call last):
[rank0]:   File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 124, in <module>
[rank0]:     main()
[rank0]:   File "/data/bei/LLaMA-Factory/src/cli_bei.py", line 81, in main
[rank0]:     run_chat()
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 185, in run_chat
[rank0]:     for new_text in chat_model.stream_chat(messages):
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 110, in stream_chat
[rank0]:     yield task.result()
[rank0]:   File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 458, in result
[rank0]:     return self.__get_result()
[rank0]:   File "/root/miniconda/envs/bei_llamaFactory/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[rank0]:     raise self._exception
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 126, in astream_chat
[rank0]:     async for new_token in self.engine.stream_chat(messages, system, tools, images, videos, **input_kwargs):
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 222, in stream_chat
[rank0]:     generator = await self._generate(messages, system, tools, images, videos, **input_kwargs)
[rank0]:   File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 143, in _generate
[rank0]:     sampling_params = SamplingParams(
[rank0]: TypeError: Unexpected keyword argument 'use_beam_search'
[rank0]:[W1108 18:04:44.762380968 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda/envs/bei_llamaFactory/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
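
For reference, a minimal sketch of one way to tolerate both vLLM generations, assuming the only incompatibility is that newer vLLM builds (0.6.3 / dev) no longer accept use_beam_search in SamplingParams, as the traceback shows. The sampling values below are illustrative and this is not the actual change made in #5970:

```python
# Hedged sketch: pass use_beam_search only when the installed vLLM still accepts it.
from vllm import SamplingParams

common_kwargs = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 512}  # illustrative values

try:
    # Older vLLM releases still accept the keyword.
    sampling_params = SamplingParams(use_beam_search=False, **common_kwargs)
except TypeError:
    # Newer vLLM raises: TypeError: Unexpected keyword argument 'use_beam_search'
    sampling_params = SamplingParams(**common_kwargs)
```

The try/except keys off exactly the TypeError reported above, so no version-string parsing is needed.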

Expected behavior

Inference should complete normally.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Nov 8, 2024
hiyouga added a commit that referenced this issue Nov 8, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Nov 8, 2024
@hiyouga (Owner) commented Nov 8, 2024

fixed

@sunbeibei-hub (Author)

Awesome, thank you!

@sunbeibei-hub (Author)

[INFO|tokenization_utils_base.py:2470] 2024-11-11 10:50:51,372 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2024-11-11 10:50:51] llamafactory.data.template:157 >> Replace eos token: <|im_end|>
Model arguments:
```
ModelArguments(vllm_maxlen=4096, vllm_gpu_util=0.8, vllm_enforce_eager=True, vllm_max_lora_rank=32, vllm_config=None, export_dir=None, export_size=1, export_device='cpu', export_quantization_bit=None, export_quantization_dataset=None, export_quantization_nsamples=128, export_quantization_maxlen=1024, export_legacy_format=False, export_hub_model_id=None, image_resolution=512, video_resolution=128, video_fps=2.0, video_maxlen=64, quantization_method='bitsandbytes', quantization_bit=None, quantization_type='nf4', double_quantization=True, quantization_device_map=None, model_name_or_path='/root/bei/Models/qwen/Qwen2-0___5B-Instruct/', adapter_name_or_path=None, adapter_folder=None, cache_dir=None, use_fast_tokenizer=True, resize_vocab=False, split_special_tokens=False, new_special_tokens=None, model_revision='main', low_cpu_mem_usage=True, rope_scaling=None, flash_attn='auto', shift_attn=False, mixture_of_depths=None, use_unsloth=False, use_unsloth_gc=False, enable_liger_kernel=False, moe_aux_loss_coef=None, disable_gradient_checkpointing=False, upcast_layernorm=False, upcast_lmhead_output=False, train_from_scratch=False, infer_backend='vllm', offload_folder='offload', use_cache=True, infer_dtype='auto', hf_hub_token=None, ms_hub_token=None, om_hub_token=None, print_param_status=False, compute_dtype=None, device_map='auto', model_max_length=None, block_diag_attn=False)
```

Error:
Traceback (most recent call last):
  File "/root/miniconda/envs/bei_llamaFactory/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data/bei/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 158, in run_chat
    chat_model = ChatModel()
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 55, in __init__
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
  File "/data/bei/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 88, in __init__
    engine_args.update(model_args.vllm_config)
TypeError: 'NoneType' object is not iterable

Explanation:
model_args.vllm_config is None here (no vllm_config is set in the YAML), so engine_args.update(model_args.vllm_config) fails with the TypeError above.
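
A standalone sketch of this failure mode and a possible guard, assuming engine_args is a plain dict of engine keyword arguments; the variable names are illustrative and this is not necessarily how the repository resolves it:

```python
# Hypothetical stand-ins for the values seen in the ModelArguments dump above.
engine_args = {"model": "/root/bei/Models/qwen/Qwen2-0___5B-Instruct/"}
vllm_config = None  # ModelArguments reports vllm_config=None when no override is given

# engine_args.update(vllm_config) would raise:
#   TypeError: 'NoneType' object is not iterable
if isinstance(vllm_config, dict):
    # Merge user overrides only when a vllm_config dict is actually provided.
    engine_args.update(vllm_config)
```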
