Bug when loading an engine using LoRA through LLM API #2782

Open · 2 of 4 tasks
pei0033 opened this issue Feb 13, 2025 · 2 comments

Assignees: nv-guomingz
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

Comments

pei0033 (Contributor) commented Feb 13, 2025

System Info

  • docker: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
  • tensorrtllm: v0.16.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Build a TRT-LLM engine with the lora_plugin enabled:
trtllm-build --checkpoint_dir ckpt_hf-Llama-2-7b-chat-hf --output_dir engine_hf-Llama-2-7b-chat-hf --workers 1 --max_batch_size 1 --max_input_len 512 --max_seq_len 8192 --max_num_tokens 8192 --gather_context_logits --gather_generation_logits --gpus_per_node 8 --bert_attention_plugin auto --gpt_attention_plugin auto --gemm_plugin auto --moe_plugin auto --mamba_conv1d_plugin auto --context_fmha enable --bert_context_fmha_fp32_acc disable --kv_cache_type paged --remove_input_padding enable --reduce_fusion disable --tokens_per_block 64 --use_paged_context_fmha enable --multiple_profiles disable --paged_state enable --lora_plugin float16 --lora_target_modules attn_q attn_k attn_v attn_dense mlp_gate mlp_h_to_4h mlp_4h_to_h --max_lora_rank 64
  2. Load the engine with the LLM API:
from tensorrt_llm.llmapi import LLM
model = LLM("engine_hf-Llama-2-7b-chat-hf", tokenizer="tokenizer_path")

Expected behavior

The TRT-LLM engine loads successfully through the LLM API.

Actual behavior

I got an error like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 164, in __init__
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 159, in __init__
    self._build_model()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 409, in _build_model
    engine_config = EngineConfig.from_json_file(self._engine_dir /
                                                ^^^^^^^^^^^^^^^^^^
TypeError: unsupported operand type(s) for /: 'str' and 'str'
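
The failure itself is plain Python semantics: the / join operator is defined for pathlib.Path objects but not between two str objects, which is easy to confirm in isolation:

from pathlib import Path

# str / str is undefined, which is exactly the reported TypeError:
#   "engine_dir" / "config.json"
# Path / str works and yields the path that from_json_file expects:
print(Path("engine_dir") / "config.json")  # engine_dir/config.json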

Additional notes

https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/llmapi/llm.py#L415

I think self._engine_dir can be a str, so

engine_config = EngineConfig.from_json_file(self._engine_dir /
                                            "config.json")

should be

engine_config = EngineConfig.from_json_file(Path(self._engine_dir) /
                                            "config.json")
@pei0033 pei0033 added the bug (Something isn't working) label Feb 13, 2025
@pei0033 pei0033 changed the title from “Bug when loading an engine using LoRA through LLM API"” to “Bug when loading an engine using LoRA through LLM API” Feb 13, 2025
@nv-guomingz nv-guomingz self-assigned this Feb 13, 2025
nv-guomingz (Collaborator) commented Feb 13, 2025
Hi @pei0033, thanks for reporting this issue.
We'll fix it ASAP.

@github-actions github-actions bot added the triaged (Issue has been triaged by maintainers) and Investigating labels Feb 13, 2025
pei0033 (Contributor, Author) commented Feb 17, 2025

Thank you.
