Bug when loading an engine using LoRA through LLM API #2782

Open · 2 of 4 tasks
pei0033 opened this issue Feb 13, 2025 · 2 comments

Assignees: nv-guomingz
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

Comments

pei0033 (Contributor) commented Feb 13, 2025

System Info

  • docker: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
  • tensorrtllm: v0.16.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Build a TRT-LLM engine with the lora_plugin enabled:
trtllm-build --checkpoint_dir ckpt_hf-Llama-2-7b-chat-hf --output_dir engine_hf-Llama-2-7b-chat-hf --workers 1 --max_batch_size 1 --max_input_len 512 --max_seq_len 8192 --max_num_tokens 8192 --gather_context_logits --gather_generation_logits --gpus_per_node 8 --bert_attention_plugin auto --gpt_attention_plugin auto --gemm_plugin auto --moe_plugin auto --mamba_conv1d_plugin auto --context_fmha enable --bert_context_fmha_fp32_acc disable --kv_cache_type paged --remove_input_padding enable --reduce_fusion disable --tokens_per_block 64 --use_paged_context_fmha enable --multiple_profiles disable --paged_state enable --lora_plugin float16 --lora_target_modules attn_q attn_k attn_v attn_dense mlp_gate mlp_h_to_4h mlp_4h_to_h --max_lora_rank 64
  2. Load the engine with the LLM API:
from tensorrt_llm.llmapi import LLM
model = LLM("engine_hf-Llama-2-7b-chat-hf", tokenizer="tokenizer_path")

Expected behavior

The TRT-LLM engine loads successfully through the LLM API.

Actual behavior

I got an error like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 164, in __init__
    raise e
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 159, in __init__
    self._build_model()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 409, in _build_model
    engine_config = EngineConfig.from_json_file(self._engine_dir /
                                                ^^^^^^^^^^^^^^^^^^
TypeError: unsupported operand type(s) for /: 'str' and 'str'
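
The failure itself is plain Python semantics: the / join operator is defined for pathlib.Path objects but not between two str objects, which is easy to confirm in isolation:

from pathlib import Path

# str / str is undefined, which is exactly the reported TypeError:
#   "engine_dir" / "config.json"
# Path / str works and yields the path that from_json_file expects:
print(Path("engine_dir") / "config.json")  # engine_dir/config.json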

Additional notes

https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/llmapi/llm.py#L415

I think self._engine_dir can be a str, so

engine_config = EngineConfig.from_json_file(self._engine_dir /
                                            "config.json")

should be

engine_config = EngineConfig.from_json_file(Path(self._engine_dir) /
                                            "config.json")
@pei0033 pei0033 added the bug (Something isn't working) label Feb 13, 2025
@pei0033 pei0033 changed the title from “Bug when loading an engine using LoRA through LLM API"” to “Bug when loading an engine using LoRA through LLM API” Feb 13, 2025
@nv-guomingz nv-guomingz self-assigned this Feb 13, 2025
nv-guomingz (Collaborator) commented Feb 13, 2025
Hi @pei0033, thanks for reporting this issue.
We'll fix it ASAP.

@github-actions github-actions bot added the triaged (Issue has been triaged by maintainers) and Investigating labels Feb 13, 2025
pei0033 (Contributor, Author) commented Feb 17, 2025

Thank you.
