llm-export is a tool for exporting LLMs; it converts LLM models to ONNX or MNN models.
- 🚀 Optimized the original code to support dynamic shapes
- 🚀 Optimized the original code to reduce the constant portion
- 🚀 Used OnnxSlim to slim the ONNX model, giving about a 5% speedup; contributed by @inisis
- 🚀 Support exporting LoRA weights to ONNX or MNN models
- 🚀 MNN inference code: mnn-llm
- 🚀 ONNX inference code: onnx-llm, OnnxLLM
```sh
# pip install
pip install llmexport

# git install
pip install git+https://github.com/wangzhaode/llm-export@master

# local install
git clone https://github.com/wangzhaode/llm-export && cd llm-export/
pip install .
```
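After installing by any of these methods, the `llmexport` command should be on your `PATH`; printing the help text (shown in full at the end of this section) is a quick sanity check:

```sh
# verify the installation by printing the CLI help
llmexport -h
```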
- Download the model: clone the LLM project that you want to export locally, e.g. Qwen2-1.5B-Instruct:
```sh
git clone https://huggingface.co/Qwen/Qwen2-1.5B-Instruct
# If downloading from Hugging Face is slow, you can use ModelScope
git clone https://modelscope.cn/qwen/Qwen2-1.5B-Instruct.git
```
- Test the model:
```sh
# Test text
llmexport --path Qwen2-1.5B-Instruct --test "Hello"
# Test image-text
llmexport --path Qwen2-VL-2B-Instruct --test "<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>Describe the content of the picture"
```
- Export the model:
```sh
# export Qwen2-1.5B-Instruct to onnx
llmexport --path Qwen2-1.5B-Instruct --export onnx
# export Qwen2-1.5B-Instruct to mnn and quantize it
llmexport --path Qwen2-1.5B-Instruct --export mnn --quant_bit 4 --quant_block 128
```
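The quantization flags can be combined; below is a sketch (the bit widths here are illustrative choices, not recommendations) that quantizes the model body to 4 bits while keeping `lm_head` at 8 bits with symmetric quantization:

```sh
# 4-bit block-wise quant for the body, 8-bit quant for lm_head, no zero point
llmexport --path Qwen2-1.5B-Instruct --export mnn \
    --quant_bit 4 --quant_block 128 --lm_quant_bit 8 --sym
```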
- Supports exporting the whole model as an ONNX or MNN model; use `--export onnx` or `--export mnn`.
- Supports optimizing the exported ONNX model with onnx-slim; enable it with `--onnx_slim`.
- Supports merging LoRA weights into the model or exporting them separately, via `--lora_path` and `--lora_split` (see the sketch below).
- Supports `awq` and `gptq` quantization, via `--awq` and `--gptq_path` (see the sketch below).
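For LoRA, a minimal sketch (the `./qwen2-lora` adapter directory is a hypothetical example): passing only `--lora_path` merges the adapter into the exported weights, while adding `--lora_split` exports the LoRA weights separately:

```sh
# merge the LoRA adapter into the exported model
llmexport --path Qwen2-1.5B-Instruct --lora_path ./qwen2-lora --export mnn
# export the LoRA weights separately from the base model
llmexport --path Qwen2-1.5B-Instruct --lora_path ./qwen2-lora --export mnn --lora_split
```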
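For awq and gptq, another sketch (the `./qwen2-gptq` directory is a hypothetical example of a pre-quantized gptq checkpoint):

```sh
# quantize with awq during export
llmexport --path Qwen2-1.5B-Instruct --export mnn --awq
# apply existing gptq weights during export
llmexport --path Qwen2-1.5B-Instruct --gptq_path ./qwen2-gptq --export mnn
```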
```
usage: llmexport.py [-h] --path PATH [--type TYPE] [--tokenizer_path TOKENIZER_PATH] [--lora_path LORA_PATH] [--gptq_path GPTQ_PATH] [--dst_path DST_PATH]
                    [--verbose] [--test TEST] [--export EXPORT] [--onnx_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK] [--lm_quant_bit LM_QUANT_BIT]
                    [--mnnconvert MNNCONVERT] [--ppl] [--awq] [--sym] [--tie_embed] [--lora_split]

llm_exporter

options:
  -h, --help            show this help message and exit
  --path PATH           path(`str` or `os.PathLike`):
                        Can be either:
                        - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                        - A path to a *directory* cloned from a repo like `../chatglm-6b`.
  --type TYPE           type(`str`, *optional*):
                        The pretrained LLM model type.
  --tokenizer_path TOKENIZER_PATH
                        tokenizer path, default is `None`, meaning use the `--path` value.
  --lora_path LORA_PATH
                        lora path, default is `None`, meaning do not apply lora.
  --gptq_path GPTQ_PATH
                        gptq path, default is `None`, meaning do not apply gptq.
  --dst_path DST_PATH   export onnx/mnn model to this path, default is `./model`.
  --verbose             Whether or not to print verbose output.
  --test TEST           test model inference with query `TEST`.
  --export EXPORT       export model to an onnx/mnn model.
  --onnx_slim           Whether or not to use onnx-slim.
  --quant_bit QUANT_BIT
                        mnn quant bit, 4 or 8, default is 4.
  --quant_block QUANT_BLOCK
                        mnn quant block, default is 0, meaning channel-wise.
  --lm_quant_bit LM_QUANT_BIT
                        mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
  --mnnconvert MNNCONVERT
                        local mnnconvert path; if invalid, use pymnn.
  --ppl                 Whether or not to get all logits of input tokens.
  --awq                 Whether or not to use awq quantization.
  --sym                 Whether or not to use symmetric quantization (without zeropoint), default is False.
  --tie_embed           Whether or not to use tie_embedding, default is False.
  --lora_split          Whether or not to export lora split, default is False.
```