llm-export

中文 (Chinese README)

llm-export is a tool for exporting LLM models; it can convert an LLM into an ONNX or MNN model.

  • 🚀 Optimized the original code to support dynamic shapes
  • 🚀 Optimized the original code to reduce the constant portion
  • 🚀 Uses OnnxSlim to slim the ONNX model, giving about a 5% speedup; contributed by @inisis
  • 🚀 Supports exporting LoRA weights to ONNX or MNN models
  • 🚀 MNN inference code: mnn-llm
  • 🚀 ONNX inference code: onnx-llm, OnnxLLM

Install

# pip install
pip install llmexport

# git install
pip install git+https://github.com/wangzhaode/llm-export@master

# local install
git clone https://github.com/wangzhaode/llm-export && cd llm-export/
pip install .
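
After installing by any of the three methods above, a quick sanity check is to print the option listing (the same one shown in the Command Args section below); this assumes the `llmexport` entry point ended up on your PATH:

# print the full list of command-line options
llmexport -h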

Usage

  1. Download the model. Clone the LLM project that you want to export locally, for example Qwen2-1.5B-Instruct:
git clone https://huggingface.co/Qwen/Qwen2-1.5B-Instruct
# If downloading from Hugging Face is slow, you can use ModelScope
git clone https://modelscope.cn/qwen/Qwen2-1.5B-Instruct.git
  2. Test the model:
# Test text
llmexport --path Qwen2-1.5B-Instruct --test "Hello"
# Test image text
llmexport --path Qwen2-VL-2B-Instruct  --test "<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>Describe the content of the picture"
  3. Export the model (a further example with more flags follows below):
# export Qwen2-1.5B-Instruct to onnx
llmexport --path Qwen2-1.5B-Instruct --export onnx
# export Qwen2-1.5B-Instruct to mnn and quantize it
llmexport --path Qwen2-1.5B-Instruct --export mnn --quant_bit 4 --quant_block 128
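
A slightly fuller export sketch, using only flags documented in the command-line help below; the output directory name ./qwen2-onnx is just an illustration, not a required path:

# export to onnx, optimize the graph with onnx-slim, and write the files to ./qwen2-onnx
llmexport --path Qwen2-1.5B-Instruct --export onnx --onnx_slim --dst_path ./qwen2-onnx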

Features

  • Supports exporting the entire model as an ONNX or MNN model; use --export onnx/mnn
  • Supports onnx-slim graph optimization of the exported ONNX model (see --onnx_slim)
  • Supports merging LoRA weights into the base model, or exporting LoRA as a split module (see the example after this list)
  • Supports AWQ and GPTQ quantization
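
As a rough illustration of the LoRA and quantization options, the commands below only combine flags that appear in the help output further down; the adapter directory ./my-lora-adapter is a hypothetical placeholder:

# merge a LoRA adapter into the base model during export (adapter path is a placeholder)
llmexport --path Qwen2-1.5B-Instruct --lora_path ./my-lora-adapter --export mnn --quant_bit 4
# keep the LoRA weights as a separate (split) module instead of merging them
llmexport --path Qwen2-1.5B-Instruct --lora_path ./my-lora-adapter --lora_split --export mnn
# apply AWQ quantization when exporting to MNN
llmexport --path Qwen2-1.5B-Instruct --export mnn --awq --quant_bit 4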

Command Args

usage: llmexport.py [-h] --path PATH [--type TYPE] [--tokenizer_path TOKENIZER_PATH] [--lora_path LORA_PATH] [--gptq_path GPTQ_PATH] [--dst_path DST_PATH]
                    [--verbose] [--test TEST] [--export EXPORT] [--onnx_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK] [--lm_quant_bit LM_QUANT_BIT]
                    [--mnnconvert MNNCONVERT] [--ppl] [--awq] [--sym] [--tie_embed] [--lora_split]

llm_exporter

options:
  -h, --help            show this help message and exit
  --path PATH           path(`str` or `os.PathLike`):
                        Can be either:
                        	- A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                        	- A path to a *directory* cloned from a repo, like `../chatglm-6b`.
  --type TYPE           type(`str`, *optional*):
                        	The pretrained LLM model type.
  --tokenizer_path TOKENIZER_PATH
                        tokenizer path, default is `None`, meaning the `--path` value is used.
  --lora_path LORA_PATH
                        lora path, default is `None`, meaning lora is not applied.
  --gptq_path GPTQ_PATH
                        gptq path, default is `None`, meaning gptq is not applied.
  --dst_path DST_PATH   export the onnx/mnn model to this path, default is `./model`.
  --verbose             Whether or not to print verbose output.
  --test TEST           test model inference with query `TEST`.
  --export EXPORT       export model to an onnx/mnn model.
  --onnx_slim           Whether or not to use onnx-slim.
  --quant_bit QUANT_BIT
                        mnn quant bit, 4 or 8, default is 4.
  --quant_block QUANT_BLOCK
                        mnn quant block, default is 0, meaning channel-wise.
  --lm_quant_bit LM_QUANT_BIT
                        mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
  --mnnconvert MNNCONVERT
                        local mnnconvert path; if invalid, pymnn is used.
  --ppl                 Whether or not to get all logits of input tokens.
  --awq                 Whether or not to use awq quant.
  --sym                 Whether or not to use symmetric quant (without zeropoint), default is False.
  --tie_embed           Whether or not to use tie_embedding, default is False.
  --lora_split          Whether or not to export lora as a split module, default is False.
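
To tie several of these options together, here is a sketch that only combines flags from the listing above; the exact values and the output directory are illustrative, not recommendations:

# 8-bit symmetric MNN quantization with 64-element blocks, lm_head also quantized to 8 bits
llmexport --path Qwen2-1.5B-Instruct --export mnn --quant_bit 8 --quant_block 64 --lm_quant_bit 8 --sym --dst_path ./qwen2-mnn-int8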

Model Download

Model | ModelScope | Hugging Face
--- | --- | ---
Qwen-VL-Chat | Q4_1 | Q4_1
Baichuan2-7B-Chat | Q4_1 | Q4_1
bge-large-zh | Q4_1 | Q4_1
chatglm-6b | Q4_1 | Q4_1
chatglm2-6b | Q4_1 | Q4_1
chatglm3-6b | Q4_1 | Q4_1
codegeex2-6b | Q4_1 | Q4_1
deepseek-llm-7b-chat | Q4_1 | Q4_1
gemma-2-2b-it | Q4_1 | Q4_1
glm-4-9b-chat | Q4_1 | Q4_1
gte_sentence-embedding_multilingual-base | Q4_1 | Q4_1
internlm-chat-7b | Q4_1 | Q4_1
Llama-2-7b-chat | Q4_1 | Q4_1
Llama-3-8B-Instruct | Q4_1 | Q4_1
Llama-3.2-1B-Instruct | Q4_1 | Q4_1
Llama-3.2-3B-Instruct | Q4_1 | Q4_1
OpenELM-1_1B-Instruct | Q4_1 | Q4_1
OpenELM-270M-Instruct | Q4_1 | Q4_1
OpenELM-3B-Instruct | Q8_1 | Q8_1
OpenELM-450M-Instruct | Q4_1 | Q4_1
phi-2 | Q4_1 | Q4_1
qwen/Qwen-1_8B-Chat | Q4_1 | Q4_1
Qwen-7B-Chat | Q4_1 | Q4_1
Qwen1.5-0.5B-Chat | Q4_1 | Q4_1
Qwen1.5-1.8B-Chat | Q4_1 | Q4_1
Qwen1.5-4B-Chat | Q4_1 | Q4_1
Qwen1.5-7B-Chat | Q4_1 | Q4_1
Qwen2-0.5B-Instruct | Q4_1 | Q4_1
Qwen2-1.5B-Instruct | Q4_1 | Q4_1
Qwen2-7B-Instruct | Q4_1 | Q4_1
Qwen2-Audio-7B-Instruct | Q4_1 | Q4_1
Qwen2-VL-2B-Instruct | Q4_1 | Q4_1
Qwen2-VL-7B-Instruct | Q4_1 | Q4_1
Qwen2.5-0.5B-Instruct | Q4_1 | Q4_1
Qwen2.5-1.5B-Instruct | Q4_1 | Q4_1
Qwen2.5-3B-Instruct | Q4_1 | Q4_1
Qwen2.5-7B-Instruct | Q4_1 | Q4_1
Qwen2.5-Coder-1.5B-Instruct | Q4_1 | Q4_1
Qwen2.5-Coder-7B-Instruct | Q4_1 | Q4_1
Qwen2.5-Math-1.5B-Instruct | Q4_1 | Q4_1
Qwen2.5-Math-7B-Instruct | Q4_1 | Q4_1
QwQ-32B-Preview | Q4_1 | Q4_1
reader-lm-0.5b | Q4_1 | Q4_1
reader-lm-1.5b | Q4_1 | Q4_1
TinyLlama-1.1B-Chat-v1.0 | Q4_1 | Q4_1
Yi-6B-Chat | Q4_1 | Q4_1
MobileLLM-125M | Q4_1 | Q4_1
MobileLLM-350M | Q4_1 | Q4_1
MobileLLM-600M | Q4_1 | Q4_1
MobileLLM-1B | Q4_1 | Q4_1
SmolLM2-135M-Instruct | Q4_1 | Q4_1
SmolLM2-360M-Instruct | Q4_1 | Q4_1
SmolLM2-1.7B-Instruct | Q4_1 | Q4_1