
Qwen2-VL-2B-Instruct errors out when quantizing with AWQ #2935

Open
songyang23 opened this issue Jan 17, 2025 · 1 comment

@songyang23
Describe the bug
Using swift to quantize a Qwen2-VL-2B-Instruct model previously fine-tuned with LoRA: quantization with gptq runs normally, but with awq it errors out.

| + swift export --model_type qwen2_vl --model /data/public/models/Qwen2-VL-7B-Instruct --dataset /data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl --quant_method awq --quant_n_samples 256 --quant_batch_size 1 --max_length 128 --quant_bits 4 --output_dir /data/public/models/Qwen2-VL-7B-Instruct_quantize_awq_int4
| run sh: /data/public/yasong/python3.10/bin/python3.10 /data/public/yasong/python3.10/lib/python3.10/site-packages/swift/cli/export.py --model_type qwen2_vl --model /data/public/models/Qwen2-VL-7B-Instruct --dataset /data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl --quant_method awq --quant_n_samples 256 --quant_batch_size 1 --max_length 128 --quant_bits 4 --output_dir /data/public/models/Qwen2-VL-7B-Instruct_quantize_awq_int4
| [INFO:swift] Successfully registered /data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/dataset/data/dataset_info.json
| [INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
| [INFO:swift] Loading the model using model_dir: /data/public/models/Qwen2-VL-7B-Instruct
| [WARNING:swift] Please install the package: pip install "pyav" "decord" -U.
| [INFO:swift] args: ExportArguments(model='/data/public/models/Qwen2-VL-7B-Instruct', model_type='qwen2_vl', model_revision=None, task_type='causal_lm', torch_dtype=torch.float16, attn_impl=None, num_labels=None, rope_scaling=None, device_map=None, local_repo_path=None, template='qwen2_vl', system=None, max_length=128, truncation_strategy='delete', max_pixels=None, tools_prompt='react_en', padding_side='right', loss_scale='default', sequence_parallel_size=1, use_chat_template=True, template_backend='swift', dataset=['/data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl'], val_dataset=[], split_dataset_ratio=0.01, data_seed=42, dataset_num_proc=1, streaming=False, enable_cache=False, download_mode='reuse_dataset_if_exists', strict=False, model_name=[None, None], model_author=[None, None], custom_dataset_info=[], quant_method='awq', quant_bits=4, hqq_axis=None, bnb_4bit_compute_dtype=torch.float32, bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=None, temperature=None, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stream=False, stop_words=[], logprobs=False, ckpt_dir=None, load_dataset_config=None, lora_modules=[], tuner_backend='peft', train_type='lora', adapters=[], seed=42, model_kwargs={}, load_args=True, load_data_args=False, use_hf=False, hub_token=None, custom_register_path=[], ignore_args_error=False, use_swift_lora=False, merge_lora=False, safe_serialization=True, max_shard_size='5GB', output_dir='/data/public/models/Qwen2-VL-7B-Instruct_quantize_awq_int4', quant_n_samples=256, quant_batch_size=1, group_size=128, to_ollama=False, gguf_file=None, push_to_hub=False, hub_model_id=None, hub_private_repo=False, commit_message='update files', to_peft_format=False)
| [INFO:swift] Start time of running main: 2025-01-17 04:09:05.652498
| [INFO:swift] Global seed set to 42
| [INFO:swift] Loading the model using model_dir: /data/public/models/Qwen2-VL-7B-Instruct
| [WARNING:swift] Please install the package: pip install "pyav" "decord" -U.
| [INFO:swift] model_kwargs: {'device_map': 'cuda:0'}
| Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
| Loading checkpoint shards: 100%|██████████| 5/5 [00:07<00:00, 1.58s/it]
| [INFO:swift] Using environment variable IMAGE_FACTOR, Setting image_factor: 8.
| [INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: MIN_PIXELS.
| [INFO:swift] Using environment variable MAX_PIXELS, Setting max_pixels: 602112.
| [INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: MAX_RATIO.
| [INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: VIDEO_MIN_PIXELS.
| [INFO:swift] Setting video_max_pixels: 602112. You can adjust this hyperparameter through the environment variable: VIDEO_MAX_PIXELS.
| [INFO:swift] Setting video_total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: VIDEO_TOTAL_PIXELS.
| [INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: FRAME_FACTOR.
| [INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: FPS.
| [INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: FPS_MIN_FRAMES.
| [INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: FPS_MAX_FRAMES.
| [INFO:swift] default_system: You are a helpful assistant.
| [INFO:swift] Quantization dataset: ['/data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl']
| [INFO:swift] Start quantizing the model...
| Generating train split: 7341 examples [00:00, 485048.37 examples/s]
| [INFO:swift] create tmp_dir: /home/jovyan/.cache/modelscope/hub/tmp/hf_datasets-28477cdj
| Map: 100%|██████████| 7341/7341 [00:00<00:00, 25900.61 examples/s]
| [INFO:swift] quant_dataset: Dataset({
| features: ['messages', 'images'],
| num_rows: 7341
| })
| 100%|█████████▉| 255/256 [00:31<00:00, 8.51it/s][INFO:swift] Split into 2861 blocks
| 100%|██████████| 256/256 [00:31<00:00, 8.15it/s]
| AWQ: 0%| | 0/28 [00:00<?, ?it/s]
| Traceback (most recent call last):
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/cli/export.py", line 5, in
| export_main()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/export.py", line 40, in export_main
| return SwiftExport(args).main()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/base.py", line 46, in main
| result = self.run()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/export.py", line 25, in run
| quantize_model(args)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/quant.py", line 227, in quantize_model
| QuantEngine(args).quantize()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/quant.py", line 34, in quantize
| self.awq_model_quantize()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/quant.py", line 151, in awq_model_quantize
| self.model.quantize(
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
| return func(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/models/base.py", line 238, in quantize
| self.quantizer.quantize()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 159, in quantize
| input_feat = self._get_input_feat(self.modules[i], named_linears)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 633, in _get_input_feat
| self.inps = self._module_forward(self.inps, layer, module_kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
| return func(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 256, in _module_forward
| partial_output = module(x_partial, **module_kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 859, in forward
| hidden_states, self_attn_weights, present_key_value = self.self_attn(
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 752, in forward
| cos, sin = position_embeddings
| TypeError: cannot unpack non-iterable NoneType object
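For context (an inference from the traceback, not a confirmed root cause): autoawq calibrates the model one decoder layer at a time, and the keyword arguments it captures for each layer apparently do not include the position_embeddings that the Qwen2-VL attention forward in transformers 4.48 expects, so the layer receives None and the tuple unpacking fails. The failing line reduces to:

    position_embeddings = None  # kwarg seemingly absent from AWQ's captured module_kwargs
    cos, sin = position_embeddings  # TypeError: cannot unpack non-iterable NoneType object

The Qwen2VLRotaryEmbedding deprecation notice earlier in the log points at the same area: recent transformers releases compute rotary embeddings once at the model level and pass them down to every decoder layer, a path that AWQ's isolated per-layer forward presumably does not reproduce.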

Your hardware and system info
[Screenshot of hardware and system information (CUDA version, OS, GPU, torch version) attached to the original issue.]

Additional context
pip list
absl-py 2.1.0
accelerate 1.1.1
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-types 0.7.0
anyio 4.6.2.post1
async-timeout 4.0.3
attrdict 2.0.1
attrs 23.2.0
auto_gptq 0.7.1
autoawq 0.2.7.post2
autoawq_kernels 0.0.9
av 13.1.0
binpacking 1.5.2
boto3 1.35.98
botocore 1.35.98
certifi 2024.6.2
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.1.0
cmake 3.31.0.1
colorama 0.4.6
coloredlogs 15.0.1
compressed-tensors 0.6.0
contourpy 1.3.1
cpm-kernels 1.0.11
crcmod 1.7
cryptography 43.0.3
cycler 0.12.1
dacite 1.8.1
datasets 3.2.0
deepseek_vl2 1.0.0 /data/public/yasong/DeepSeek-VL2-main
deepspeed 0.14.4
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
dnspython 2.7.0
docstring_parser 0.16
einops 0.8.0
eventlet 0.38.2
exceptiongroup 1.2.2
fastapi 0.115.5
ffmpy 0.4.0
filelock 3.15.4
flash-attn 2.6.3
fonttools 4.54.1
frozenlist 1.4.1
fsspec 2024.5.0
future 1.0.0
gekko 1.2.1
gguf 0.10.0
gradio 5.5.0
gradio_client 1.4.2
greenlet 3.1.1
grpcio 1.64.1
h11 0.14.0
hjson 3.1.0
httpcore 1.0.6
httptools 0.6.4
httpx 0.27.2
huggingface-hub 0.27.1
humanfriendly 10.0
idna 3.7
importlib_metadata 8.5.0
interegular 0.3.3
jieba 0.42.1
Jinja2 3.1.4
jiter 0.7.1
jmespath 0.10.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.7
lark 1.2.2
llvmlite 0.43.0
lm-format-enforcer 0.10.6
lxml 5.3.0
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdurl 0.1.2
metrics 0.3.3
mistral_common 1.4.4
modelscope 1.22.0
mpmath 1.3.0
ms-swift 3.0.2.post1
msgpack 1.1.0
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.9.1
numba 0.60.0
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.555.43
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
openai 1.54.4
OpenCC 1.1.9
opencv-python-headless 4.10.0.84
optimum 1.23.3
orjson 3.10.11
oss2 2.19.1
outlines 0.0.46
packaging 24.1
pandas 2.2.2
partial-json-parser 0.2.1.1.post4
PasteDeploy 3.1.0
pathlib2 2.3.7.post1
pathspec 0.5.5
peft 0.12.0
pillow 10.4.0
pip 23.0.1
portalocker 2.10.1
prometheus_client 0.21.0
prometheus-fastapi-instrumentator 7.0.0
protobuf 4.25.3
psutil 6.0.0
py-cpuinfo 9.0.0
pyairports 2.1.1
pyarrow 16.1.0
pyarrow-hotfix 0.6
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.9.2
pydantic_core 2.23.4
pydub 0.25.1
pyeclib 1.6.4
Pygments 2.18.0
pyparsing 3.2.0
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.12
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.2.0
qwen-vl-utils 0.0.8
ray 2.39.0
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
rich 13.9.4
rouge 1.0.1
rpds-py 0.21.0
ruff 0.7.3
s3transfer 0.10.4
sacrebleu 2.4.3
safehttpx 0.1.1
safetensors 0.4.3
scikit-learn 1.5.1
scipy 1.14.0
semantic-version 2.10.0
sentence-transformers 3.2.1
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.3
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
starlette 0.41.2
swift 2.34.0
sympy 1.13.1
tabulate 0.9.0
tensorboard 2.17.0
tensorboard-data-server 0.7.2
threadpoolctl 3.5.0
tiktoken 0.7.0
timm 1.0.13
tokenizers 0.21.0
tomlkit 0.12.0
torch 2.4.0
torchaudio 2.3.1
torchvision 0.19.0
tqdm 4.66.4
transformers 4.48.0
transformers-stream-generator 0.0.5
triton 3.0.0
trl 0.11.4
typer 0.13.0
typing_extensions 4.12.2
tyro 0.8.14
tzdata 2024.1
urllib3 2.2.2
uvicorn 0.32.0
uvloop 0.21.0
vllm 0.6.3.post1
vllm-flash-attn 2.5.9.post1
watchfiles 0.24.0
websockets 12.0
Werkzeug 3.0.3
xattr 1.1.0
xformers 0.0.27.post2
xxhash 3.4.1
yarl 1.9.4
zipp 3.21.0
zstandard 0.23.0

@Jintao-Huang
Collaborator

Please quantize VL models with gptq for now.
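A minimal sketch of that workaround, reusing the reporter's flags from the failing command above and only swapping the quantization method (the output directory name is changed here purely for illustration):

    swift export \
      --model_type qwen2_vl \
      --model /data/public/models/Qwen2-VL-7B-Instruct \
      --dataset /data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl \
      --quant_method gptq \
      --quant_n_samples 256 \
      --quant_batch_size 1 \
      --max_length 128 \
      --quant_bits 4 \
      --output_dir /data/public/models/Qwen2-VL-7B-Instruct_quantize_gptq_int4

The reporter noted above that gptq quantization already runs normally on this model, so this path should work with the same calibration dataset (auto_gptq 0.7.1 is present in the pip list).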
