
Qwen2-VL-2B-Instruct errors out when quantizing with AWQ #2935

Open
songyang23 opened this issue Jan 17, 2025 · 1 comment

@songyang23
Describe the bug
Using swift to quantize a Qwen2-VL-2B-Instruct model previously fine-tuned with LoRA: quantization with gptq runs normally, but with awq it errors out.

| + swift export --model_type qwen2_vl --model /data/public/models/Qwen2-VL-7B-Instruct --dataset /data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl --quant_method awq --quant_n_samples 256 --quant_batch_size 1 --max_length 128 --quant_bits 4 --output_dir /data/public/models/Qwen2-VL-7B-Instruct_quantize_awq_int4
| run sh: /data/public/yasong/python3.10/bin/python3.10 /data/public/yasong/python3.10/lib/python3.10/site-packages/swift/cli/export.py --model_type qwen2_vl --model /data/public/models/Qwen2-VL-7B-Instruct --dataset /data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl --quant_method awq --quant_n_samples 256 --quant_batch_size 1 --max_length 128 --quant_bits 4 --output_dir /data/public/models/Qwen2-VL-7B-Instruct_quantize_awq_int4
| [INFO:swift] Successfully registered /data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/dataset/data/dataset_info.json
| [INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
| [INFO:swift] Loading the model using model_dir: /data/public/models/Qwen2-VL-7B-Instruct
| [WARNING:swift] Please install the package: pip install "pyav" "decord" -U.
| [INFO:swift] args: ExportArguments(model='/data/public/models/Qwen2-VL-7B-Instruct', model_type='qwen2_vl', model_revision=None, task_type='causal_lm', torch_dtype=torch.float16, attn_impl=None, num_labels=None, rope_scaling=None, device_map=None, local_repo_path=None, template='qwen2_vl', system=None, max_length=128, truncation_strategy='delete', max_pixels=None, tools_prompt='react_en', padding_side='right', loss_scale='default', sequence_parallel_size=1, use_chat_template=True, template_backend='swift', dataset=['/data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl'], val_dataset=[], split_dataset_ratio=0.01, data_seed=42, dataset_num_proc=1, streaming=False, enable_cache=False, download_mode='reuse_dataset_if_exists', strict=False, model_name=[None, None], model_author=[None, None], custom_dataset_info=[], quant_method='awq', quant_bits=4, hqq_axis=None, bnb_4bit_compute_dtype=torch.float32, bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=None, temperature=None, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stream=False, stop_words=[], logprobs=False, ckpt_dir=None, load_dataset_config=None, lora_modules=[], tuner_backend='peft', train_type='lora', adapters=[], seed=42, model_kwargs={}, load_args=True, load_data_args=False, use_hf=False, hub_token=None, custom_register_path=[], ignore_args_error=False, use_swift_lora=False, merge_lora=False, safe_serialization=True, max_shard_size='5GB', output_dir='/data/public/models/Qwen2-VL-7B-Instruct_quantize_awq_int4', quant_n_samples=256, quant_batch_size=1, group_size=128, to_ollama=False, gguf_file=None, push_to_hub=False, hub_model_id=None, hub_private_repo=False, commit_message='update files', to_peft_format=False)
| [INFO:swift] Start time of running main: 2025-01-17 04:09:05.652498
| [INFO:swift] Global seed set to 42
| [INFO:swift] Loading the model using model_dir: /data/public/models/Qwen2-VL-7B-Instruct
| [WARNING:swift] Please install the package: pip install "pyav" "decord" -U.
| [INFO:swift] model_kwargs: {'device_map': 'cuda:0'}
| Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
| Loading checkpoint shards: 100%|██████████| 5/5 [00:07<00:00, 1.58s/it]
| [INFO:swift] Using environment variable IMAGE_FACTOR, Setting image_factor: 8.
| [INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: MIN_PIXELS.
| [INFO:swift] Using environment variable MAX_PIXELS, Setting max_pixels: 602112.
| [INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: MAX_RATIO.
| [INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: VIDEO_MIN_PIXELS.
| [INFO:swift] Setting video_max_pixels: 602112. You can adjust this hyperparameter through the environment variable: VIDEO_MAX_PIXELS.
| [INFO:swift] Setting video_total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: VIDEO_TOTAL_PIXELS.
| [INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: FRAME_FACTOR.
| [INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: FPS.
| [INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: FPS_MIN_FRAMES.
| [INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: FPS_MAX_FRAMES.
| [INFO:swift] default_system: You are a helpful assistant.
| [INFO:swift] Quantization dataset: ['/data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl']
| [INFO:swift] Start quantizing the model...
| Generating train split: 7341 examples [00:00, 485048.37 examples/s]
| [INFO:swift] create tmp_dir: /home/jovyan/.cache/modelscope/hub/tmp/hf_datasets-28477cdj
| Map: 100%|██████████| 7341/7341 [00:00<00:00, 25900.61 examples/s]
| [INFO:swift] quant_dataset: Dataset({
| features: ['messages', 'images'],
| num_rows: 7341
| })
| 100%|█████████▉| 255/256 [00:31<00:00, 8.51it/s][INFO:swift] Split into 2861 blocks
| 100%|██████████| 256/256 [00:31<00:00, 8.15it/s]
| AWQ: 0%| | 0/28 [00:00<?, ?it/s]
| Traceback (most recent call last):
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/cli/export.py", line 5, in
| export_main()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/export.py", line 40, in export_main
| return SwiftExport(args).main()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/base.py", line 46, in main
| result = self.run()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/export.py", line 25, in run
| quantize_model(args)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/quant.py", line 227, in quantize_model
| QuantEngine(args).quantize()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/quant.py", line 34, in quantize
| self.awq_model_quantize()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/swift/llm/export/quant.py", line 151, in awq_model_quantize
| self.model.quantize(
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
| return func(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/models/base.py", line 238, in quantize
| self.quantizer.quantize()
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 159, in quantize
| input_feat = self._get_input_feat(self.modules[i], named_linears)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 633, in _get_input_feat
| self.inps = self._module_forward(self.inps, layer, module_kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
| return func(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 256, in _module_forward
| partial_output = module(x_partial, **module_kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 859, in forward
| hidden_states, self_attn_weights, present_key_value = self.self_attn(
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
| return forward_call(*args, **kwargs)
| File "/data/public/yasong/python3.10/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 752, in forward
| cos, sin = position_embeddings
| TypeError: cannot unpack non-iterable NoneType object
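For context (an inference from the traceback, not a confirmed root cause): autoawq calibrates the model one decoder layer at a time, and the keyword arguments it captures for each layer apparently do not include the position_embeddings that the Qwen2-VL attention forward in transformers 4.48 expects, so the layer receives None and the tuple unpacking fails. The failing line reduces to:

    position_embeddings = None  # kwarg seemingly absent from AWQ's captured module_kwargs
    cos, sin = position_embeddings  # TypeError: cannot unpack non-iterable NoneType object

The Qwen2VLRotaryEmbedding deprecation notice earlier in the log points at the same area: recent transformers releases compute rotary embeddings once at the model level and pass them down to every decoder layer, a path that AWQ's isolated per-layer forward presumably does not reproduce.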

Your hardware and system info
[Screenshot of hardware and system information (CUDA version, OS, GPU, torch version) attached to the original issue.]

Additional context
pip list
absl-py 2.1.0
accelerate 1.1.1
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-types 0.7.0
anyio 4.6.2.post1
async-timeout 4.0.3
attrdict 2.0.1
attrs 23.2.0
auto_gptq 0.7.1
autoawq 0.2.7.post2
autoawq_kernels 0.0.9
av 13.1.0
binpacking 1.5.2
boto3 1.35.98
botocore 1.35.98
certifi 2024.6.2
cffi 1.17.1
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.1.0
cmake 3.31.0.1
colorama 0.4.6
coloredlogs 15.0.1
compressed-tensors 0.6.0
contourpy 1.3.1
cpm-kernels 1.0.11
crcmod 1.7
cryptography 43.0.3
cycler 0.12.1
dacite 1.8.1
datasets 3.2.0
deepseek_vl2 1.0.0 /data/public/yasong/DeepSeek-VL2-main
deepspeed 0.14.4
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
dnspython 2.7.0
docstring_parser 0.16
einops 0.8.0
eventlet 0.38.2
exceptiongroup 1.2.2
fastapi 0.115.5
ffmpy 0.4.0
filelock 3.15.4
flash-attn 2.6.3
fonttools 4.54.1
frozenlist 1.4.1
fsspec 2024.5.0
future 1.0.0
gekko 1.2.1
gguf 0.10.0
gradio 5.5.0
gradio_client 1.4.2
greenlet 3.1.1
grpcio 1.64.1
h11 0.14.0
hjson 3.1.0
httpcore 1.0.6
httptools 0.6.4
httpx 0.27.2
huggingface-hub 0.27.1
humanfriendly 10.0
idna 3.7
importlib_metadata 8.5.0
interegular 0.3.3
jieba 0.42.1
Jinja2 3.1.4
jiter 0.7.1
jmespath 0.10.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.7
lark 1.2.2
llvmlite 0.43.0
lm-format-enforcer 0.10.6
lxml 5.3.0
Markdown 3.6
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdurl 0.1.2
metrics 0.3.3
mistral_common 1.4.4
modelscope 1.22.0
mpmath 1.3.0
ms-swift 3.0.2.post1
msgpack 1.1.0
msgspec 0.18.6
multidict 6.0.5
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.9.1
numba 0.60.0
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.555.43
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
openai 1.54.4
OpenCC 1.1.9
opencv-python-headless 4.10.0.84
optimum 1.23.3
orjson 3.10.11
oss2 2.19.1
outlines 0.0.46
packaging 24.1
pandas 2.2.2
partial-json-parser 0.2.1.1.post4
PasteDeploy 3.1.0
pathlib2 2.3.7.post1
pathspec 0.5.5
peft 0.12.0
pillow 10.4.0
pip 23.0.1
portalocker 2.10.1
prometheus_client 0.21.0
prometheus-fastapi-instrumentator 7.0.0
protobuf 4.25.3
psutil 6.0.0
py-cpuinfo 9.0.0
pyairports 2.1.1
pyarrow 16.1.0
pyarrow-hotfix 0.6
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.9.2
pydantic_core 2.23.4
pydub 0.25.1
pyeclib 1.6.4
Pygments 2.18.0
pyparsing 3.2.0
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.12
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.2.0
qwen-vl-utils 0.0.8
ray 2.39.0
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
rich 13.9.4
rouge 1.0.1
rpds-py 0.21.0
ruff 0.7.3
s3transfer 0.10.4
sacrebleu 2.4.3
safehttpx 0.1.1
safetensors 0.4.3
scikit-learn 1.5.1
scipy 1.14.0
semantic-version 2.10.0
sentence-transformers 3.2.1
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
shtab 1.7.1
simplejson 3.19.3
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
starlette 0.41.2
swift 2.34.0
sympy 1.13.1
tabulate 0.9.0
tensorboard 2.17.0
tensorboard-data-server 0.7.2
threadpoolctl 3.5.0
tiktoken 0.7.0
timm 1.0.13
tokenizers 0.21.0
tomlkit 0.12.0
torch 2.4.0
torchaudio 2.3.1
torchvision 0.19.0
tqdm 4.66.4
transformers 4.48.0
transformers-stream-generator 0.0.5
triton 3.0.0
trl 0.11.4
typer 0.13.0
typing_extensions 4.12.2
tyro 0.8.14
tzdata 2024.1
urllib3 2.2.2
uvicorn 0.32.0
uvloop 0.21.0
vllm 0.6.3.post1
vllm-flash-attn 2.5.9.post1
watchfiles 0.24.0
websockets 12.0
Werkzeug 3.0.3
xattr 1.1.0
xformers 0.0.27.post2
xxhash 3.4.1
yarl 1.9.4
zipp 3.21.0
zstandard 0.23.0

@Jintao-Huang
Collaborator

Please quantize VL models with gptq for now.
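A minimal sketch of that workaround, reusing the reporter's flags from the failing command above and only swapping the quantization method (the output directory name is changed here purely for illustration):

    swift export \
      --model_type qwen2_vl \
      --model /data/public/models/Qwen2-VL-7B-Instruct \
      --dataset /data/public/yasong/Projects/pricing_vllm/vllm_val_v2.jsonl \
      --quant_method gptq \
      --quant_n_samples 256 \
      --quant_batch_size 1 \
      --max_length 128 \
      --quant_bits 4 \
      --output_dir /data/public/models/Qwen2-VL-7B-Instruct_quantize_gptq_int4

The reporter noted above that gptq quantization already runs normally on this model, so this path should work with the same calibration dataset (auto_gptq 0.7.1 is present in the pip list).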
