Add AWQ models (pre-computed search results) #490
Merged
Description
AWQ (Activation-aware Weight Quantization) is an efficient quantization method for LLMs.
Pre-computed AWQ search results for our models (generated with --w_bit 4 --q_group_size 128) are available at: https://huggingface.co/hfl/chinese-llama-alpaca-2-awq
For detailed usage, please refer to:
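As an illustration, applying these pre-computed search results with the upstream llm-awq toolkit might look like the sketch below. Only `--w_bit 4` and `--q_group_size 128` come from this PR; the entry point, the `--load_awq` / `--q_backend` flags, and all file paths are assumptions based on the llm-awq README and may differ across versions.

```bash
# Hedged sketch, not a verified command: apply pre-computed AWQ search
# results (scales/clips) with mit-han-lab/llm-awq. Only --w_bit 4 and
# --q_group_size 128 are taken from this PR; the entry point, the other
# flags, and the .pt file name are assumptions and may vary by version.
python -m awq.entry \
    --model_path /path/to/chinese-alpaca-2-7b \
    --w_bit 4 --q_group_size 128 \
    --load_awq /path/to/chinese-llama-alpaca-2-awq/chinese-alpaca-2-7b.pt \
    --q_backend fake   # apply scales and simulate INT4 without packing weights
```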
The following are several benchmarks w.r.t. PPL (lower is better) under llama.cpp.
Conclusion: if you are using Q4_1 or Q4_0 quantization in llama.cpp, consider applying AWQ for a lower (better) PPL.
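For reference, a hedged sketch of how the llama.cpp side of such a PPL comparison could be run is shown below. The `quantize` and `perplexity` tools are standard llama.cpp binaries; the `--awq-path` option on `convert.py` (for folding the pre-computed scales into the fp16 model during conversion) and all paths are assumptions and may not exist in every llama.cpp version.

```bash
# Hedged sketch of the llama.cpp workflow, assuming an --awq-path option
# on convert.py (from llama.cpp's awq-py integration; may be absent in
# some versions). Paths and file names are placeholders.
python convert.py /path/to/chinese-alpaca-2-7b \
    --awq-path /path/to/chinese-alpaca-2-7b.pt \
    --outfile models/chinese-alpaca-2-7b-awq-f16.gguf

# Quantize the AWQ-scaled fp16 model to Q4_0 (use q4_1 for Q4_1).
./quantize models/chinese-alpaca-2-7b-awq-f16.gguf \
    models/chinese-alpaca-2-7b-awq-q4_0.gguf q4_0

# Measure PPL on a test set (lower is better).
./perplexity -m models/chinese-alpaca-2-7b-awq-q4_0.gguf -f wiki.test.raw
```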
Related Issue
None.
Explanation of Changes
This PR adds links to the pre-computed AWQ search results of our models and documents how to use them with llama.cpp quantization.