GPTQ quantization #78

philpax · 2023-03-26T13:58:20Z

The GGML quantization strategy works, but it results in a measurable loss in quality. To address this, upstream is investigating the GPTQ algorithm, which quantizes in a way that reduces this loss: ggerganov/llama.cpp#9

According to ggerganov/llama.cpp#9 (comment), this may already work: take a GPTQ-quantized model and load it as q4_1.
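For context on where the quality loss comes from, here is a minimal sketch of q4_1-style round-to-nearest quantization: each block of weights is stored as 4-bit codes plus a per-block scale and minimum. This is an illustration only, not ggml's actual code or storage layout. The function names and the block size of 32 are assumptions made for the example.

```python
def quantize_q4_1_block(values):
    """Quantize one block of floats to 4-bit codes plus a (scale, minimum) pair."""
    lo, hi = min(values), max(values)
    # 4 bits give 16 levels (codes 0..15), so the range is split into 15 steps.
    # A constant block has no range; any non-zero scale works since all codes are 0.
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return scale, lo, codes


def dequantize_q4_1_block(scale, lo, codes):
    """Reconstruct approximate floats from the codes."""
    return [lo + scale * c for c in codes]


def max_abs_error(values):
    """Largest per-weight error introduced by the quantize/dequantize round trip."""
    scale, lo, codes = quantize_q4_1_block(values)
    restored = dequantize_q4_1_block(scale, lo, codes)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical block of 32 weights (assumed block size), evenly spread over [-0.5, 0.5].
block = [i / 31 - 0.5 for i in range(32)]
print(max_abs_error(block))
```

Each weight here is rounded independently, so the error is at most half a quantization step per weight. As I understand it, GPTQ instead chooses the quantized values using calibration data so that the layer's output error is reduced. If the result is stored as codes plus a scale and minimum, that would explain why loading it as q4_1 might work.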

philpax added the issue:enhancement (New feature or request) label Mar 26, 2023

philpax mentioned this issue Mar 26, 2023

Good ideas from llama.cpp #15

Closed

6 tasks
