This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

GPTQ quantization #78

Open
philpax opened this issue Mar 26, 2023 · 0 comments
Labels
issue:enhancement New feature or request

Comments

Collaborator

philpax commented Mar 26, 2023

The GGML quantization strategy works, but results in a measurable loss in quality. To address this, upstream is investigating the GPTQ algorithm, which quantizes in a way that reduces that loss: ggerganov/llama.cpp#9

It's possible that this already works: quantize a model with GPTQ and load it as q4_1, per ggerganov/llama.cpp#9 (comment).
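For context, a q4_1-style format stores, for each small block of weights, a minimum and a scale alongside 4-bit indices. Below is a minimal sketch in Python of that scale-plus-offset scheme; the block size, rounding, and lack of bit-packing here are illustrative assumptions, not the actual GGML on-disk format:

```python
# Sketch of "q4_1"-style block quantization: each block of floats is
# reduced to a (min, scale) pair plus one 4-bit integer per value.
# Illustrative only; real GGML packs two 4-bit values per byte and uses
# fixed 32-element blocks.

def quantize_q4_1(block):
    """Map a block of floats to 4-bit indices with a per-block min and scale."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15.0 or 1.0  # 4 bits -> 16 levels; avoid div-by-zero
    quants = [round((x - lo) / scale) for x in block]
    return lo, scale, quants

def dequantize_q4_1(lo, scale, quants):
    """Approximately recover the original floats from the quantized block."""
    return [lo + q * scale for q in quants]

block = [0.0, 0.5, 1.0, 1.5]
lo, scale, quants = quantize_q4_1(block)
restored = dequantize_q4_1(lo, scale, quants)
```

The quality loss mentioned above comes from this rounding step: plain round-to-nearest treats every weight equally, whereas GPTQ picks the quantized values to minimize the resulting error in the layer's output, so a GPTQ-produced model stored in the same q4_1 layout can be more accurate at the same size.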

@philpax philpax added the issue:enhancement New feature or request label Mar 26, 2023
@philpax philpax mentioned this issue Mar 26, 2023