GPTQ C++ Implementation Question #42

MarkSchmidty · 2023-03-13T22:03:16Z

I had a quick glance at the GPTQ paper yesterday, but haven't dug into details yet.
Do you think it is possible to demonstrate a simple routine for performing quantization using this method?
For example, what is the most trivial way (not necessary to be optimal) to implement a function like this:
```cpp
// src - input 32-bit floats
// dst - output quantized data
// n - number of input floats
void quantize_gptq(float * src, void * dst, int n);
```
If I can get a prototype of this and it does not look too complex, I can try to plug it in ggml.
The main challenge will be to implement it efficiently with SIMD, but I need to see some initial implementation to work on.
Originally posted by @ggerganov in ggerganov/llama.cpp#9 (comment)
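
For illustration only, here is a rough sketch of what a trivial, unoptimized prototype could look like. One caveat: GPTQ relies on second-order information (an inverse Hessian estimated from calibration inputs to the layer), so the three-argument signature in the question is probably not sufficient on its own. Everything below is an assumption made for the sketch and is not taken from the GPTQ reference code or from ggml: the names `quantize_gptq_row` and `quantize_value`, the symmetric 4-bit grid with a single `scale`, the unpacked `int8_t` output, and the precomputed row-major `hinv` argument. It also skips the blocking and Cholesky reformulation that the paper uses for speed and numerical stability.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Round x to the nearest level of a symmetric 4-bit grid with step `scale`.
// (Hypothetical helper; assumes scale > 0.)
static int8_t quantize_value(float x, float scale) {
    const long q = std::lround(x / scale);
    return static_cast<int8_t>(std::clamp<long>(q, -8, 7));
}

// Hypothetical prototype: quantize one weight row of length n.
//   src   - input 32-bit floats (one row of the weight matrix)
//   dst   - output quantized levels in [-8, 7], one per int8_t (not packed)
//   n     - number of input floats
//   scale - quantization step (assumed to be chosen by the caller)
//   hinv  - row-major n*n inverse Hessian of the layer inputs (assumed to be
//           precomputed from calibration data); taken by value because the
//           routine overwrites it
void quantize_gptq_row(const float * src, int8_t * dst, int n,
                       float scale, std::vector<float> hinv) {
    std::vector<float> w(src, src + n); // working copy of the weights

    for (int i = 0; i < n; ++i) {
        // Quantize weight i by plain rounding.
        dst[i] = quantize_value(w[i], scale);

        // Rounding error of weight i, normalized by the diagonal entry.
        const float d   = hinv[i * n + i];
        const float err = (w[i] - dst[i] * scale) / d;

        // Push the error onto the weights that are not quantized yet.
        for (int j = i + 1; j < n; ++j) {
            w[j] -= err * hinv[i * n + j];
        }

        // Eliminate weight i from the inverse Hessian so that later steps
        // only see the remaining weights.
        for (int j = i + 1; j < n; ++j) {
            const float f = hinv[j * n + i] / d;
            for (int k = i + 1; k < n; ++k) {
                hinv[j * n + k] -= f * hinv[i * n + k];
            }
        }
    }
}
```

With an identity `hinv` the routine reduces to ordinary round-to-nearest, which makes it easy to sanity-check. The cubic-time elimination loop is the part that a real implementation would replace with a precomputed Cholesky factor and then vectorize with SIMD.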

@qwopqwop200 This is for a related project. I thought you might be qualified to answer the question above.

Link to original question.

qwopqwop200 · 2023-03-14T01:05:33Z

I only applied GPTQ to LLaMA. I don't understand the GPTQ algorithm itself, so I can't answer questions about the quantization.

qwopqwop200 closed this as completed Mar 14, 2023
