I had a quick glance at the GPTQ paper yesterday, but haven't dug into details yet.
Do you think it is possible to demonstrate a simple routine for performing quantization using this method?
For example, what is the most trivial way (not necessary to be optimal) to implement a function like this:
    // src - input 32-bit floats
    // dst - output quantized data
    // n   - number of input floats
    void quantize_gptq(float * src, void * dst, int n);
If I can get a prototype of this and it does not look too complex, I can try to plug it in ggml.
The main challenge will be to implement it efficiently with SIMD, but I need to see some initial implementation to work on.

Originally posted by @ggerganov in ggerganov/llama.cpp#9 (comment)
@qwopqwop200 This is for a related project. I thought you might be qualified to answer the question above.
Link to original question.
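To make the requested interface concrete, here is a minimal sketch of a plain round-to-nearest 4-bit quantizer with the same `(src, dst, n)` shape. This is not GPTQ: as far as I understand the paper, GPTQ picks its rounding using second-order information gathered from calibration inputs, so a real `quantize_gptq` would likely need more arguments than these three. The names `quantize_rtn_q4` and `dequantize_rtn_q4`, the block size `QK`, and the output layout (one float scale followed by 16 packed bytes per block) are all assumptions made for illustration and are not ggml's actual format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK 32                                   /* hypothetical block size */
#define BLOCK_BYTES ((int) sizeof(float) + QK / 2)

/* Round to the nearest integer, halves away from zero (avoids needing libm). */
static int round_nearest(float v) {
    return v >= 0.0f ? (int) (v + 0.5f) : (int) (v - 0.5f);
}

/* Baseline round-to-nearest quantizer, NOT GPTQ.
 * src - input 32-bit floats
 * dst - output buffer, (n / QK) * BLOCK_BYTES bytes
 * n   - number of input floats, must be a multiple of QK */
void quantize_rtn_q4(const float * src, void * dst, int n) {
    assert(n % QK == 0);
    uint8_t * out = (uint8_t *) dst;

    for (int b = 0; b < n / QK; b++) {
        const float * x = src + b * QK;

        /* Largest magnitude in the block sets the scale. */
        float amax = 0.0f;
        for (int i = 0; i < QK; i++) {
            const float a = x[i] < 0.0f ? -x[i] : x[i];
            if (a > amax) amax = a;
        }

        const float d  = amax / 7.0f;                   /* maps values to [-7, 7] */
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        /* Store the scale with memcpy so dst needs no particular alignment. */
        memcpy(out, &d, sizeof(d));
        out += sizeof(d);

        /* Pack two 4-bit values per byte, each biased by 8 into [1, 15]. */
        for (int i = 0; i < QK; i += 2) {
            const int q0 = round_nearest(x[i]     * id) + 8;
            const int q1 = round_nearest(x[i + 1] * id) + 8;
            *out++ = (uint8_t) (q0 | (q1 << 4));
        }
    }
}

/* Inverse of the layout above, useful for checking the round-trip error. */
void dequantize_rtn_q4(const void * src, float * dst, int n) {
    assert(n % QK == 0);
    const uint8_t * in = (const uint8_t *) src;

    for (int b = 0; b < n / QK; b++) {
        float d;
        memcpy(&d, in, sizeof(d));
        in += sizeof(d);

        for (int i = 0; i < QK; i += 2) {
            const uint8_t byte = *in++;
            dst[b * QK + i]     = (float) ((byte & 0x0F) - 8) * d;
            dst[b * QK + i + 1] = (float) ((byte >> 4)   - 8) * d;
        }
    }
}
```

With this layout the round-trip error of each value is bounded by half the block scale. A GPTQ-style routine would presumably keep the same storage format but choose the 4-bit values differently.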