I cannot reproduce 7b 6.09 Wiki2 PPL. #78
Comments
Depending on the GPUs/drivers, there may be a difference in performance, which decreases as the model size increases (IST-DASLab/gptq#1).
Yes, I get that, but what command do you use to properly get that number? benchmark, or the normal evaluation at the Evaluating stage? I am confused about what I should be using to properly gauge what I convert.
@USBhost Which GPU and Nvidia driver version are you using? Maybe we can track this GPU/driver difference as other users report more scores. Also, your score is better: lower is better, not higher. That's a good thing, right?
GPU: RTX A6000. If you notice, in the OP I ran with wikitext2 instead of the default, C4. When I ran with C4 I got wikitext2 6.29; when I ran with groupsize 128 + --true-sequential I got wikitext2 6.25.
@USBhost Looking at the code and reading the arXiv paper, here are my thoughts on the variance.
The code will by design produce a different score every single time, since the calibration samples are randomized.
But I am able to reproduce my results. That randomness doesn't seem to be doing what we think it does.
Interesting. I never tried to run the same config more than once myself, since it takes forever. Are you getting the exact values down to the significant digits on repeat quantization with the same config?
To the exact last digit. |
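That exact repeatability would follow from the calibration sampler being seeded: llama.py takes a --seed argument with a fixed default, so the "random" calibration slices come out the same on every run. A minimal sketch of the idea (the function and variable names here are illustrative, not the repo's exact code):

import random

def sample_calibration(tokens, nsamples=128, seqlen=2048, seed=0):
    # tokens: a [1, N] tensor/array of token ids for the calibration corpus.
    # With a fixed seed, the RNG draws the same start offsets every run,
    # so the calibration set (and hence the quantized weights and the
    # resulting PPL score) is bit-for-bit reproducible.
    random.seed(seed)
    samples = []
    for _ in range(nsamples):
        start = random.randint(0, tokens.shape[1] - seqlen - 1)
        samples.append(tokens[:, start:start + seqlen])
    return samples

Varying --seed between runs is what would actually move the score; rerunning with the same seed and config should not.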
I have the same issue using an A6000 GPU. |
This value comes from GPTQ. Please ask in the GPTQ repo for details.
Nvm. I found that the reported results could be reproduced by using --new-eval.
For 7b, really, @Xiuyu-Li? --new-eval does not affect Wiki2.
I cannot seem to get that; it's either smaller or a little bigger. Can you provide the command you used to get this?
In my tests I ended up getting Wiki2 6.29, 6.25, and 5.9 trying different settings. Also, what's the correct way to do these tests? What is the correct way to check: the benchmark check, or the normal evaluation at the Evaluating stage when saving?
python -u ../repositories/GPTQ-for-LLaMa/llama.py llama-7b wikitext2 --new-eval --wbits 4 --act-order --true-sequential --save_safetensors llama-7b-4bit.safetensors
c4-new: 7.843033313751221
ptb-new: 10.846735000610352
wikitext2: 5.92544412612915
python -u ../repositories/GPTQ-for-LLaMa/llama.py llama-7b wikitext2 --new-eval --wbits 4 --act-order --true-sequential --load llama-7b-4bit.safetensors --benchmark 2048 --check
Median: 0.0950855016708374
PPL: 6.688839912414551
max memory(MiB): 1712.3349609375
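If I'm reading the code right, those two PPL numbers measure different things, so they are not expected to match: the Evaluating stage computes perplexity over the whole wikitext2 test set using the simulated (dequantized) weights, whereas --benchmark 2048 --check runs the actual quantized kernels over a single 2048-token sample. A rough sketch of what the Evaluating-stage number is, assuming an HF-style model whose forward returns .logits (names are illustrative, not the repo's exact code):

import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_ppl(model, tokens, seqlen=2048, device="cuda"):
    # tokens: [1, N] tensor of token ids for the whole test set.
    # Perplexity = exp(mean next-token NLL) over non-overlapping
    # seqlen-sized windows, which is what the Evaluating stage prints.
    nlls = []
    nwindows = tokens.shape[1] // seqlen
    for i in range(nwindows):
        batch = tokens[:, i * seqlen:(i + 1) * seqlen].to(device)
        logits = model(batch).logits
        # Shift so that position t predicts token t + 1.
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.shape[-1]),
            batch[:, 1:].reshape(-1),
        )
        nlls.append(loss * (seqlen - 1))
    return torch.exp(torch.stack(nlls).sum() / (nwindows * (seqlen - 1)))

So when comparing against reported wikitext2 scores, the Evaluating-stage figure (5.93 above) is the one to use; the --benchmark figure is dominated by the small token budget and the real kernels.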