Prerequisites
Please answer the following questions for yourself before submitting an issue.
I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
The LLaMA 3 8B scoreboard at the following link was computed against fp16: https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity
However, the model was released as bf16 weights. Is there a quantifiable negative impact on perplexity from converting between the two weight formats, or a difference when comparing perplexity against bf16 instead of fp16? It's unclear; even a brief mention of this could bring clarity.
Motivation
Curiosity about the impact of bf16 versus fp16 on models, and subsequent training/merging.
Possible Implementation
If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.
I did not explicitly check the effect of FP16/BF16 as an intermediate format, but when using them directly I found essentially no relevant differences: #7150 .
And because the FP16 vs. BF16 differences seem to be much smaller than even the FP16 vs. q8_0 differences, I think it's safe to just use FP16 even if the original weights are BF16.
Note: if the original weights contain values larger than the maximum representable FP16 value, that could potentially cause issues, but you would run into those anyway once you convert to the final quant format.
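If someone wants to check that concern on a specific checkpoint, one option is to count how many values fall outside the FP16 range and measure the BF16 → FP16 round-trip error per tensor before converting. This is only a minimal sketch, assuming the checkpoint is available as Hugging Face safetensors shards; the shard filename is hypothetical.

```python
# Minimal sketch (not part of llama.cpp): scan a BF16 checkpoint for values
# that would not survive a BF16 -> FP16 conversion.
import torch
from safetensors.torch import load_file

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def check_shard(path: str) -> None:
    tensors = load_file(path)  # dict of tensor name -> torch.Tensor (CPU)
    for name, t in tensors.items():
        t = t.to(torch.float32)  # upcast BF16 for exact comparison
        n_overflow = (t.abs() > FP16_MAX).sum().item()
        round_trip = t.to(torch.float16).to(torch.float32)
        max_err = (t - round_trip).abs().max().item()
        if n_overflow or max_err > 0.0:
            print(f"{name}: {n_overflow} values above FP16 max, "
                  f"max round-trip error {max_err:.3e}")

# Hypothetical usage on one shard of the released BF16 checkpoint:
# check_shard("model-00001-of-00004.safetensors")
```

Note that any value above the FP16 maximum rounds to inf, so tensors with overflows will also report an infinite round-trip error; a clean report here means the FP16 intermediate step loses nothing beyond normal rounding.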