load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions #265
Conversation
…tensors to put the layernorms at the end. The training loop seems to work ok, the tests pass, and the loss and optimization look ok, but the gradients don't match, which can't be right. So there is a bug, but it's a bit too late in the day for me to debug right now; creating a PR and going to sleep, will fix tomorrow
I moved this from DRAFT to PR because it is technically done afaik, and could possibly be merged.
…ded them in the old order, so yeah...
… for some tensors and I don't exactly know why, sad
The only new functionality now, technically, is that the .py writes the bf16 file directly and C loads it directly if it is in bf16. Sadly, in test_gpt2.cu I had to 3X some of the tolerances for reasons I don't understand, as this change should be a total noop; I just re-shuffled the memory around. This makes me feel a bit uncomfortable again, like there is still some bug lurking... the layernorms remain in their old places.
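For readers following along, here is a minimal sketch of what "C loads it directly" could look like on the loading side, assuming the checkpoint header records that the weights are already stored in bf16 so no fp32-to-bf16 conversion pass is needed. The header layout, version value, and function name below are illustrative assumptions, not the actual llm.c code.

```c
// Sketch: if the checkpoint is already bf16, the weights can be read straight
// into the parameter buffer with a single bulk fread, no conversion needed.
// Header layout, version value, and names are assumptions for illustration.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef uint16_t bf16;  // raw 16-bit storage for bfloat16

void load_bf16_checkpoint(const char* path, bf16* params, size_t num_params) {
    FILE* f = fopen(path, "rb");
    if (f == NULL) { fprintf(stderr, "could not open %s\n", path); exit(1); }
    int header[256];
    fread(header, sizeof(int), 256, f);   // fixed-size int header (assumed)
    if (header[0] != 20240326) { fprintf(stderr, "bad magic number\n"); exit(1); }
    int version = header[1];              // assume version encodes the dtype
    if (version != 2 /* hypothetical "bf16 weights" version */) {
        fprintf(stderr, "expected a bf16 checkpoint\n"); exit(1);
    }
    // weights are stored in bf16 on disk, so this is one contiguous read
    fread(params, sizeof(bf16), num_params, f);
    fclose(f);
}
```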
…g around precisions
…ested via defines
…fp32 or bf16 or fp16. fp16 will error, though
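The "quality of life" handling of precisions described in these commits is compile-time selection of the parameter dtype via defines. Below is a sketch of that idea, assuming hypothetical ENABLE_FP32 / ENABLE_FP16 / ENABLE_BF16 flags (the real define names may differ); per the commit message above, selecting fp16 simply errors out for now.

```c
// Sketch of selecting the training precision at compile time via defines.
// Define names are assumptions; the intent matches the commits above:
// fp32 and bf16 are supported, fp16 errors out.
#if defined(ENABLE_FP32)
typedef float floatX;                 // everything stays in fp32
#elif defined(ENABLE_FP16)
#error "fp16 is not supported yet; compile with ENABLE_FP32 or ENABLE_BF16"
#else                                 // default / ENABLE_BF16: bf16 params
#include <cuda_bf16.h>
typedef __nv_bfloat16 floatX;         // CUDA bfloat16 storage type
#endif
```

Compiling with `-DENABLE_FP32` or `-DENABLE_BF16` would then pick the dtype everywhere `floatX` is used in this sketch.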
Code to load bf16 weights directly, and also to re-wire the positions of the tensors so that the layernorms (which are in fp32) sit at the end. The training loop seems to work ok, the tests pass, and the loss and optimization look ok, but the gradients don't match, which can't be right. So there is a bug, but it's a bit too late in the day for me to debug right now; creating a PR and going to sleep, will fix tomorrow.
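For context, the "re-wire the position of tensors" part is about the parameter memory layout: if all the bf16 tensors are grouped first and the fp32 layernorm parameters are moved to the end, the bf16 block can be read from the checkpoint in one contiguous read. Here is a rough sketch of that layout idea, with an illustrative (not actual) subset of tensor names; note that per another comment in this thread, the layernorms ultimately stayed in their old places.

```c
// Rough sketch of the proposed layout: low-precision tensors first (one
// contiguous bf16 region matching the on-disk bf16 checkpoint), fp32
// layernorm tensors last. Names and shapes are illustrative only.
#include <stdint.h>

typedef uint16_t bf16;

typedef struct {
    // --- bf16 block: can be loaded byte-for-byte from the bf16 checkpoint ---
    bf16* wte;      // (V, C)     token embeddings
    bf16* qkvw;     // (L, 3C, C) attention projections
    bf16* fcw;      // (L, 4C, C) MLP weights
    // --- fp32 block: layernorm params, kept in full precision, at the end ---
    float* ln1w;    // (L, C)
    float* ln1b;    // (L, C)
    float* lnfw;    // (C,)
    float* lnfb;    // (C,)
} ParameterTensors;
```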