Reproducing StackLLaMA with Multi-Adapters #471
The multi-adapter fixes make the code better and everything runs, but it doesn't reproduce StackLLaMA. Below is the regular run in brown and the multi-adapter run in yellow. The initial values are very similar, which is a good sign, but training doesn't proceed as expected. These issues could be related to high variance (e.g. #462) or to the KL estimation, and perhaps #423 could help; the relevant knobs are sketched below.
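As a hedged illustration only, here is where those knobs live. The parameter names follow TRL's `PPOConfig` around this time (`kl_penalty`, `init_kl_coef`, `target`) and should be treated as assumptions to verify against the installed version, not a confirmed fix:

```python
# Hedged sketch: PPO settings that affect reward variance and the KL term.
# Parameter names follow TRL's PPOConfig of this era and are assumptions,
# not a confirmed fix for this issue.
from trl import PPOConfig

ppo_config = PPOConfig(
    model_name="huggyllama/llama-7b",  # illustrative base checkpoint
    learning_rate=1.4e-5,
    batch_size=32,        # larger batches average out reward variance
    mini_batch_size=4,
    init_kl_coef=0.2,     # starting weight on the KL penalty
    adap_kl_ctrl=True,    # adaptive KL controller
    target=6.0,           # target KL for the adaptive controller
    kl_penalty="kl",      # estimator choice; "abs", "mse", "full" also exist
    seed=0,
)
```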
Alternatively, the issue could be the difference in quantization between reward model training and RLHF training. Reward modelling uses a
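If the suspicion is that reward-model training and PPO-time scoring run the base weights at different precisions, a quick sensitivity probe is to score one fixed batch with the reward model loaded in 8-bit and in fp16 and compare the outputs; the checkpoint name below is hypothetical:

```python
# Hedged sketch: check how sensitive reward scores are to quantization by
# scoring one fixed batch under 8-bit and fp16 loading. The checkpoint
# name is hypothetical; substitute the actual reward model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "my-org/llama-7b-se-rm"  # hypothetical reward-model checkpoint
tokenizer = AutoTokenizer.from_pretrained(rm_name)
batch = tokenizer(
    ["Question: how do I sort a list?\n\nAnswer: use sorted()."],
    return_tensors="pt",
)

rm_int8 = AutoModelForSequenceClassification.from_pretrained(
    rm_name, load_in_8bit=True, device_map="auto"
)
rm_fp16 = AutoModelForSequenceClassification.from_pretrained(
    rm_name, torch_dtype=torch.float16, device_map="auto"
)

with torch.no_grad():
    s8 = rm_int8(**batch.to(rm_int8.device)).logits.squeeze()
    s16 = rm_fp16(**batch.to(rm_fp16.device)).logits.squeeze()

# A large gap here would support the quantization-mismatch hypothesis.
print(f"int8: {s8.item():.4f}  fp16: {s16.item():.4f}  "
      f"|delta|: {(s8 - s16).abs().item():.4f}")
```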
Also, note that we do see reduced memory usage with Multi-Adapter, but sadly the memory spikes seem just as high. [image: memory usage comparison]
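If the spikes matter, the usual levers are smaller PPO micro-batches, cache clearing between steps, and gradient checkpointing. A sketch under the assumption of the TRL `PPOTrainer` setup (the `optimize_cuda_cache` flag name in particular should be checked against the installed TRL version):

```python
# Hedged sketch of peak-memory mitigations for PPO; flag names are
# assumptions based on TRL's PPOConfig of this era.
from trl import PPOConfig

ppo_config = PPOConfig(
    batch_size=32,
    mini_batch_size=1,               # lowers the backward-pass peak
    gradient_accumulation_steps=32,  # keeps the effective batch size
    optimize_cuda_cache=True,        # empty the CUDA cache between optimizer steps
)

# Gradient checkpointing on the underlying transformers model trades
# compute for activation memory during the PPO update, assuming `model`
# is a trl AutoModelForCausalLMWithValueHead instance:
# model.pretrained_model.gradient_checkpointing_enable()
```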
@mnoukhov Hi Michael, how have the int8 experiments gone for the reward model? Are you getting decent accuracy?
I was running into issues with … Even after all this, I'm not managing to reproduce the original results. There could be a regression in the codebase, or it could just be too much instability. I'm still investigating but moving to a smaller-scale task in the meantime.
Hi @mnoukhov,
That is a possible problem and could be solved with a
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Still relevant, posting so it isn't marked stale.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Now that we've reproduced the StackLLaMA results in #401, I think it would be useful to reproduce StackLLaMA with the more compute-efficient Multi-Adapter setup from #373, both to see if we can reduce compute requirements and as a sort of integration test for Multi-Adapter. A minimal sketch of that setup follows.
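For reference, a minimal sketch assuming the multi-adapter API from #373, where one quantized base model carries both the policy LoRA and the reward adapter; checkpoint and adapter names are illustrative, not the exact StackLLaMA ones:

```python
# Minimal sketch of the #373 multi-adapter setup. Checkpoint/adapter names
# are illustrative assumptions.
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base_model_name = "huggyllama/llama-7b"    # assumed base checkpoint
rm_adapter_id = "my-org/llama-7b-rm-peft"  # assumed reward-model LoRA adapter

policy_lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

# One quantized base model hosts both adapters; this is where the memory
# savings over separate policy/reference/reward models come from.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    base_model_name,
    peft_config=policy_lora,
    reward_adapter=rm_adapter_id,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

ppo_trainer = PPOTrainer(
    PPOConfig(batch_size=8, mini_batch_size=1), model, tokenizer=tokenizer
)

# In the training loop, rewards come from the reward adapter on the same
# base weights, e.g.:
# scores = ppo_trainer.model.compute_reward_score(**tokenized_query_response)
```

With a PEFT policy, no separate reference model is passed; the reference forward pass runs with the policy adapter disabled, which is what makes the setup an integration test for the whole multi-adapter path.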