Multi Adapter Fixes #472
Conversation
…into add-multiple-adapters
Co-authored-by: lewtun <[email protected]>
remove extra files
use the peft config to do this more cleanly; remove torch.no_grad as we now don't have any params that require grad in compute_reward_score
fixed `model.score` incorrect dtype - changed to `pretrained_model.dtype`; added `policy_adapter_name` explicitly to `__init__`; refactored `compute_reward_score` - moved into `ppo_trainer`, changed to use the `.forward` method of your model so that `accelerator.unwrap` is not necessary any more, changed the score / seq_cls head to use the same code as `LlamaForSequenceClassification`, made outputs a scalar per input example instead of a matrix where you need to select `[-1, 0]`; removed unnecessary list comprehensions for `any`
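For illustration, here is a rough sketch of what the refactored reward scoring described in the commit above could look like. The function, argument, and adapter names (`score_head`, `rm_adapter_name`, `policy_adapter_name`) are assumptions for this example, not the PR's actual code.

```python
import torch

def compute_reward_score(model, score_head, input_ids, attention_mask,
                         rm_adapter_name="reward_adapter",
                         policy_adapter_name="default"):
    """Illustrative sketch: score a batch with the reward-model adapter.

    Assumes `model` is a PEFT-wrapped causal LM carrying both the policy and
    reward adapters, and `score_head` is a linear head mapping hidden states
    to a single logit (as in LlamaForSequenceClassification).
    """
    # activate the reward-model adapter for this forward pass
    model.set_adapter(rm_adapter_name)

    # call .forward directly so the (possibly accelerate-wrapped) model is
    # used as-is; no accelerator.unwrap_model is needed
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        output_hidden_states=True,
    )
    hidden = outputs.hidden_states[-1]

    # score head is assumed to share the base model's dtype, as fixed in this PR
    logits = score_head(hidden.to(score_head.weight.dtype))  # (batch, seq, 1)

    # take the logit at the last non-padded token -> one scalar per example,
    # instead of a matrix that needs indexing with [-1, 0]
    last_token_idx = attention_mask.sum(dim=1) - 1
    scores = logits[torch.arange(logits.size(0)), last_token_idx].squeeze(-1)

    # switch back to the policy adapter for generation / PPO updates
    model.set_adapter(policy_adapter_name)
    return scores
```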
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @mnoukhov
Thanks a lot for the PR! Currently a CI test fails:
FAILED tests/test_ppo_trainer.py::PPOTrainerTester::test_peft_model_ppo_adapter_rm_trainer - TypeError: add_and_load_reward_modeling_adapter() missing 1 required positional argument: 'adapter_model_id'
Can you please double check? 🙏 Thanks!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Sorry, I forgot about this PR. Re-merged with main and re-opening. This should have fixed the test error.
@younesbelkada can you re-open this or should I make a new PR?
Multi-Adapter RL, as it currently stands, does not reproduce the StackLLaMA results and has some design choices that can be improved. This PR fixes some basics and improves the design. It is the first step in the StackLLaMA repro #471, but I won't add more to this PR to keep it manageable.
The main changes are:
- changed `compute_reward_score` to use `model.forward` so it works seamlessly with accelerate in multi-gpu (otherwise this could be changed to just `accelerator.unwrap_model`)
- matched the `model.score` layer's dtype to `model.pretrained_model`
- set `requires_grad = False` by using `inference_mode = True` in the `rm_adapter_peft_config` (see the sketch below)
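As a rough illustration of the last point, the reward-model adapter config can be marked inference-only so PEFT freezes its weights; the hyperparameter values below are placeholders, not values taken from this PR.

```python
from peft import LoraConfig

# Illustrative reward-model adapter config (values are placeholders).
# `inference_mode=True` makes PEFT freeze the adapter weights, so reward
# scoring needs no explicit torch.no_grad() and no manual requires_grad_(False).
rm_adapter_peft_config = LoraConfig(
    inference_mode=True,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="SEQ_CLS",
)
```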