Multi Adapter Fixes #472

Closed
mnoukhov wants to merge 40 commits

Conversation

@mnoukhov (Contributor) commented:

Multi-Adapter RL, as it currently stands, does not reproduce the StackLLaMA results and has some design choices that can be improved. This PR fixes some basics and improves the design. It is the first step in the StackLLaMA repro (#471), but I won't add more to this PR to keep it manageable.

The main changes are:

  • add the reward adapter before calling `__init__` so that all the modules are already created
    • makes adding the reward adapter a class method
    • makes the reward adapter name and score module more explicit, setting them in `__init__`
    • this feels like it better follows the design structure I've seen in other Hugging Face `PreTrainedModel`s, but I'm happy to change it back
  • refactor `compute_reward_score` to use `model.forward` so it works seamlessly with Accelerate on multi-GPU; this can be changed to just `accelerator.unwrap_model` otherwise
    • I assume that we want to parallelize
  • correctly set the `model.score` layer's dtype
    • otherwise it fails if it isn't the same dtype as `model.pretrained_model`
  • set the reward model's LoRA parameters to `requires_grad = False` by using `inference_mode = True` in the `rm_adapter_peft_config` (see the sketch below)
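As a rough sketch of the last bullet and the class-method idea (the model ids are placeholders, and the exact `from_pretrained` keywords are assumptions rather than this PR's final API):

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# Policy adapter: trainable LoRA on the causal-LM base.
policy_lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Reward adapter config: inference_mode=True freezes its LoRA parameters,
# so requires_grad never has to be toggled by hand during PPO.
rm_adapter_peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="SEQ_CLS",
    inference_mode=True,
)

# Hypothetical call shape: the reward adapter is attached while the wrapper is
# being built, so the score head and adapter modules all exist after __init__.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "base-model-id",                      # placeholder model id
    peft_config=policy_lora_config,       # policy adapter (trainable)
    reward_adapter="reward-adapter-id",   # reward adapter (frozen)
)
```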

mnoukhov added 10 commits on June 21, 2023
use the peft config to do this more cleanly
remove torch.no_grad in compute_reward_score, as we now don't have any params that require grad
fixed model.score incorrect dtype
- changed to pretrained_model.dtype
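A minimal sketch of what the dtype fix amounts to, assuming a simple linear score head (the helper name here is hypothetical):

```python
import torch.nn as nn

def build_score_head(pretrained_model, num_labels: int = 1) -> nn.Linear:
    # Create the score head with the same dtype and device as the base model;
    # a half-precision pretrained_model paired with an fp32 score head would
    # otherwise fail when the hidden states hit the linear layer.
    hidden_size = pretrained_model.config.hidden_size
    return nn.Linear(hidden_size, num_labels, bias=False).to(
        device=pretrained_model.device, dtype=pretrained_model.dtype
    )
```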

added policy_adapter_name explicitly to __init__

refactored compute_reward_score
- moved into ppo_trainer
- changed to use the .forward method of your model so that accelerator.unwrap is not necessary any more
- changed the score / seq_cls head to use the same code as LlamaForSequenceClassification
- made outputs a scalar per input example instead of a matrix where you need to select [-1, 0]
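A sketch of what the refactored scoring might look like, following `LlamaForSequenceClassification`'s last-token selection; the adapter names, the separate score-head argument, and the output layout are assumptions, not the PR's exact code:

```python
import torch

def compute_reward_score(model, score_head, input_ids, attention_mask):
    # Switch the PEFT model to the (frozen) reward adapter. The name here is
    # an assumed placeholder; the PR makes adapter names explicit in __init__.
    model.set_adapter("reward_model_adapter")

    # Go through model.forward so Accelerate's wrappers are respected in
    # multi-GPU runs. No torch.no_grad is needed: the reward adapter was
    # loaded with inference_mode=True, so none of its params require grad.
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        output_hidden_states=True,
    )
    hidden_states = outputs.hidden_states[-1]      # (batch, seq, hidden)
    logits = score_head(hidden_states)             # (batch, seq, 1)

    # Pick the last non-padding token per sequence, as
    # LlamaForSequenceClassification does, and return one scalar per example.
    last_positions = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(logits.size(0), device=logits.device)
    scores = logits[batch_indices, last_positions, 0]

    model.set_adapter("default")                   # back to the policy adapter
    return scores                                  # shape: (batch,)
```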

removed unnecessary list comprehensions for `any`
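The `any` cleanup is the usual switch from a list comprehension to a generator expression, roughly (the variable names are illustrative):

```python
target_modules = ["q_proj", "v_proj"]
module_name = "model.layers.0.self_attn.q_proj"

# Before: builds the whole intermediate list before any() scans it.
found = any([target in module_name for target in target_modules])

# After: the generator expression short-circuits on the first match.
found = any(target in module_name for target in target_modules)
```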
mnoukhov changed the title from "Multi Adapter fixes - Reproducing StackLLaMA" to "Multi Adapter Fixes" on Jun 27, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@younesbelkada (Contributor) left a comment:

Hi @mnoukhov,
Thanks a lot for the PR! Currently a CI test fails:

FAILED tests/test_ppo_trainer.py::PPOTrainerTester::test_peft_model_ppo_adapter_rm_trainer - TypeError: add_and_load_reward_modeling_adapter() missing 1 required positional argument: 'adapter_model_id'

Can you please double check? 🙏 Thanks!

@github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

The github-actions bot closed this on Aug 6, 2023.
@mnoukhov (Contributor, Author) commented:

Sorry, I forgot about this PR. I've re-merged with main and am re-opening; this should have fixed the test error.

@mnoukhov (Contributor, Author) commented:

@younesbelkada can you re-open this or should I make a new PR?
