Multi Adapter Fixes #472
Conversation
…into add-multiple-adapters
Co-authored-by: lewtun <[email protected]>
remove extra files
use the peft config to do this more cleanly; remove torch.no_grad as we now don't have any params that require grad in compute_reward_score
fixed `model.score` incorrect dtype - changed to `pretrained_model.dtype`; added `policy_adapter_name` explicitly to `__init__`; refactored `compute_reward_score` - moved into `ppo_trainer`, changed to use the `.forward` method of your model so that `accelerator.unwrap` is not necessary any more, changed the score / seq_cls head to use the same code as `LlamaForSequenceClassification`, made outputs a scalar per input example instead of a matrix where you need to select `[-1, 0]`; removed unnecessary list comprehensions for `any`
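For illustration, here is a rough sketch of what the refactored reward scoring described in the commit above could look like. The function, argument, and adapter names (`score_head`, `rm_adapter_name`, `policy_adapter_name`) are assumptions for this example, not the PR's actual code.

```python
import torch

def compute_reward_score(model, score_head, input_ids, attention_mask,
                         rm_adapter_name="reward_adapter",
                         policy_adapter_name="default"):
    """Illustrative sketch: score a batch with the reward-model adapter.

    Assumes `model` is a PEFT-wrapped causal LM carrying both the policy and
    reward adapters, and `score_head` is a linear head mapping hidden states
    to a single logit (as in LlamaForSequenceClassification).
    """
    # activate the reward-model adapter for this forward pass
    model.set_adapter(rm_adapter_name)

    # call .forward directly so the (possibly accelerate-wrapped) model is
    # used as-is; no accelerator.unwrap_model is needed
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        output_hidden_states=True,
    )
    hidden = outputs.hidden_states[-1]

    # score head is assumed to share the base model's dtype, as fixed in this PR
    logits = score_head(hidden.to(score_head.weight.dtype))  # (batch, seq, 1)

    # take the logit at the last non-padded token -> one scalar per example,
    # instead of a matrix that needs indexing with [-1, 0]
    last_token_idx = attention_mask.sum(dim=1) - 1
    scores = logits[torch.arange(logits.size(0)), last_token_idx].squeeze(-1)

    # switch back to the policy adapter for generation / PPO updates
    model.set_adapter(policy_adapter_name)
    return scores
```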
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @mnoukhov
Thanks a lot for the PR! Currently a CI test fails:
FAILED tests/test_ppo_trainer.py::PPOTrainerTester::test_peft_model_ppo_adapter_rm_trainer - TypeError: add_and_load_reward_modeling_adapter() missing 1 required positional argument: 'adapter_model_id'
Can you please double check? 🙏 Thanks!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Sorry, I forgot about this PR. Re-merged with main and re-opening. This should have fixed the test error.
@younesbelkada can you re-open this or should I make a new PR?
Multi-Adapter RL, as it currently stands, does not reproduce the StackLLaMA results and has some design choices that can be improved. This PR fixes some basics and improves the design. It is the first step in the StackLLaMA repro #471, but I won't add more to this PR to keep it manageable.
The main changes are:
- changed `compute_reward_score` to use `model.forward` so it works seamlessly with accelerate in multi-gpu (otherwise this could be changed to just `accelerator.unwrap_model`)
- matched the `model.score` layer's dtype to `model.pretrained_model`
- set `requires_grad = False` by using `inference_mode = True` in the `rm_adapter_peft_config` (see the sketch below)
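As a rough illustration of the last point, the reward-model adapter config can be marked inference-only so PEFT freezes its weights; the hyperparameter values below are placeholders, not values taken from this PR.

```python
from peft import LoraConfig

# Illustrative reward-model adapter config (values are placeholders).
# `inference_mode=True` makes PEFT freeze the adapter weights, so reward
# scoring needs no explicit torch.no_grad() and no manual requires_grad_(False).
rm_adapter_peft_config = LoraConfig(
    inference_mode=True,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="SEQ_CLS",
)
```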