Reproducing StackLLaMA with Multi-Adapters #471
The multi-adapter fixes make the code better and everything runs, but it doesn't reproduce StackLLaMA. Below is the regular run in brown and the multi-adapter run in yellow. The initial values are very similar, which is a good sign, but training doesn't proceed as expected. These issues could be related to high variance (e.g. #462) or to the KL estimation, and perhaps #423 could help; the relevant knobs are sketched below.
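As a hedged illustration only, here is where those knobs live. The parameter names follow TRL's `PPOConfig` around this time (`kl_penalty`, `init_kl_coef`, `target`) and should be treated as assumptions to verify against the installed version, not a confirmed fix:

```python
# Hedged sketch: PPO settings that affect reward variance and the KL term.
# Parameter names follow TRL's PPOConfig of this era and are assumptions,
# not a confirmed fix for this issue.
from trl import PPOConfig

ppo_config = PPOConfig(
    model_name="huggyllama/llama-7b",  # illustrative base checkpoint
    learning_rate=1.4e-5,
    batch_size=32,        # larger batches average out reward variance
    mini_batch_size=4,
    init_kl_coef=0.2,     # starting weight on the KL penalty
    adap_kl_ctrl=True,    # adaptive KL controller
    target=6.0,           # target KL for the adaptive controller
    kl_penalty="kl",      # estimator choice; "abs", "mse", "full" also exist
    seed=0,
)
```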
Alternatively, the issue could be the difference in quantization between reward model training and RLHF training. Reward modelling uses a
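If the suspicion is that reward-model training and PPO-time scoring run the base weights at different precisions, a quick sensitivity probe is to score one fixed batch with the reward model loaded in 8-bit and in fp16 and compare the outputs; the checkpoint name below is hypothetical:

```python
# Hedged sketch: check how sensitive reward scores are to quantization by
# scoring one fixed batch under 8-bit and fp16 loading. The checkpoint
# name is hypothetical; substitute the actual reward model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "my-org/llama-7b-se-rm"  # hypothetical reward-model checkpoint
tokenizer = AutoTokenizer.from_pretrained(rm_name)
batch = tokenizer(
    ["Question: how do I sort a list?\n\nAnswer: use sorted()."],
    return_tensors="pt",
)

rm_int8 = AutoModelForSequenceClassification.from_pretrained(
    rm_name, load_in_8bit=True, device_map="auto"
)
rm_fp16 = AutoModelForSequenceClassification.from_pretrained(
    rm_name, torch_dtype=torch.float16, device_map="auto"
)

with torch.no_grad():
    s8 = rm_int8(**batch.to(rm_int8.device)).logits.squeeze()
    s16 = rm_fp16(**batch.to(rm_fp16.device)).logits.squeeze()

# A large gap here would support the quantization-mismatch hypothesis.
print(f"int8: {s8.item():.4f}  fp16: {s16.item():.4f}  "
      f"|delta|: {(s8 - s16).abs().item():.4f}")
```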
Also, note that we do see reduced memory usage with Multi-Adapter, but sadly the memory spikes seem just as high. [image: memory usage comparison]
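If the spikes matter, the usual levers are smaller PPO micro-batches, cache clearing between steps, and gradient checkpointing. A sketch under the assumption of the TRL `PPOTrainer` setup (the `optimize_cuda_cache` flag name in particular should be checked against the installed TRL version):

```python
# Hedged sketch of peak-memory mitigations for PPO; flag names are
# assumptions based on TRL's PPOConfig of this era.
from trl import PPOConfig

ppo_config = PPOConfig(
    batch_size=32,
    mini_batch_size=1,               # lowers the backward-pass peak
    gradient_accumulation_steps=32,  # keeps the effective batch size
    optimize_cuda_cache=True,        # empty the CUDA cache between optimizer steps
)

# Gradient checkpointing on the underlying transformers model trades
# compute for activation memory during the PPO update, assuming `model`
# is a trl AutoModelForCausalLMWithValueHead instance:
# model.pretrained_model.gradient_checkpointing_enable()
```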
@mnoukhov Hi Michael, how have the int8 experiments gone for the reward model? Are you getting decent accuracy?
I was running into issues with … Even after all this, I'm not managing to reproduce the original results. There could be a regression in the codebase, or it could just be too much instability. I'm still investigating but moving to a smaller-scale task in the meantime.
Hi @mnoukhov,
That is a possible problem and could be solved with a
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Still relevant, posting so it isn't marked stale.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Now that we've reproduced the StackLLaMA results in #401, I think it would be useful to reproduce StackLLaMA with the more compute-efficient Multi-Adapter setup from #373, both to see if we can reduce compute requirements and as a sort of integration test for Multi-Adapter. A minimal sketch of that setup follows.
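For reference, a minimal sketch assuming the multi-adapter API from #373, where one quantized base model carries both the policy LoRA and the reward adapter; checkpoint and adapter names are illustrative, not the exact StackLLaMA ones:

```python
# Minimal sketch of the #373 multi-adapter setup. Checkpoint/adapter names
# are illustrative assumptions.
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base_model_name = "huggyllama/llama-7b"    # assumed base checkpoint
rm_adapter_id = "my-org/llama-7b-rm-peft"  # assumed reward-model LoRA adapter

policy_lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

# One quantized base model hosts both adapters; this is where the memory
# savings over separate policy/reference/reward models come from.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    base_model_name,
    peft_config=policy_lora,
    reward_adapter=rm_adapter_id,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

ppo_trainer = PPOTrainer(
    PPOConfig(batch_size=8, mini_batch_size=1), model, tokenizer=tokenizer
)

# In the training loop, rewards come from the reward adapter on the same
# base weights, e.g.:
# scores = ppo_trainer.model.compute_reward_score(**tokenized_query_response)
```

With a PEFT policy, no separate reference model is passed; the reference forward pass runs with the policy adapter disabled, which is what makes the setup an integration test for the whole multi-adapter path.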