
Reproducing StackLLaMA with Multi-Adapters #471

Closed
mnoukhov opened this issue Jun 27, 2023 · 9 comments

@mnoukhov
Contributor

Now that we've repro'd the StackLLaMA results in #401, I think it would be useful to repro StackLLaMA with the more compute-efficient Multi-Adapter setup #373 to see if we can reduce compute requirements and also as a sort of integration test for Multi-Adapter
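
For context, the multi-adapter setup from #373 keeps a single quantized base model and attaches separate LoRA adapters for the policy and the reward model, roughly like the sketch below. The checkpoint paths are placeholders, and the reward_adapter / compute_reward_score names follow my reading of #373, so they may not match the final API exactly; the commented scoring call assumes a PPOTrainer built around this model.

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# Placeholder checkpoints: a fine-tuned LLaMA base and a reward-model LoRA adapter
base_model_name = "path/to/llama-7b-se"          # hypothetical
rm_adapter_name = "path/to/llama-7b-se-rm-lora"  # hypothetical

# LoRA config for the PPO (policy) adapter
policy_lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

# One int8 base model; the reward model is just a second adapter on top of it,
# so no separate full-size reward model has to be kept in memory
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    base_model_name,
    peft_config=policy_lora,
    reward_adapter=rm_adapter_name,
    load_in_8bit=True,
)

# During PPO, rollouts are generated with the policy adapter active, and the
# reward adapter is swapped in just to score the tokenized query+response pairs:
# rewards = ppo_trainer.model.compute_reward_score(**inputs)
```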

@mnoukhov
Contributor Author

mnoukhov commented Jun 27, 2023

The multi-adapter fixes make the code better and everything runs, but it doesn't repro StackLLaMA. Below, the regular run is in brown and the multi-adapter run in yellow.

The initial values are very similar, which is a good sign, but the training doesn't proceed as expected.

[image: training curves, regular run (brown) vs. multi-adapter run (yellow)]

These issues could be related to high variance (e.g. #462) or to the KL estimation, and perhaps #423 could help.

  • Changing mini_batch_size = batch_size to see if that helps
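
Concretely, that just means a PPO config where the whole batch is optimized in one minibatch; the values below are illustrative, not the StackLLaMA settings:

```python
from trl import PPOConfig

# Illustrative values only: the point is mini_batch_size == batch_size,
# so each PPO optimization epoch takes one full-batch gradient step
# instead of several noisier small steps.
config = PPOConfig(
    batch_size=32,
    mini_batch_size=32,  # previously e.g. 4
    learning_rate=1.4e-5,
)
```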

Alternatively, the issue could be the difference in quantization between reward modelling and RLHF training: reward modelling uses a bf16 base LLaMA model, whereas we do RLHF training with an int8 LLaMA model.

  • Retraining a reward model with int8 and running RLHF with it to see if that's the issue.
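
A minimal sketch of what that retraining could look like, assuming the same LoRA setup as the StackLLaMA reward script but with the base loaded in int8 to match the RLHF side (the checkpoint name and hyperparameters are placeholders):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSequenceClassification

# Load the base reward model in int8 instead of bf16, so the reward adapter is
# trained against the same quantized weights it will sit on top of during RLHF
base_model = AutoModelForSequenceClassification.from_pretrained(
    "huggyllama/llama-7b",  # placeholder base checkpoint
    num_labels=1,
    load_in_8bit=True,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)  # older PEFT: prepare_model_for_int8_training

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="SEQ_CLS"
)
reward_model = get_peft_model(base_model, lora_config)
reward_model.print_trainable_parameters()

# ...then train as before with the pairwise ranking loss on chosen/rejected pairs
```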

Also, note that we do see reduced memory usage with Multi-Adapter, but sadly the memory spikes seem just as high.

[image: memory usage of the multi-adapter run]

@mnoukhov
Contributor Author

Hey @dh2shin, if this isn't specifically related to multi-adapter, can you move the discussion back to #401?

@dh2shin

dh2shin commented Jul 6, 2023

@mnoukhov Hi Michael, how have the int8 experiments gone for the reward model? Are you getting decent accuracy?

@mnoukhov
Contributor Author

I was running into issues with DDP and PEFT (#480), but even after getting around them, I haven't managed to make things work. The main thing I found is that you have to train a new reward model adapter on top of the int8 base.

Even after all this, I'm not managing to repro the original results. There could be a regression in the codebase, or it could just be too much instability. I'm still investigating, but moving to a smaller-scale task in the meantime.

@Nipi64310

Nipi64310 commented Jul 21, 2023

Hi @mnoukhov,
PEFT's set_adapter acts like a global mode setting. For example, process A has computed up to the 10th layer using the reward LoRA weights; at that point, process B changes the model's active_adapter to ppo, which causes the ppo adapter to be used from the 11th layer to the last layer. I'm not sure whether this happens when you use accelerate launch (DeepSpeed).
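
If I understand the concern correctly, PeftModel.set_adapter flips the active adapter on every LoRA layer of the shared model at once, so a switch that lands in the middle of another forward pass changes which weights the remaining layers use. A toy illustration (the tiny model and adapter names are just for demonstration, this is not the TRL code path):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")  # tiny model, for illustration only
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"), adapter_name="ppo")
model.add_adapter("reward", LoraConfig(task_type="CAUSAL_LM"))

model.set_adapter("reward")
# set_adapter is a global switch: every LoRA layer now routes through the
# "reward" weights. If something else called model.set_adapter("ppo") while a
# forward pass with the reward adapter was still in flight, the layers executed
# after the switch would silently use the "ppo" weights instead.
model.set_adapter("ppo")
```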

@mnoukhov
Contributor Author

That is a possible problem and could be solved with a self.accelerator.wait_for_everyone(), but I actually wasn't able to re-reproduce StackLLaMA with the latest codebase (without multiple adapters), so that's the real issue. I'm going to retrain an RM and try again.
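
For illustration, wait_for_everyone() is just a cross-process barrier, so the idea would be to place it around the adapter switch; a minimal sketch, not the actual PPOTrainer code:

```python
from accelerate import Accelerator

accelerator = Accelerator()

# ...each process finishes its forward pass with the policy adapter...

# Block until every process reaches this point, so no process switches the
# active adapter while another is still mid-forward with the old one.
accelerator.wait_for_everyone()

# ...now switch to the reward adapter, score, switch back, and sync again...
```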

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@mnoukhov
Contributor Author

mnoukhov commented Sep 1, 2023

Still relevant, posting so it isn't marked stale.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions bot closed this as completed Nov 1, 2023