Full-finetune DPO single device recipe #2082

Open
Tracked by #2081
SalmanMohammadi opened this issue Nov 27, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@SalmanMohammadi
Collaborator

SalmanMohammadi commented Nov 27, 2024

This should be straightforward. The main issue I see coming up is with compile, similar to what happens when we attempt to compile the reference and policy models in our single device PPO recipe. Since the SelfAttentionLayer block is inlined and shared across the models, we're going to hit recompiles due to param.requires_grad differing between the trainable policy and the frozen reference. This might be acceptable in this case, since the recompiles won't be as severe as with PPO in its current state (#2066).
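To make the requires_grad point concrete, here is a minimal sketch, assuming a plain PyTorch setup. `TinyLayer` and `grad_flags` are hypothetical stand-ins for illustration and are not torchtune classes or helpers. The sketch only shows that the policy and a frozen copy run the same layer code with different param.requires_grad values. It does not call torch.compile, so it does not measure recompiles itself.

```python
import copy

import torch
from torch import nn


class TinyLayer(nn.Module):
    """Hypothetical stand-in for a shared transformer layer (not a torchtune class)."""

    def __init__(self, dim: int = 8) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))


def grad_flags(model: nn.Module) -> dict:
    """Map each parameter name to its requires_grad flag."""
    return {name: p.requires_grad for name, p in model.named_parameters()}


torch.manual_seed(0)

# The policy is trainable; the reference is a frozen copy with the same weights.
policy = TinyLayer()
reference = copy.deepcopy(policy)
reference.requires_grad_(False)
reference.eval()

policy_flags = grad_flags(policy)
reference_flags = grad_flags(reference)

x = torch.randn(2, 8)
policy_out = policy(x)
with torch.no_grad():  # no autograd graph is needed for the reference
    reference_out = reference(x)

print("policy:", policy_flags)
print("reference:", reference_flags)
```

Both models execute the same `forward` code, but every parameter flag differs between them, which is the property the comment above identifies as the source of recompiles.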

We might want to offer some kind of customization around the choice of reference model. The only constraint I can think of here is that the reference and policy models must share a tokenizer; otherwise users should be able to experiment freely.
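One way the shared-tokenizer constraint could be checked up front is sketched below. It assumes each tokenizer can expose its vocabulary as a token-to-id mapping; `vocab_mismatches` and `check_shared_tokenizer` are hypothetical names, not existing torchtune functions. Comparing vocabularies is only a proxy: it would not catch differences in merges, normalization, or special-token handling.

```python
from typing import List, Mapping


def vocab_mismatches(
    policy_vocab: Mapping[str, int], reference_vocab: Mapping[str, int]
) -> List[str]:
    """Return tokens whose ids differ or that exist in only one vocabulary."""
    tokens = set(policy_vocab) | set(reference_vocab)
    return sorted(
        t for t in tokens if policy_vocab.get(t) != reference_vocab.get(t)
    )


def check_shared_tokenizer(
    policy_vocab: Mapping[str, int], reference_vocab: Mapping[str, int]
) -> None:
    """Raise if the two vocabularies disagree (hypothetical recipe-setup check)."""
    mismatches = vocab_mismatches(policy_vocab, reference_vocab)
    if mismatches:
        preview = ", ".join(mismatches[:5])
        raise ValueError(
            f"Reference and policy tokenizers differ on {len(mismatches)} "
            f"token(s), e.g. {preview}"
        )


# Toy vocabularies, for illustration only.
same_a = {"<s>": 0, "hello": 1, "world": 2}
same_b = {"<s>": 0, "hello": 1, "world": 2}
different = {"<s>": 0, "hello": 2, "there": 3}

check_shared_tokenizer(same_a, same_b)  # passes silently
print(vocab_mismatches(same_a, different))
```

Failing fast at setup time would surface a mismatch before any log-probabilities are compared across the two models.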

@SalmanMohammadi SalmanMohammadi self-assigned this Nov 27, 2024
@SalmanMohammadi SalmanMohammadi added the enhancement New feature or request label Nov 27, 2024
@SalmanMohammadi SalmanMohammadi removed their assignment Dec 13, 2024
@sam-pi sam-pi mentioned this issue Jan 17, 2025
9 tasks