
About the Implementation of GRPO #2681

Open
macheng6 opened this issue Jan 29, 2025 · 1 comment
Labels: 🏋 GRPO (Related to GRPO), ❓ question (Seeking clarification or more information)

Comments

@macheng6

Is the result of `x - x.detach()` all zero?

@qgallouedec (Member)

So it's more of a torch question, but I guess you are referring to

```python
# x - x.detach() allows for preserving gradients from x
per_token_loss = torch.exp(per_token_logps - per_token_logps.detach()) * advantages.unsqueeze(1)
```

Yes, the result is zero, but you can't discard it: you need the gradient. See #2565 (comment).
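
For illustration, here is a minimal standalone sketch (not the TRL code; the tensor names and values are made up) showing the point: `exp(logp - logp.detach())` evaluates to 1 in the forward pass, so the loss is numerically just the advantages, but the expression still carries a gradient with respect to `logp`:

```python
import torch

# Hypothetical per-token log-probs and advantages, just for demonstration
logp = torch.tensor([-1.0, -2.0], requires_grad=True)
advantages = torch.tensor([0.5, 1.5])

# Forward pass: exp(logp - logp.detach()) == exp(0) == 1 everywhere,
# so the loss value is simply -sum(advantages)
ratio = torch.exp(logp - logp.detach())
loss = -(ratio * advantages).sum()
loss.backward()

# Backward pass: d/dlogp exp(logp - logp.detach()) = exp(0) * 1 = 1,
# so each entry of logp receives -advantage as its gradient
print(ratio)      # tensor([1., 1.], grad_fn=<ExpBackward0>)
print(logp.grad)  # tensor([-0.5000, -1.5000])
```

In other words, `x - x.detach()` is zero as a value but acts as the identity in the backward pass, which is exactly why the term can't be dropped from the loss.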
