Required prerequisites
Motivation
After the recent success of DeepSeek-R1, leveraging synthetic data with RL is vital on the path to more capable models. Particularly interesting is the idea of rule-based reward signals. A pipeline to fine-tune an LM in CAMEL with RL would be a great addition.
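To make the idea of a rule-based reward signal concrete, here is a minimal sketch of what such a reward could look like. It is only an illustration: the function name `rule_based_reward`, the `<think>`/`<answer>` tag layout, and the reward weights are assumptions made for this example and are not part of CAMEL's API.

```python
import re

# Hypothetical output format: reasoning in <think>, final answer in <answer>.
# The tag names and the weights below are assumptions for illustration only.
THINK_ANSWER = re.compile(
    r"^<think>.+?</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def rule_based_reward(completion: str, reference: str) -> float:
    """Score a completion with fixed rules instead of a learned reward model."""
    match = THINK_ANSWER.match(completion.strip())
    if match is None:
        # Output does not follow the expected format: no reward at all.
        return 0.0
    format_reward = 0.5  # assumed weight for following the format
    answer = match.group(1).strip()
    # Exact string match against the reference answer (assumed rule).
    accuracy_reward = 1.0 if answer == reference.strip() else 0.0
    return format_reward + accuracy_reward


if __name__ == "__main__":
    print(rule_based_reward("<think>2 + 2</think><answer>4</answer>", "4"))
```

A reward like this could be computed on synthetic prompts with known answers and then passed to whichever RL trainer the pipeline ends up using.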
Solution
No response
Alternatives
No response
Additional context
I would like to work on this issue. If possible, please assign it to me.