Support iterative GRPO #2684
Comments
What about using a callback that the trainer would internally add?
@qgallouedec Do you mean
Not really. Like an arg in the config:

```python
if args.sync_ref_steps is not None:
    sync_ref_callback = SyncRefCallback(args.sync_ref_steps)
    self.add_callback(sync_ref_callback)
```
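For concreteness, a minimal sketch of what such a callback could look like. The name `SyncRefCallback` comes from the snippet above; the constructor signature (taking the reference model explicitly) and the hard-copy behaviour are assumptions for illustration, not an existing trl API:

```python
import torch
from transformers import TrainerCallback


class SyncRefCallback(TrainerCallback):
    """Hypothetical callback: every `sync_ref_steps` optimizer steps, copy the
    current policy weights into the frozen reference model (a hard reset)."""

    def __init__(self, ref_model, sync_ref_steps):
        self.ref_model = ref_model
        self.sync_ref_steps = sync_ref_steps

    def on_step_end(self, args, state, control, model=None, **kwargs):
        # `model` is the policy being trained; the Trainer passes it to callbacks.
        if state.global_step > 0 and state.global_step % self.sync_ref_steps == 0:
            with torch.no_grad():
                self.ref_model.load_state_dict(model.state_dict())
```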
Note that we already have such a callback in trl; I think it makes sense to reuse it.
Nice! I guess you mean `SyncRefModelCallback` (line 96 in 801582e).
With the callbacks currently existing in trl, it would be something like:

```python
if args.sync_ref_model:
    self.add_callback(SyncRefModelCallback(ref_model=self.ref_model, accelerator=self.accelerator))
```
Yep
I'll take care of it!
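From the user side, the approach discussed above would amount to flipping a config flag. A rough usage sketch, assuming the GRPO config exposes the same `sync_ref_model` / `ref_model_sync_steps` / `ref_model_mixup_alpha` arguments that `SyncRefModelCallback` uses elsewhere in trl (exact names and defaults depend on the eventual PR); the dataset, reward function, and model choice are toy placeholders:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset and reward function, just to make the example self-contained.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."] * 64})

def reward_len(completions, **kwargs):
    # Dummy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(
    output_dir="grpo-iterative",
    # Assumed argument names, mirroring SyncRefModelCallback's options:
    sync_ref_model=True,        # enable periodic reference-model updates
    ref_model_sync_steps=64,    # how often to sync, in optimizer steps
    ref_model_mixup_alpha=0.6,  # soft update: ref = alpha * policy + (1 - alpha) * ref
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```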
Feature request
Hi,
The GRPO paper also describes an iterative version of GRPO that allows for periodic updates of the reference model (see Algorithm 1). This has shown good performance for cold-start models (see Figure 6). For DeepSeek-R1-Zero, since it is not SFT'ed, they very likely used this iterative version of GRPO.
Currently, in the GRPO trainer, the reference model cannot be updated: https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L200-L209
It would be nice to support periodically updating the reference model in the trainer, e.g. after each epoch or after a certain number of steps.
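To make the requested behaviour concrete, here is a toy, framework-agnostic sketch of the core operation from Algorithm 1 — periodically resetting the reference policy to the current policy — using small `torch.nn` stand-ins rather than the actual trainer; the interval and the placeholder loss are made up for illustration:

```python
import copy

import torch
import torch.nn as nn

# Stand-ins for the policy and its frozen reference copy; in the real trainer
# these would be the causal LM and the reference model created in the lines linked above.
policy = nn.Linear(4, 4)
ref_model = copy.deepcopy(policy).requires_grad_(False)

optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)
ref_update_steps = 100  # made-up interval ("after each epoch or certain steps")

for step in range(1, 501):
    # In GRPO this is where the policy loss (with a KL term against `ref_model`)
    # would be computed; a placeholder loss keeps the sketch runnable.
    loss = policy(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Iterative GRPO (Algorithm 1): periodically reset the reference policy
    # to the current policy.
    if step % ref_update_steps == 0:
        ref_model.load_state_dict(policy.state_dict())
```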
Motivation
Might be important for reproducing DeepSeek-R1-Zero and DeepSeek-R1
Your contribution
Ideas: the reference model could be updated inside the `compute_loss` function, but that does not seem to be an elegant way to do it.