In this experiment I wanted to see whether, for Diffusion Policy trained on the PushT environment, using a (cosine) learning rate scheduler has any benefit over using a flat learning rate throughout training.

TLDR: At least in this setting, the LR scheduler appears to help!

Here's what the LR looks like for each of my experiments (200k training steps):
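The cosine run ramps the LR up over a short warmup and then decays it smoothly towards zero across the 200k steps, while the flat runs simply hold training.lr constant. Here's a minimal sketch of that shape (the 500-step warmup is an assumption for illustration, not a value taken from the experiment config):

```python
import math

TOTAL_STEPS = 200_000
WARMUP_STEPS = 500   # assumed warmup length, for illustration only
PEAK_LR = 1e-4

def cosine_lr(step: int) -> float:
    """Warmup-then-cosine-decay schedule: 0 -> PEAK_LR -> ~0 over training."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # linear warmup
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))  # decay to ~0

print([f"{cosine_lr(s):.1e}" for s in (0, 500, 100_000, 200_000)])
# -> ['0.0e+00', '1.0e-04', '5.0e-05', '0.0e+00']
```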

There are two evaluation metrics on a per-episode basis:

- Maximum overlap with target (seen as eval/avg_max_reward in the charts below). This ranges in [0, 1].
- Success: whether or not the maximum overlap is at least 95% (see the sketch just below).
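To make the relation between the two concrete, here's a tiny sketch of how they'd be computed for a single episode (the step_rewards values are made up; it just assumes the per-step reward is the overlap ratio in [0, 1]):

```python
import numpy as np

# Hypothetical per-step overlap rewards for one episode, each in [0, 1].
step_rewards = np.array([0.10, 0.42, 0.88, 0.97, 0.93])

max_overlap = float(step_rewards.max())  # averaged over episodes -> eval/avg_max_reward
success = max_overlap >= 0.95            # per-episode success flag

print(max_overlap, success)  # 0.97 True
```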

Here's what the overlap metric looks like (50 evaluations per point with random initializations, plot lines are smoothed):

Here's what the success rate looks like (50 evaluations per point with random initializations, plot lines are smoothed):

The training run with the cosine scheduler clearly outperforms the others. We also get a hint that a flat LR of 1e-4 might be a bit high.

Here are the metrics for 500 episodes' worth of evaluation. For the success rate I added an extra row with confidence bounds (see beta_distribution.md for details on how I compute these).

| Metric | Cosine (1e-4) | Flat (1e-4) | Flat (5e-5) | Flat (2.5e-5) |
| --- | --- | --- | --- | --- |
| Average max. overlap ratio | 0.952 | 0.923 | 0.918 | 0.935 |
| Success rate for 500 episodes (%) | 56.0 | 39.2 | 33.8 | 36.4 |
| Beta distribution lower/mean/upper (%) | 53.8 / 56.0 / 58.2 | 37.1 / 39.2 / 41.4 | 31.8 / 33.9 / 36.0 | 34.3 / 36.5 / 38.6 |
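Roughly, the lower/mean/upper row corresponds to the mean ± one standard deviation of a Beta posterior over the success probability with a uniform Beta(1, 1) prior (beta_distribution.md has the full details). A quick sketch of that reading, which reproduces the cosine column:

```python
from scipy.stats import beta

# Beta posterior over the success probability, starting from a uniform
# Beta(1, 1) prior and updating on the 500 evaluation episodes.
n_episodes = 500
n_successes = 280  # the cosine run: 56.0% of 500

posterior = beta(1 + n_successes, 1 + n_episodes - n_successes)
mean, std = posterior.mean(), posterior.std()

# Lower / mean / upper as mean ± one posterior standard deviation (in %).
print(f"{100 * (mean - std):.1f} / {100 * mean:.1f} / {100 * (mean + std):.1f}")
# -> 53.8 / 56.0 / 58.2
```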

Appendix: Reproducibility

The experiments were conducted on this branch of the LeRobot library.

The results were logged to WandB. You may view them here.

The training commands were all variations of:

python lerobot/scripts/train.py \
    hydra.job.name=$JOB_NAME \
    hydra.run.dir=/fsx/$USER/outputs/train/${now:%Y-%m-%d}/${now:%H-%M-%S}_${JOB_NAME} \
    env=pusht \
    policy=diffusion \
    training.save_checkpoint=true \
    training.offline_steps=200000 \
    training.save_freq=20000 \
    training.eval_freq=10000 \
    training.log_freq=50 \
    training.lr_scheduler=null \
    eval.n_episodes=50 \
    eval.batch_size=50 \
    wandb.enable=true \
    wandb.disable_artifact=true \
    device=cuda \
    use_amp=true

where I've used:

- training.lr_scheduler=cosine training.lr=1e-4
- training.lr_scheduler=null training.lr=1e-4
- training.lr_scheduler=null training.lr=5e-5
- training.lr_scheduler=null training.lr=2.5e-5

For the 500 evaluations used to produce the table, I sped things up a bit by using DDIM with 10 inference steps:

python lerobot/scripts/eval.py \
    -p checkpoint_dir/last/pretrained_model \
    eval.n_episodes=500 \
    eval.batch_size=50 \
    eval.use_async_envs=true \
    policy.noise_scheduler_type=DDIM \
    policy.num_inference_steps=10