Simplify configs #550

Merged
@aliberts merged 101 commits into main from user/aliberts/2024_11_30_remove_hydra on Jan 31, 2025
Conversation

@aliberts (Collaborator) commented on Dec 5, 2024

What this does

This PR removes Hydra in favor of Draccus.

This brings significant changes to the codebase regarding how configurations are built, saved, loaded and used. Most previously used commands won't work anymore, but hopefully only minimal changes will be needed to make most of them work again.

Overview

Configurations are now defined in code through dataclasses rather than in yaml files. The two main configuration classes are TrainPipelineConfig and EvalPipelineConfig. As with the yaml files previously, their code is heavily commented and is meant to be read in order to understand the options and see the default values.

Reading the updated examples/4_train_policy_with_script.md is a great way to get an overview of how this new config system works.
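
To give an idea of the shape of these classes, here is a heavily simplified sketch; the fields, defaults, and validation shown are illustrative, not the actual lerobot definitions.

from dataclasses import dataclass, field
from typing import Optional

import draccus


@dataclass
class DatasetConfig:
    repo_id: str = "lerobot/pusht"
    episodes: Optional[list[int]] = None


@dataclass
class TrainPipelineConfig:
    dataset: DatasetConfig = field(default_factory=DatasetConfig)
    output_dir: str = "outputs/train/default"
    batch_size: int = 8
    seed: Optional[int] = None

    def __post_init__(self):
        # Validation that previously lived in the scripts can now sit here.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")


@draccus.wrap()
def train(cfg: TrainPipelineConfig):
    # Every field becomes a --dotted.option on the command line, e.g.:
    #   python train_sketch.py --dataset.repo_id=lerobot/pusht --batch_size=64
    print(cfg)


if __name__ == "__main__":
    train()

Because the options are plain dataclass fields, IDEs can autocomplete them and type-check their values.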

We've updated the following scripts, for which previously used commands will no longer work:

  • lerobot/scripts/train.py
  • lerobot/scripts/eval.py
  • lerobot/scripts/control_robot.py
  • lerobot/scripts/visualize_image_transforms.py

Here are a few examples of commands before/after the changes.

Training Diffusion Policy on PushT - before
python lerobot/scripts/train.py \
    hydra.run.dir=outputs/train/diffusion_pusht \
    policy=diffusion \
    dataset_repo_id=lerobot/pusht \
    env=pusht \
    training.offline_steps=200000 \
    training.save_freq=20000 \
    training.eval_freq=20000 \
    eval.n_episodes=50 \
    wandb.enable=true \
    device=cuda
Training Diffusion Policy on PushT - after
python lerobot/scripts/train.py \
  --output_dir=outputs/train/diffusion_pusht \
  --policy.type=diffusion \
  --dataset.repo_id=lerobot/pusht \
  --env.type=pusht \
  --seed=100000 \
  --batch_size=64 \
  --offline.steps=200000 \
  --eval_freq=20000 \
  --save_freq=20000 \
  --wandb.enable=true \
  --device=cuda

A few things to note:

  • Some options were not present before and must be explicitly passed now. For example, --batch_size=64. This is because with the previous system, the batch_size value was included in the diffusion.yaml, which was implicitly selected with policy=diffusion. Now, the default batch_size is 8 and is independent of policy selection. Same idea for --seed here.
  • To select a policy or an environment, we now use the special argument .type (a minimal sketch of how this works follows this list). Read more about this here.
  • All options now include a -- prefix.
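
For the curious, here is a minimal sketch of how this kind of .type selection typically works with Draccus' ChoiceRegistry; the class names and fields below are illustrative rather than the actual lerobot definitions.

from dataclasses import dataclass, field

import draccus


@dataclass
class PolicyConfig(draccus.ChoiceRegistry):
    """Base config: concrete policies register themselves under a name."""


@PolicyConfig.register_subclass("diffusion")
@dataclass
class DiffusionConfig(PolicyConfig):
    horizon: int = 16
    num_inference_steps: int = 100


@dataclass
class TrainConfig:
    policy: PolicyConfig = field(default_factory=DiffusionConfig)
    batch_size: int = 8


if __name__ == "__main__":
    # e.g. python sketch.py --policy.type=diffusion --policy.horizon=32 --batch_size=64
    cfg = draccus.parse(TrainConfig)
    print(cfg)
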
Evaluating ACT on Aloha Transfer Cube - before
python lerobot/scripts/eval.py \
  -p lerobot/act_aloha_sim_transfer_cube_human \
  eval.n_episodes=10 \
  eval.batch_size=10
Evaluating ACT on Aloha Transfer Cube - after
python lerobot/scripts/eval.py \
  --policy.path=lerobot/act_aloha_sim_transfer_cube_human \
  --env.type=aloha \
  --env.task=TransferCube-v0 \
  --eval.n_episodes=10 \
  --eval.batch_size=10
Running inference of a pretrained model on a SO-100 robot - before
python lerobot/scripts/control_robot.py record \
  --robot-path lerobot/configs/robot/so100.yaml \
  --fps 30 \
  --repo-id ${HF_USER}/eval_act_so100_lego \
  --single-task "Grasp a lego block and put it in the bin." \
  --tags tutorial \
  --warmup-time-s 1 \
  --episode-time-s 30 \
  --reset-time-s 30 \
  --num-episodes 10 \
  --push-to-hub 1 \
  -p outputs/train/act_so100_lego/checkpoints/last/pretrained_model
Running inference of a pretrained model on a SO-100 robot - after
python lerobot/scripts/control_robot.py \
  --robot.type=so100 \
  --control.type=record \
  --control.fps=30 \
  --control.repo_id=${HF_USER}/eval_act_so100_lego \
  --control.single_task="Grasp a lego block and put it in the bin." \
  --control.tags='["tutorial"]' \
  --control.warmup_time_s=1 \
  --control.episode_time_s=30 \
  --control.reset_time_s=30 \
  --control.num_episodes=10 \
  --control.push_to_hub=true \
  --control.policy.path=outputs/train/act_so100_lego/checkpoints/last/pretrained_model

Note that we didn't update the argument parsing for the following scripts, as we didn't feel it was necessary. They are mostly unaffected by these changes, and the commands that worked before should still work with them:

  • lerobot/scripts/visualize_dataset.py
  • lerobot/scripts/visualize_dataset_html.py
  • lerobot/common/robot_devices/cameras/intelrealsense.py
  • lerobot/common/robot_devices/cameras/opencv.py
  • lerobot/scripts/configure_motor.py

Motivation

Our previous system for configurations had several limitations:

  • There was no dynamic link between the features of a dataset or an environment and a policy. This meant that whenever you needed to train on a different set of features from those hardcoded in the config, you needed to hack the config files in order to do so, which was confusing, cumbersome and error prone. In fact, we had to write a whole tutorial on how to do that.
  • Having configurations entirely defined in yaml files means that their deserialization can sometimes lead to unpredictable behavior, or make configuration errors harder to spot. Moreover, the namespaces/dictionaries returned have very little IDE support (autocomplete, jump-to-definition, etc.).
  • While Hydra composition can be a powerful feature, it does come with a lot of complexity and the learning curve can be steep.
  • The overall goal is to simplify the workflow across the different use cases (training, evaluation, recording, etc.) and make the scripts easier to use.

Changes

  • Adds config dataclasses for the scripts.
  • Moves a lot of the config validation logic that was previously in the scripts into the __post_init__ of these classes.
  • Adds a custom @parser.wrap() decorator, similar to @draccus.wrap(), that preprocesses command-line arguments to enable .path arguments (for policies only for now, e.g. --policy.path).
  • Replaces all Hydra function calls with their Draccus / custom wrapper / direct config instantiation counterparts:
# wrapper
- @hydra.main(version_base="1.2", config_name="default", config_path="../configs")
+ @parser.wrap()

# parser
- cfg = init_hydra_config(hydra_cfg_path, config_overrides)
+ cfg = draccus.parse(TrainPipelineConfig, config_path=config_path, args=cli_args)

# direct class instantiation
- cfg = init_hydra_config()
+ cfg = TrainPipelineConfig()
  • Adds HubMixin: A custom implementation of huggingface_hub.ModelHubMixin to better fit our needs (mostly, being able to serialize/deserialize using Draccus).
  • Adds PreTrainedConfig, which policy config classes inherit from. Inspired by transformers.PretrainedConfig, this class now manages a few things common to policy configs and harmonizes their interface.
  • Similarly, adds PreTrainedPolicy, which policy classes inherit from.
  • Removes the Policy Protocol in favor of directly using PreTrainedPolicy.
  • Links the input/output shapes of policies to datasets and environments:
    • parse_features_from_dataset
    • parse_features_from_env
  • Harmonizes optimizer and scheduler configs with OptimizerConfig and LRSchedulerConfig, which create (through their build method) standard optimizers and schedulers from torch.optim or diffusers.optimization whenever possible, and custom implementations when needed (see the sketch after this list).
  • Adds a use_policy_training_preset option (true by default) in the training config to allow selecting an optimizer/scheduler preset that comes with each policy config. Each PreTrainedPolicy also implements get_optim_params(), which returns a dict of parameters specific to that policy to be used by the optimizer (only used when use_policy_training_preset is true). This addresses the issues discussed in Move function make_optimizer_and_scheduler to policy #401
  • The last symlink in checkpoints now points to the last checkpoint with a relative path (it was absolute before), which makes it easier to move things around.
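
As a rough illustration of the OptimizerConfig / build pattern mentioned above; the names, fields, and defaults below are illustrative, and the actual lerobot classes may differ.

from dataclasses import dataclass

import torch


@dataclass
class OptimizerConfig:
    lr: float = 1e-4
    weight_decay: float = 0.0

    def build(self, params) -> torch.optim.Optimizer:
        raise NotImplementedError


@dataclass
class AdamWConfig(OptimizerConfig):
    betas: tuple[float, float] = (0.9, 0.999)

    def build(self, params) -> torch.optim.Optimizer:
        # Builds a standard optimizer from torch.optim using the config values.
        return torch.optim.AdamW(
            params, lr=self.lr, betas=self.betas, weight_decay=self.weight_decay
        )


# With use_policy_training_preset=true, the parameters typically come from the
# policy's own get_optim_params(), e.g.:
#   optimizer = AdamWConfig(lr=1e-5).build(policy.get_optim_params())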

TODO in future PR:

  • control_sim_robot.py -> we won't do it in this PR
  • Handle MultiLeRobotDataset

How it was tested

This PR allows re-enabling a number of tests, including integration tests. Some datasets are still pulled from the hub by the tests, but far fewer, since we can now select a single episode thanks to datasets v2.

We will refactor the tests in a future PR to make them easier to write/maintain/scale. Notably, we can now remove most of tests/data (we'll still keep backward-compatibility test artifacts, but that's okay since they're lightweight).

How to checkout & try? (for the reviewer)

Try out the new version of examples/4_train_policy_with_script.md


@aliberts aliberts self-assigned this Dec 5, 2024
@aliberts aliberts added the 🔄 Refactor and 🔧 Config labels Dec 5, 2024
"takes precedence.",
)
# Use the checkpoint config instead of the provided config (but keep `resume` parameter).
self = checkpoint_cfg

A reviewer commented on this snippet:

pretty sure this doesn't do what you want?


@aliberts (Collaborator, Author) replied:

Indeed that doesn't work at all, I will handle that part once I'm done with the rest of the config (I'm on the policies right now, which is quite a big chunk).
Thanks for the heads up!

@aliberts aliberts force-pushed the user/aliberts/2024_11_30_remove_hydra branch from 87d92f9 to 06b604b on January 6, 2025 17:18
aliberts and others added 12 commits January 6, 2025 22:09
Co-authored-by: Simon Alibert <[email protected]>
aliberts and others added 13 commits January 28, 2025 09:25
Fix
This reverts commit aa65bb7.
@aliberts aliberts marked this pull request as ready for review January 29, 2025 15:24
Cadene and others added 3 commits January 29, 2025 20:47
…_30_remove_hydra
@tc-huang (Contributor) left a comment:

Hello @aliberts,
I noticed a few potential typos while reading examples/4_train_policy_with_script.md:

  • equiped → equipped (line 2)
  • dictionnaries → dictionaries (line 26)
  • exemple → example (line 45)

I hope this is helpful!

aliberts and others added 2 commits January 31, 2025 09:37
Co-authored-by: HUANG TZU-CHUN <[email protected]>
@Cadene (Collaborator) left a comment:

A thorough shallow review

aliberts and others added 4 commits January 31, 2025 12:05
Co-authored-by: Remi <[email protected]>
…ggingface/lerobot into user/aliberts/2024_11_30_remove_hydra
@aliberts aliberts merged commit 3c0a209 into main Jan 31, 2025
7 checks passed
@aliberts aliberts deleted the user/aliberts/2024_11_30_remove_hydra branch January 31, 2025 12:57
menhguin pushed a commit to menhguin/lerobot that referenced this pull request Feb 9, 2025
Co-authored-by: Remi <[email protected]>
Co-authored-by: HUANG TZU-CHUN <[email protected]>
JIy3AHKO pushed a commit to vertix/lerobot that referenced this pull request Feb 27, 2025
Co-authored-by: Remi <[email protected]>
Co-authored-by: HUANG TZU-CHUN <[email protected]>
Labels: 🔧 Config (Change / add / remove configuration), 🔄 Refactor (Refactoring)
4 participants