[Feature] SAC + reverse forward curriculum learning (#365)
* support using first env state and replaying actions on the GPU for CPU-GPU transfer
* add offline buffer, better logging
* fixes to sac.py
* working baseline for the difficult PickCube task (difficult due to the static requirement)
* partial reset support in the GPU sim + partial reset support for RFCL + SAC
* bug fixes
* update README, add citations and docs
* reorganize code; note the forward curriculum is not done yet
* weighted trajectory sampling
1 parent 993c56f · commit 7033cb2 · 21 changed files with 1,083 additions and 57 deletions.
**docs/source/user_guide/workflows/index.md** (modified)
# Workflows

We provide a number of tuned baselines/workflows for robot learning and for training autonomous policies to solve robotics tasks. These span learning from demonstrations/imitation learning and reinforcement learning.

This is still a WIP, but we plan to upload pretrained checkpoints, training curves, and other results for all solvable tasks (some need much more advanced techniques to solve) online for people to research and compare against.

```{toctree}
:titlesonly:
:glob:
*
learning_from_demos/index
reinforcement_learning/index
```
**docs/source/user_guide/workflows/learning_from_demos/index.md** (24 additions, 0 deletions)
# Learning from Demonstrations / Imitation Learning

We provide a number of different baselines spanning different categories of learning-from-demonstrations research: Behavior Cloning / Supervised Learning, Offline Reinforcement Learning, and Online Learning from Demonstrations.

As part of these baselines we establish a few standard learning-from-demonstrations benchmarks that cover a wide range of difficulty (easy enough to solve for verification, but not saturated) and diversity in the types of demonstrations (human collected, motion-planning collected, neural-net-policy generated).

**Behavior Cloning Baselines**

| Baseline                        | Code | Results |
| ------------------------------- | ---- | ------- |
| Standard Behavior Cloning (BC)  | WIP  | WIP     |
| Diffusion Policy (DP)           | WIP  | WIP     |
| Action Chunk Transformers (ACT) | WIP  | WIP     |

**Online Learning from Demonstrations Baselines**

| Baseline                                             | Code                                                                                 | Results | Paper                                    |
| ---------------------------------------------------- | ------------------------------------------------------------------------------------ | ------- | ---------------------------------------- |
| SAC + Reverse Forward Curriculum Learning (SAC+RFCL)* | [Link](https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac-rfcl) | WIP     | [Link](https://arxiv.org/abs/2405.03379) |
| Reinforcement Learning from Prior Data (RLPD)        | WIP                                                                                  | WIP     | [Link](https://arxiv.org/abs/2302.02948) |
| SAC + Demos (SAC+Demos)                              | WIP                                                                                  | N/A     |                                          |

\* indicates that the baseline uses environment state reset.
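
As a rough illustration of what environment state reset means: before an episode, the simulator is set to a state recorded in a demonstration rather than a freshly randomized initial state. The sketch below is a minimal example of that pattern; the trajectory file layout (`traj_0`, `env_states`, `actions`) and the state-setting call are assumptions that vary between ManiSkill versions, so treat this as a sketch rather than the exact API.

```python
# Illustrative sketch only: dataset keys and the state-setting API are
# assumptions and differ between ManiSkill versions.
import gymnasium as gym
import h5py
import mani_skill.envs  # noqa: F401, registers ManiSkill environments

env = gym.make("PickCube-v1", obs_mode="state", control_mode="pd_joint_delta_pos")
env.reset(seed=0)

with h5py.File("trajectory.state.pd_joint_delta_pos.h5", "r") as f:
    demo = f["traj_0"]                    # assumed trajectory group name
    first_state = demo["env_states"][0]   # assumed dataset layout
    actions = demo["actions"][:]

# Reset the simulator to the demo's first recorded state instead of a random
# initial state, then replay (or learn from) the recorded actions.
env.unwrapped.set_state(first_state)  # set_state / set_state_dict depending on version

for a in actions:
    obs, reward, terminated, truncated, info = env.step(a)
```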
**docs/source/user_guide/workflows/reinforcement_learning/index.md** (15 additions, 0 deletions)
# Reinforcement Learning (WIP)

We provide a number of different baselines that learn from rewards. For RL baselines that leverage demonstrations, see the [learning from demos section](../learning_from_demos/).

As part of these baselines we establish a few reinforcement learning benchmarks that cover a wide range of difficulties (easy enough to solve for verification, but not saturated) and diversity in the types of robotics tasks, including but not limited to classic control, dexterous manipulation, table-top manipulation, mobile manipulation, etc.

**Online Reinforcement Learning Baselines**

| Baseline                                                             | Code                                                                            | Results | Paper                                    |
| -------------------------------------------------------------------- | -------------------------------------------------------------------------------- | ------- | ---------------------------------------- |
| Proximal Policy Optimization (PPO)                                   | [Link](https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/ppo) | WIP     | [Link](http://arxiv.org/abs/1707.06347)  |
| Soft Actor Critic (SAC)                                              | [Link](https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac) | WIP     | [Link](https://arxiv.org/abs/1801.01290) |
| Temporal Difference Learning for Model Predictive Control (TD-MPC2)  | WIP                                                                             | WIP     | [Link](https://arxiv.org/abs/2310.16828) |
**examples/baselines/sac-rfcl/README.md** (52 additions)
# Reverse Forward Curriculum Learning

Fast offline/online imitation learning in simulation, based on "Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning" (ICLR 2024). Code adapted from https://github.com/StoneT2000/rfcl/

Currently this code only works with environments that do not have geometry variations between parallel environments (e.g. PickCube).

The code has been tested and is working on the following environments: PickCube-v1.

This implementation currently does not include the forward curriculum; a sketch of the reverse-curriculum mechanism is given below.
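
As a rough illustration of the reverse curriculum (plain Python with assumed names and thresholds; not the exact RFCL implementation): each episode starts from a state recorded in a demonstration, initially near the end of the demo, and the start point moves earlier in the demo as the agent's success rate from the current start point rises.

```python
import numpy as np

class ReverseCurriculum:
    """Per-demo reverse curriculum (illustrative sketch, not the exact RFCL code)."""

    def __init__(self, demo_lengths, step_back=4, window=10, success_threshold=0.9):
        # Start each demo's curriculum at its final (already-successful) state.
        self.start_idx = [max(length - 1, 0) for length in demo_lengths]
        self.outcomes = [[] for _ in demo_lengths]
        self.step_back = step_back
        self.window = window
        self.success_threshold = success_threshold

    def sample(self, rng: np.random.Generator):
        """Pick a demo and the state index the next episode should reset to."""
        demo_id = int(rng.integers(len(self.start_idx)))
        return demo_id, self.start_idx[demo_id]

    def update(self, demo_id: int, success: bool):
        """Record an episode outcome; once the recent success rate from the
        current start point is high enough, move the start point earlier."""
        self.outcomes[demo_id].append(success)
        recent = self.outcomes[demo_id][-self.window:]
        if len(recent) == self.window and np.mean(recent) >= self.success_threshold:
            self.start_idx[demo_id] = max(self.start_idx[demo_id] - self.step_back, 0)
            self.outcomes[demo_id].clear()
```

In the actual workflow the sampled index selects a saved environment state from the processed trajectory file to reset to; per the commit notes, partial resets in the GPU sim are what make this per-environment reset possible.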
## Download and Process Dataset

Download demonstrations for a desired task, e.g. PickCube-v1:
```bash
python -m mani_skill.utils.download_demo "PickCube-v1"
```
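
By default the demonstrations are saved under `~/.maniskill/demos/PickCube-v1/` (the teleoperated trajectory file used below).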
Process the demonstrations in preparation for the imitation learning workflow:
```bash
python -m mani_skill.trajectory.replay_trajectory \
  --traj-path ~/.maniskill/demos/PickCube-v1/teleop/trajectory.h5 \
  --use-first-env-state -b "gpu" \
  -c pd_joint_delta_pos -o state \
  --save-traj
```
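
A note on the flags: `--use-first-env-state` replays each trajectory starting from its first recorded environment state, which is what allows demos collected on the CPU simulator to be replayed on the GPU simulator (`-b "gpu"`); `-c` and `-o` convert the trajectories to the `pd_joint_delta_pos` controller and state observations used for training, and `--save-traj` writes the converted trajectories back to disk.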
## Train

```bash
python sac_rfcl.py --env_id="PickCube-v1" \
  --num_envs=16 --training_freq=32 --utd=0.5 --buffer_size=1_000_000 \
  --total_timesteps=1_000_000 --eval_freq=25_000 \
  --dataset_path=~/.maniskill/demos/PickCube-v1/teleop/trajectory.state.pd_joint_delta_pos.h5 \
  --num-demos=5 --seed=2 --save_train_video_freq=15
```
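
For reference (inferred from the flag names): `--utd=0.5` sets the update-to-data ratio (gradient updates per environment step), `--num-demos=5` trains from only 5 of the processed demonstrations (RFCL targets extreme demonstration efficiency), and `--save_train_video_freq=15` periodically saves videos of training rollouts.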
## Additional Notes about Implementation

For SAC with RFCL, we always bootstrap on truncated/done.
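
One way to read this: the TD target keeps the bootstrap term even at episode boundaries, since episodes here end by truncation (time limits and curriculum resets) rather than true failure states. A minimal sketch of such a target computation, with hypothetical tensor names:

```python
import torch

def sac_td_target(reward: torch.Tensor, next_q: torch.Tensor,
                  next_logp: torch.Tensor, alpha: float, gamma: float = 0.99):
    """SAC critic target that always bootstraps (hypothetical inputs).

    Standard SAC masks the bootstrap term with (1 - done); here the mask is
    dropped entirely, so episodes that end by truncation/done still bootstrap
    from the next state's soft value.
    """
    next_v = next_q - alpha * next_logp  # soft value of the next state
    return reward + gamma * next_v       # note: no (1 - done) factor
```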
## Citation

If you use this baseline, please cite the following:
```
@inproceedings{tao2024rfcl,
  title={Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning},
  author={Tao, Stone and Shukla, Arth and Chan, Tse-kai and Su, Hao},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
```