Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MuJoCo Robotics Envs HER+TQC trained agents #71

Merged
merged 5 commits into from
Mar 15, 2021
Merged

Add MuJoCo Robotics Envs HER+TQC trained agents #71

merged 5 commits into from
Mar 15, 2021

Conversation

sgillen
Copy link
Contributor

@sgillen sgillen commented Mar 12, 2021

I did get around to this eventually :P.

Adding trained agents for her + sac on the mujoco robotics environments.
I left in the best_model too, this only matters for FetchSlide, where the best agents gets around 50% success, compared to 20% for the latest. The other three environments all get to 100%. I think this roughly matches the results from the original HER paper with DDPG.

Description

Updated hyperparams to the her.yml file, and added trained agents to the rl-trained-agents submodule.

Checklist:

  • [ x] I've read the CONTRIBUTION guide (required)
  • [ N/A ??? ] I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line


This change is Reviewable

@sgillen
Copy link
Contributor Author

sgillen commented Mar 12, 2021

Looks like your checks are failing because I committed to the rl-trained-agents submodule, let me know if there is another way I should structure this commit.

@araffin
Copy link
Member

araffin commented Mar 12, 2021

Hello,
thanks for the PR =) and sorry for the lack of clarity, next time would be better to discuss everything before in an issue (so we don't do the job twice ;)).

To sum up what should be done:

  • please use master version of the rl zoo (I recently merged most of the models there and updated the hyperparameters for HER)
  • please do not change the hyperparameters and use TQC (from SB3 contrib) for training. The only hyperparameter you may change is the n_timesteps for FetchSlide (probably 2e6 is more appropriate)
  • you need to do a first PR here: https://github.com/DLR-RM/rl-trained-agents (where the agents are stored) and it should contain the trained agent, the monitor file and the evaluation file, all created automatically by the zoo. You can delete the best model to save some spaces too.
  • the changelog is here:https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/CHANGELOG.md

last thing, once your PR in the rl-trained-agents repo is merged, please run python -m utils.benchmark (cf readme) to update the benchmark file.

@sgillen
Copy link
Contributor Author

sgillen commented Mar 12, 2021

OK, that mostly makes sense.

please use master version of the rl zoo (I recently merged most of the models there and updated the hyperparameters for HER)

Maybe I'm confused, but this is a PR for the master branch? I cloned the latest (after you merged the models) and reran her with the params I had (mostly found from the original rl-baselines-zoo). I chose those because when I tried whichever params where in her last week all tasks except FetchReach failed (as in, ran fine but never got good reward on the tasks). But I have some time today so I'll run the current params with the current sb3 and make a new PR with the steps you detailed above if it works.

Just to check, it looks like on the current master of rl-baselines3-zoo the her hyperparams use tqc for every Fetch* env except FetchReach which uses sac. Is this intentional?

@araffin
Copy link
Member

araffin commented Mar 12, 2021

Maybe I'm confused, but this is a PR for the master branch?

sorry, I read it too quickly. That's fine then ;)

I chose those because when I tried whichever params where in her last week all tasks except FetchReach failed

The one that are there (TQC + 3 layers + some additional custom params) I remember testing them in a google colab and it worked back then (in around 4e5 timesteps for Pick and Place) (but I don't have a proper license anymore...).

Just to check, it looks like on the current master of rl-baselines3-zoo the her hyperparams use tqc for every Fetch* env except FetchReach which uses sac. Is this intentional?

Yes, TQC is SAC + Distributional RL. FetchReach is super simple to solve (in 5 minutes normally), so the algorithm choice does not matter much here.

@araffin
Copy link
Member

araffin commented Mar 13, 2021

Last thing I forgot to mentioned (but I think you are already doing it): you should use master version of SB3 (1.0rc2)

EDIT: it should normally change nothing, it mostly for consistency

@araffin
Copy link
Member

araffin commented Mar 13, 2021

More importantly: what version of python/gym/Mujoco are you using? (we should also document that somewhere)
Please use python 3.6 if possible (as we can use custom objects to load the trained model in python 3.8+ but the other way around there is a pickle incompatibility...)

@sgillen
Copy link
Contributor Author

sgillen commented Mar 13, 2021

Ok, look like everything worked well this time, but I want to re-run FetchSlide with more time.

python==3.6.10
gym==0.18.0
mujoco_py==1.50.1.0

Which corresponds to mujoco 1.5 (not 2.0) due to openai/gym#1541. Not sure that's relevant here but I've just been using mujoco 1.5 ever since, never needed any of the new features. gym was downgraded as well to accommodate this.

And stable-baselines3 is at the latest commit from the github on master (which corresponds to 1.0rc2)

Like I said I am going to re run at least FetchSlide, let me know soon if you want to changes any of the above. I have a mujoco 2.0 install / key that works just fine too, and changing the rest is easy. When that finishes (probably another whole day...) I will open the pull request in rl-trained-agents/. Do you want the PR here to be a new one? or should I just overwrite the commits here and keep this PR?

@araffin
Copy link
Member

araffin commented Mar 13, 2021

Ok, look like everything worked well this time, but I want to re-run FetchSlide with more time.

Good to hear =)
Yes, FetchSlide is the hardest task. Don't hesitate to double or even more the budget (1e6 is pretty small compared to what was used in the paper).

python==3.6.10
gym==0.18.0

perfect

Which corresponds to mujoco 1.5 (not 2.0) due to openai/gym#1541. Not sure that's relevant here but I've just been using mujoco 1.5 ever since, never needed any of the new features. gym was downgraded as well to accommodate this.

yes, I'm aware of that issue (but I think it did not see much change when I was using mujoco 2 with robotics envs).
So, mujoco version does not matter much, what matter is to document which one we used ;)

Do you want the PR here to be a new one? or should I just overwrite the commits here and keep this PR?

Let's keep that one

@araffin
Copy link
Member

araffin commented Mar 13, 2021

I just realized you will need to temporary comment out this line for updating the benchmark file: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/benchmark.py#L47

(this is because we don't have a mujoco license on the CI server).

@araffin araffin changed the title Add mujoco robotics her+sac agents and hyper params Add MuJoCo Robotics Envs HER+TQC trained agents Mar 13, 2021
@@ -91,7 +95,6 @@ and also allow users to have access to pretrained agents.*
|qrdqn|BeamRiderNoFrameskip-v4 | 17122.941| 10769.997|10M | 596483| 17|
|qrdqn|BreakoutNoFrameskip-v4 | 393.600| 79.828|10M | 579711| 40|
|qrdqn|CartPole-v1 | 500.000| 0.000|50k | 150000| 300|
|qrdqn|EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like I forgot to push the benchmark log of that one... (I will do that soon)
Please do the same for the robotics envs (files are in logs/benchmark/, you may need git add -f for that)

@araffin
Copy link
Member

araffin commented Mar 15, 2021

Apart from the missing entry in changelog (and the missing benchmark log) LGTM =)
(I will push the QR-DQN logs soon and then also update the README)

@sgillen
Copy link
Contributor Author

sgillen commented Mar 15, 2021

Ok, probably didn't need to be three commits on my part but there you go...

I took a look at the changelog and wasn't entirely sure where / what to add, figure it's faster for you to add something in then for me to ask what.

@araffin
Copy link
Member

araffin commented Mar 15, 2021

Ok, probably didn't need to be three commits on my part but there you go...

No worry, commits will be squashed at the end.

I took a look at the changelog and wasn't entirely sure where / what to add, figure it's faster for you to add something in then for me to ask what.

I'll do that.

I also plotted the training success rate (which should be higher at test time) using:

python scripts/plot_train.py -a her -e Fetch -y success -f rl-trained-agents/ -w 500

Training_Success_Rate

FetchReach

Copy link
Member

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks =)

@araffin araffin merged commit bca831b into DLR-RM:master Mar 15, 2021
@caishanglei
Copy link

How are the parameters of the pickplace environment using TQC+HER set? Why do I not work during training? Are the hyperparameters in the initial download code already optimal?

@ArashVahabpour
Copy link

ArashVahabpour commented Aug 20, 2021

can someone guide me pls how I can enjoy the pretrained model?

@Miffyli
Copy link
Collaborator

Miffyli commented Aug 20, 2021

I will let @araffin answer these, but he is currently on vacation, so please give him some time :)

@ArashVahabpour
Copy link

thanks for your prompt response. I have some urgency for using this... if someone can help in the mean time I will be very thankful!

@Miffyli
Copy link
Collaborator

Miffyli commented Aug 20, 2021

Seems like these instructions should work out of the box: https://github.com/DLR-RM/rl-baselines3-zoo#enjoy-a-trained-agent

@ArashVahabpour
Copy link

ArashVahabpour commented Aug 20, 2021

the issue is that HER is not an algorithm from sb2 onwards
but @araffin has stored some trained models for HER solving Fetch problems.

@Miffyli
Copy link
Collaborator

Miffyli commented Aug 20, 2021

the issue is that HER is not an algorithm from sb2 onwards

HER is included in SB3 and TQC is in contrib

@ArashVahabpour
Copy link

@Miffyli thanks so much. I think HER has some problem and @araffin should clarify in documentations.
But I got TQC working. For reference of others:
python enjoy.py --algo tqc --env FetchPickAndPlace-v1

@araffin
Copy link
Member

araffin commented Aug 23, 2021

I think HER has some problem and @araffin should clarify in documentations.

Could you elaborate?

the issue is that HER is not an algorithm from sb2 onwards

yes, it is documented both in SB3 changelog and in the HER hyperparameter file: "# NOTE: STARTING WITH SB3 >= 1.1.0, because HER is now HerReplayBuffer,"

@ArashVahabpour
Copy link

ArashVahabpour commented Aug 24, 2021

I think HER has some problem and @araffin should clarify in documentations.

Could you elaborate?

the issue is that HER is not an algorithm from sb2 onwards

yes, it is documented both in SB3 changelog and in the HER hyperparameter file: "# NOTE: STARTING WITH SB3 >= 1.1.0, because HER is now HerReplayBuffer,"

Sure, I meant I couldn't find a set of input arguments by which I could load and enjoy the trained weights stored in "her" directory. Could you clarify for everyone's reference. Thanks!

@araffin
Copy link
Member

araffin commented Aug 24, 2021

yes, I probably need to update the README.
But as a general rule, all the available agents are listed in https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/benchmark.md

I'm also thinking about removing the her folder...

@matinmoezzi
Copy link

Hi,
Could you please share the training command? I am wondering if there is a way to increase the training speed and utilize all CPU cores.
There is a command in the OpenAI baselines repo using MPI:
mpirun -np 19 python -m baselines.run --num_env=2 --alg=her
This command uses 19 physical CPU cores.

@araffin
Copy link
Member

araffin commented Mar 22, 2023

Could you please share the training command?

https://stable-baselines3.readthedocs.io/en/master/modules/her.html#how-to-replicate-the-results

I am wondering if there is a way to increase the training speed and utilize all CPU cores.

See DLR-RM/stable-baselines3#704 (comment)
We don't offer mpi acceleration, but if you use SBX (see comment) with latest SB3 master version, you should be able to have up to a 3x speed boost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants