Add MuJoCo Robotics Envs HER+TQC trained agents #71
Looks like your checks are failing because I committed to the rl-trained-agents submodule, let me know if there is another way I should structure this commit. |
Hello, To sum up what should be done:
last thing, once your PR in the rl-trained-agents repo is merged, please run |
OK, that mostly makes sense.
Maybe I'm confused, but this is a PR for the master branch? I cloned the latest (after you merged the models) and reran her with the params I had (mostly taken from the original rl-baselines-zoo). I chose those because when I tried whichever params were in her last week, all tasks except FetchReach failed (as in, they ran fine but never got good reward on the tasks). I have some time today, so I'll run the current params with the current sb3 and, if it works, make a new PR following the steps you detailed above. Just to check: on the current master of rl-baselines3-zoo, the her hyperparams use tqc for every Fetch* env except FetchReach, which uses sac. Is this intentional?
sorry, I read it too quickly. That's fine then ;)
As for the ones that are there (TQC + 3 layers + some additional custom params), I remember testing them in a Google Colab and they worked back then (in around 4e5 timesteps for Pick and Place), but I don't have a proper license anymore...
Yes, TQC is SAC + Distributional RL. FetchReach is super simple to solve (in 5 minutes normally), so the algorithm choice does not matter much here. |
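To make the "SAC + Distributional RL" remark concrete, here is a minimal sketch of the truncation step that gives TQC (Truncated Quantile Critics) its name: each critic predicts several quantile atoms of the return, the atoms of all critics are pooled and sorted, and the largest ones are dropped before the target is built. The function name and the toy numbers are illustrative assumptions, not code from sb3-contrib.

```python
from typing import List


def truncated_quantile_pool(critic_quantiles: List[List[float]], drop_per_critic: int) -> List[float]:
    """Pool the quantile atoms of all critics, sort them, and drop the largest ones.

    Hypothetical helper for illustration only: the real algorithm works on
    batched tensors and also adds the reward, discount and entropy terms.
    """
    n_critics = len(critic_quantiles)
    # Merge the atoms of every critic into one sorted list (ascending).
    pooled = sorted(q for critic in critic_quantiles for q in critic)
    # Dropping the top atoms counteracts value overestimation.
    n_keep = len(pooled) - drop_per_critic * n_critics
    if n_keep <= 0:
        raise ValueError("drop_per_critic removes every atom")
    return pooled[:n_keep]


# Toy example: 2 critics with 4 atoms each, dropping 1 atom per critic.
critics = [[1.0, 2.0, 3.0, 10.0], [0.5, 2.5, 3.5, 4.0]]
kept = truncated_quantile_pool(critics, drop_per_critic=1)
print(kept)
```

With these toy values the two largest atoms (4.0 and 10.0) are discarded, so a single optimistic critic has less influence on the target.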
Last thing I forgot to mention (but I think you are already doing it): you should use the master version of SB3 (1.0rc2). EDIT: it should normally change nothing, it is mostly for consistency.
More importantly: what version of python/gym/Mujoco are you using? (we should also document that somewhere) |
OK, looks like everything worked well this time, but I want to re-run FetchSlide with more time. python==3.6.10 Which corresponds to mujoco 1.5 (not 2.0) due to openai/gym#1541. Not sure that's relevant here, but I've just been using mujoco 1.5 ever since and never needed any of the new features. gym was downgraded as well to accommodate this. And stable-baselines3 is at the latest commit on master from GitHub (which corresponds to 1.0rc2). Like I said, I am going to re-run at least FetchSlide, so let me know soon if you want me to change any of the above. I have a mujoco 2.0 install / key that works just fine too, and changing the rest is easy. When that finishes (probably another whole day...) I will open the pull request in rl-trained-agents/. Do you want the PR here to be a new one, or should I just overwrite the commits here and keep this PR?
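Since the versions need to be documented somewhere, one way to capture them is a small script like the sketch below. It assumes the PyPI distribution names gym, mujoco-py, stable-baselines3 and sb3-contrib, and it uses importlib.metadata, which needs Python 3.8 or newer (on the Python 3.6 setup above, the importlib_metadata backport would be required instead).

```python
import platform
from importlib import metadata  # Python >= 3.8


def collect_versions(packages):
    """Return a {name: version} mapping for the interpreter and the given packages."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the entry so a missing dependency is visible in the report.
            versions[name] = "not installed"
    return versions


# Distribution names are assumptions; adjust them to the local setup.
PACKAGES = ["gym", "mujoco-py", "stable-baselines3", "sb3-contrib"]
report = collect_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}=={version}")
```

The printed lines can be pasted directly into a PR description or a README section.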
Good to hear =)
perfect
yes, I'm aware of that issue (but I think I did not see much change when I was using mujoco 2 with robotics envs).
Let's keep that one |
I just realized you will need to temporarily comment out this line to update the benchmark file: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/benchmark.py#L47 (this is because we don't have a mujoco license on the CI server).
@@ -91,7 +95,6 @@ and also allow users to have access to pretrained agents.*
|qrdqn|BeamRiderNoFrameskip-v4 | 17122.941| 10769.997|10M | 596483| 17|
|qrdqn|BreakoutNoFrameskip-v4 | 393.600| 79.828|10M | 579711| 40|
|qrdqn|CartPole-v1 | 500.000| 0.000|50k | 150000| 300|
|qrdqn|EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
it looks like I forgot to push the benchmark log of that one... (I will do that soon)
Please do the same for the robotics envs (files are in logs/benchmark/, you may need git add -f for that)
Apart from the missing entry in changelog (and the missing benchmark log) LGTM =) |
OK, it probably didn't need to be three commits on my part, but there you go... I took a look at the changelog and wasn't entirely sure where / what to add. I figure it's faster for you to add something in than for me to ask what.
No worries, commits will be squashed at the end.
I'll do that. I also plotted the training success rate (which should be higher at test time) using:
|
LGTM, thanks =)
How are the parameters of the pickplace environment using TQC+HER set? Why does it not work for me during training? Are the hyperparameters in the initially downloaded code already optimal?
can someone guide me pls how I can |
I will let @araffin answer these, but he is currently on vacation, so please give him some time :) |
Thanks for your prompt response. I have some urgency for using this... if someone can help in the meantime, I will be very thankful!
Seems like these instructions should work out of the box: https://github.com/DLR-RM/rl-baselines3-zoo#enjoy-a-trained-agent |
the issue is that HER is not an algorithm from sb2 onwards |
Could you elaborate?
yes, it is documented both in SB3 changelog and in the HER hyperparameter file: "# NOTE: STARTING WITH SB3 >= 1.1.0, because HER is now HerReplayBuffer," |
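To illustrate why HER fits naturally as a replay buffer rather than a standalone algorithm, here is a minimal sketch of the relabeling it performs with the "future" goal-selection strategy: stored transitions are copied with the desired goal replaced by a goal that was actually achieved later in the same episode, and the reward is recomputed. All names below (relabel_with_future_goals, sparse_reward, the dictionary keys) are hypothetical and only for illustration; this is not the SB3 implementation. In SB3 the equivalent behaviour is selected by passing HerReplayBuffer through the replay_buffer_class argument of an off-policy algorithm such as SAC or TQC.

```python
import random
from typing import Callable, Dict, List, Optional

Transition = Dict[str, float]


def relabel_with_future_goals(
    episode: List[Transition],
    compute_reward: Callable[[float, float], float],
    n_sampled_goal: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Create extra transitions whose goal is one achieved later in the episode."""
    rng = rng or random.Random()
    relabeled = []
    for t, transition in enumerate(episode):
        for _ in range(n_sampled_goal):
            # "future" strategy: pick a step from t to the end of the episode (inclusive).
            future = rng.randint(t, len(episode) - 1)
            new_goal = episode[future]["next_achieved_goal"]
            new_reward = compute_reward(transition["next_achieved_goal"], new_goal)
            relabeled.append({**transition, "desired_goal": new_goal, "reward": new_reward})
    return relabeled


def sparse_reward(achieved: float, goal: float) -> float:
    """Toy sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


# Toy 1-D episode that never reaches the original goal (10.0).
episode = [
    {"next_achieved_goal": float(i + 1), "desired_goal": 10.0, "reward": -1.0}
    for i in range(3)
]
relabeled = relabel_with_future_goals(episode, sparse_reward, n_sampled_goal=2, rng=random.Random(0))
print(len(relabeled), "relabeled transitions")
```

Because the relabeled goals were really achieved, some of the new transitions carry a success reward even though the original episode failed, which is what makes sparse-reward tasks such as the Fetch environments learnable.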
Sure, I meant I couldn't find a set of input arguments by which I could load and |
yes, I probably need to update the README. I'm also thinking about removing the |
Hi, |
https://stable-baselines3.readthedocs.io/en/master/modules/her.html#how-to-replicate-the-results
See DLR-RM/stable-baselines3#704 (comment) |
I did get around to this eventually :P.
Adding trained agents for her + sac on the mujoco robotics environments.
I left in the best_model too. This only matters for FetchSlide, where the best agent gets around 50% success, compared to 20% for the latest. The other three environments all get to 100%. I think this roughly matches the results from the original HER paper with DDPG.
Description
Updated the hyperparams in the her.yml file and added trained agents to the rl-trained-agents submodule.
Checklist:
make format (required)
make check-codestyle and make lint (required)
make pytest and make type both pass. (required)
Note: we are using a maximum length of 127 characters per line