HopperMuJoCoEnv #54

Open
giarcieri opened this issue Jul 13, 2020 · 7 comments

@giarcieri
Contributor

Hi and thank you very much for your work,

I would like to use the MuJoCo implementation of Hopper, which has obs_dim=11 and action_dim=3. However, when making the environment with gym.make('HopperMuJoCoEnv-v0'), it returns the PyBullet implementation (HopperPyBulletEnv-v0) which has obs_dim=15 and action_dim=3. Is it possible to access the MuJoCo implementation (https://github.com/benelot/pybullet-gym/blob/master/pybulletgym/envs/mujoco/envs/locomotion/hopper_env.py)?

@kae1506

kae1506 commented Jan 24, 2021

This repo is for MuJoCo envs TRANSFERRED into PyBullet envs. MuJoCo is not free, and you will not be able to use MuJoCo with this repo.

@giarcieri
Contributor Author

Hi,

I know that; I was referring to the MuJoCo implementations in pybullet-gym -> https://github.com/benelot/pybullet-gym/blob/master/pybulletgym/envs/mujoco/envs/locomotion/hopper_env.py. pybullet-gym provides both its own implementation and the "MuJoCo copies" for most of the environments. Here, the problem was that the MuJoCo implementation above returned the PyBullet implementation. After writing this post, I found the source of the problem: the second line of hopper_env.py imports the Roboschool locomotor instead of the MuJoCo one. The fix is as simple as this:

  1. open https://github.com/benelot/pybullet-gym/blob/master/pybulletgym/envs/mujoco/envs/locomotion/hopper_env.py

  2. replace the second line "from pybulletgym.envs.roboschool.robots.locomotors import Hopper" with "from pybulletgym.envs.mujoco.robots.locomotors import Hopper"
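To confirm which implementation `gym.make` actually returned, one option is to compare the space shapes against the MuJoCo Hopper's obs_dim=11 and action_dim=3. The helper below is a hypothetical sketch (`has_expected_dims` is not part of pybullet-gym); it assumes classic Gym spaces that expose a `.shape` tuple, and the stand-in objects only mimic the shapes quoted in this issue.

```python
from types import SimpleNamespace


def has_expected_dims(env, obs_dim, act_dim):
    """Return True if env has 1-D observation/action spaces of the given sizes.

    Assumes a classic Gym-style env whose spaces expose a `.shape` tuple,
    as gym.spaces.Box does.
    """
    return (tuple(env.observation_space.shape) == (obs_dim,)
            and tuple(env.action_space.shape) == (act_dim,))


# With pybullet-gym installed, the real check would look like:
#   import gym
#   import pybulletgym  # importing registers the envs with gym
#   env = gym.make('HopperMuJoCoEnv-v0')
#   assert has_expected_dims(env, obs_dim=11, act_dim=3)
#
# Stand-in objects mimicking the shapes reported in this issue:
mujoco_like = SimpleNamespace(observation_space=SimpleNamespace(shape=(11,)),
                              action_space=SimpleNamespace(shape=(3,)))
pybullet_like = SimpleNamespace(observation_space=SimpleNamespace(shape=(15,)),
                                action_space=SimpleNamespace(shape=(3,)))
print(has_expected_dims(mujoco_like, 11, 3))    # True
print(has_expected_dims(pybullet_like, 11, 3))  # False
```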

Hope this will help someone else.

@benelot
Owner

benelot commented Jan 25, 2021

Hello @giarcieri
Thanks for this suggestion; you are correct, there is a wrong import. Would you mind making a pull request for it? You can even make the change directly on the repository in your browser by opening the code file and editing it with the small pencil button. Thanks for testing the envs, by the way. Unfortunately, I cannot yet guarantee that they work properly. In your experience, do they work well for your purpose?

@giarcieri
Contributor Author

Hi @benelot!
I made the pull request, sorry for the delay!
Your environments are great! For my Master's thesis project, I ran a benchmark in model-based RL that assessed the performance of several models on 8 different environments, 7 of which are from PyBullet! I ran my RL algorithms on your envs for 3 months without a break and never had problems.
I published the code and summarized the results in this repo: https://github.com/giarcieri/Assessing-the-Influence-of-Models-on-the-Performance-of-Reinforcement-Learning-Algorithms. If you have a look at it, please let me know what you think! I also hope to turn my thesis into a paper at some point and try to publish it.

@benelot
Owner

benelot commented Feb 3, 2021

Thanks, merged!

@benelot closed this as completed Feb 3, 2021
@benelot reopened this Feb 3, 2021
@benelot
Owner

benelot commented Feb 3, 2021

Hello @giarcieri, thank you for your work; it definitely serves as a test of my environments. If you are up for it, I would love to have some baselines in the tests folder of my repository. This would mean saving some of your trained agents, loading and running them in a test, measuring their average performance score, and making the test pass only if the score stays above that threshold. That way it would be clear that all of them work perfectly.
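A regression test along these lines could look roughly like the sketch below: run an agent for a few episodes, average the returns, and fail if the average drops below a stored score. This is a hypothetical pytest-style sketch, not code from the repository; `mean_episode_return`, `ConstantRewardEnv`, and `BASELINE_SCORE` are illustrative names, and it assumes the classic 4-tuple Gym `step` API.

```python
import statistics


def mean_episode_return(env, policy, n_episodes=5, max_steps=1000):
    """Average undiscounted return of `policy` over `n_episodes` episodes.

    Assumes the classic Gym API: reset() -> obs and
    step(action) -> (obs, reward, done, info).
    """
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return statistics.mean(returns)


class ConstantRewardEnv:
    """Hypothetical stand-in env: reward 1.0 per step, episode ends after 10 steps."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 10, {}


BASELINE_SCORE = 9.5  # hypothetical threshold; a real one would come from a trained agent


def test_agent_stays_above_baseline():
    # In the real test, env would come from gym.make(...) and policy from a saved agent.
    score = mean_episode_return(ConstantRewardEnv(), policy=lambda obs: 0.0)
    assert score >= BASELINE_SCORE
```

Keeping the threshold slightly below the agent's measured score leaves some room for run-to-run noise while still catching a real regression.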

I also never checked whether the performance can be compared to the baselines on the actual MuJoCo envs; the envs are only as close as possible in observations and actions. As you correctly found out, in some cases we are still missing the corresponding observation terms, and zeros were added instead so that the number of observations matches. That is definitely something to investigate in the future, and I am sorry if it caused any inconvenience!
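To illustrate what padding with zeros means in practice: when a port cannot compute some of the original observation terms, constant zeros can stand in for them so the vector still has the expected length. The helper below is a hypothetical sketch, not pybullet-gym's code, and it assumes the missing entries sit at the end of the vector, which need not match where the real envs place them.

```python
import numpy as np


def pad_observation(obs, target_dim):
    """Right-pad a 1-D observation with zeros up to `target_dim` entries."""
    obs = np.asarray(obs, dtype=np.float32)
    missing = target_dim - obs.shape[0]
    if missing < 0:
        raise ValueError(f"observation has {obs.shape[0]} entries, more than {target_dim}")
    return np.pad(obs, (0, missing))  # default mode='constant' pads with 0


print(pad_observation([1.0, 2.0], 4))  # [1. 2. 0. 0.]
```

Because the padded entries are always zero, an agent or a learned dynamics model gets no information from them, which may be one reason results on the ported envs are not directly comparable to MuJoCo baselines.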

@giarcieri
Contributor Author

Hi @benelot,
sure, I would like to help! Unfortunately, I no longer have access to the trained models, but you can look at the performance achieved in each env in this folder: https://github.com/giarcieri/Assessing-the-Influence-of-Models-on-the-Performance-of-Reinforcement-Learning-Algorithms/tree/master/rewards/images. The reported performance is the average over 5 different seeds (42-46). Please note that my model comparison focused on the sample efficiency of model-based RL algorithms, not on asymptotic performance. This means that the performance in some envs (the locomotion ones) would have been higher if I had trained the models longer. Specifically, all the algorithms converged quickly (i.e. within a few episodes) to optimal performance in InvertedPendulumMuJoCoEnv, InvertedDoublePendulumMuJoCoEnv and ReacherPyBulletEnv. They made some progress in HalfCheetahMuJoCoEnv and HopperMuJoCoEnv. Finally, all the models struggled to make progress in Walker2DMuJoCoEnv and AntMuJoCoEnv.
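For anyone reproducing the plots, averaging over seeds 42-46 amounts to an element-wise mean of the per-episode reward curves. The sketch below assumes a hypothetical `run(seed)` callable that returns one reward per episode; it is not taken from the linked repository.

```python
import statistics


def average_over_seeds(run, seeds=range(42, 47)):
    """Element-wise mean of per-episode reward curves across seeds.

    `run(seed)` is a hypothetical callable that trains/evaluates with the given
    seed and returns a list of per-episode rewards; all curves must share a length.
    """
    curves = [run(seed) for seed in seeds]
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all reward curves must have the same length")
    return [statistics.fmean(values) for values in zip(*curves)]
```

For example, `average_over_seeds(lambda s: [s, 2 * s])` returns `[44.0, 88.0]`, since the seeds 42 to 46 average to 44.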
I really don't know whether the performance can be compared to the baselines on the actual MuJoCo envs, because I didn't have a MuJoCo license and could not test the same trained models on both env versions. Empirically, the performance on the 3 balance tasks looks similar to what I found in the literature. For the locomotion tasks it is impossible to say, because in all the papers I know of that applied model-based algorithms to MuJoCo envs, the authors modified the envs themselves with custom reward functions and often even changed the dynamics. I also discussed this problem in my thesis, along with the reasons why model-based RL needs custom reward functions. Indeed, I adopted and justified the reward functions listed here https://github.com/giarcieri/Assessing-the-Influence-of-Models-on-the-Performance-of-Reinforcement-Learning-Algorithms/blob/master/cost_functions.py (most of them come from this paper https://arxiv.org/abs/1907.02057, which also uses some of your envs), but I didn't change anything in the dynamics.
Finally, if it helps, I can send you my thesis, which lists all the hyperparameters I used so that the results are 100% reproducible.
Don't be sorry! Your envs are great, and I owe you a lot of thanks for making an excellent thesis possible even though I had no MuJoCo license!
