Multiple resets before stepping makes observations junk #397
Comments
If I followed well your code, the method that you call multiple times is In general, considering how you structured your environment (the intended way 😉) you should try to avoid any I think that a possible fix would be moving this randomization to... the randomizer, that is where it should belong. The randomizer, instead, does have access to the So, to recap, if I am right, you can solve by moving this logic after these lines. |
Hello @diegoferigo, Thank you so much for the very thorough answer. It has helped me understand the intended structure a lot better. I just changed my environment to have the reset in the randomizer instead. However, the weird observation values still happen when there are multiple resets before stepping happens. When I change the location of the second reset to be after the epoch like this, the issue does not persist. It seems the issue only happens when there are two resets before the first step. It isn't a very bad bug (except being hard to find). I have a few follow-ups about how to structure the environments to work on a real robot / the recommended way for me to implement my own ScenarIO back end for my robot. However, I will move this over to GitHub Discussions. |
Strange behavior, I'm not really sure who to blame :) I tried on my setup that is based on Ignition Fortress + our
Script:
import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers
from gym_ignition.utils.typing import Action, Reward, Observation
env_id = "Monopod-Gazebo-v1"
def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)
make_env = functools.partial(make_env_from_id, env_id=env_id)
env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)
# Try to reset multiple times
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
print(env.reset())
which seems ok, right? |
@diegoferigo Yes, that seems correct. I just tried that same script on my setup and got the same results. However, after modifying the script a bit I found the minimal example to reproduce the bad behaviour.
import gym
import functools
from gym_bb import randomizers
env_id = "Monopod-v1"
def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)
make_env = functools.partial(make_env_from_id, env_id=env_id)
env = randomizers.monopod.MonopodEnvRandomizer(
    env=make_env, reward_class_name='BalancingV1')
env.seed(42)
# Try to reset multiple times
action = env.action_space.sample()
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.step(action))
print(env.step(action))
which gave this output
I am very confused about what is happening here. |
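To narrow down which call first produces junk, the reset/step sequence from the script above can be replayed while checking every observation. The following is a minimal sketch, not part of gym-bb or gym-ignition: `flatten`, `is_suspicious`, `replay`, the `limit` threshold, and the `FakeEnv` stand-in are all hypothetical names introduced for illustration. It assumes the environment follows the classic Gym convention where `step` returns `(observation, reward, done, info)` and observations are numbers or nested sequences of numbers.

```python
import math
from typing import Any, List, Sequence, Tuple


def flatten(obs: Any) -> List[float]:
    """Flatten a number or a nested sequence of numbers into a list of floats."""
    try:
        return [float(obs)]
    except (TypeError, ValueError):
        return [x for item in obs for x in flatten(item)]


def is_suspicious(obs: Any, limit: float = 1e3) -> bool:
    """True if any value is NaN/inf or larger in magnitude than `limit`.

    `limit` is an arbitrary assumption; pick it from the observation space bounds.
    """
    return any(not math.isfinite(x) or abs(x) > limit for x in flatten(obs))


def replay(env: Any, ops: Sequence[str], action: Any,
           limit: float = 1e3) -> List[Tuple[int, str]]:
    """Run "reset"/"step" ops in order; return (index, op) of every bad observation."""
    bad = []
    for i, op in enumerate(ops):
        if op == "reset":
            obs = env.reset()
        elif op == "step":
            obs = env.step(action)[0]
        else:
            raise ValueError(f"unknown op: {op!r}")
        if is_suspicious(obs, limit):
            bad.append((i, op))
    return bad


class FakeEnv:
    """Stand-in that mimics the reported symptom; NOT the real Monopod environment."""

    def __init__(self) -> None:
        self._resets_in_a_row = 0

    def reset(self) -> List[float]:
        self._resets_in_a_row += 1
        return [0.0, 0.0]

    def step(self, action: Any):
        # Pretend that two resets in a row corrupt the next step's observation.
        corrupted = self._resets_in_a_row >= 2
        self._resets_in_a_row = 0
        obs = [1e9, float("nan")] if corrupted else [0.1, -0.2]
        return obs, 0.0, False, {}


# Same call order as the minimal reproduction script in this comment.
OPS = ["reset", "step", "reset", "reset", "step", "reset", "step", "step"]
bad_calls = replay(FakeEnv(), OPS, action=None)
print(bad_calls)
```

With the real environment, `FakeEnv()` would be replaced by the randomizer-wrapped `env` and `action` by `env.action_space.sample()`; permuting `OPS` then shows whether only the reset-reset-step pattern triggers the problem.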
I updated to v1.3.0 and Ignition Fortress, and the results are worse. I cannot render the environment because of #402 to check that everything is still running okay, but after running the above script again on the new version I got these results:
|
I created a clean Ubuntu Focal system by executing the following commands in a Docker container:
# Start the container with: docker run -it ubuntu:focal bash
apt update
export IGNITION_DISTRIBUTION="fortress"
export IGNITION_DEFAULT_CHANNEL="stable"
apt install virtualenv wget lsb-release gnupg2 git
echo "deb http://packages.osrfoundation.org/gazebo/ubuntu-${IGNITION_DEFAULT_CHANNEL} `lsb_release -cs` main" > \
/etc/apt/sources.list.d/gazebo-${IGNITION_DEFAULT_CHANNEL}.list
wget http://packages.osrfoundation.org/gazebo.key -qO - | apt-key add -
apt update
apt install ignition-fortress
virtualenv /tmp/venv
source /tmp/venv/bin/activate
pip install -U pip
pip install git+https://github.com/Baesian-Balancer/gym-bb
pip install ipython
pip install -U "gym-ignition==1.3.0" "scenario==1.3.0"
sed -i "s|from . import monitor|# from . import monitor|g" /tmp/venv/lib/python3.8/site-packages/gym_bb/__init__.py
And then executing the script (running it multiple times yields reproducible results):
import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers
from gym_ignition.utils.typing import Action, Reward, Observation
env_id = "Monopod-Gazebo-v1"
def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)
make_env = functools.partial(make_env_from_id, env_id=env_id)
env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)
# Try to reset multiple times
action = env.action_space.sample()
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.reset())
print("===>")
print(env.step(action))
print("<===")
print(env.reset())
print(env.step(action))
print(env.step(action))
Output:
I couldn't visualize the environment from the container I created on the fly, but the simulation is indeed exploding. I suspect it depends on the randomized state from which the model is initialized. Are you sure there is no configuration in which the model is initialized penetrating the ground? Of course, in that scenario it would receive a huge reaction force, so it makes sense that the simulation explodes. After this look, it seems that the problem does not depend on gym-ignition / scenario, but rather on the implementation of the environment. |
I have tried isolating the problem using the above idea by making it impossible for the monopod to penetrate the ground. I also have a new version of our environment, which changes much of the code base relative to the current implementation (including removing reset randomization), and it still has this issue. The extra confusing part is that when you remove the extra reset everything is fine again, no matter how many episodes of training you do. This makes me believe that the cause isn't clipping due to the randomizer or something in the main logic of the environment. There must be some weird underlying condition that changes with the order of resets. I have dug into my code base pretty thoroughly and can't find the culprit. I think we should close this issue for now, and if I find the cause I will follow up in this same thread. :) Thank you as always @diegoferigo |
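Since removing the extra reset makes the symptom disappear, one stopgap while the root cause is unknown is to drop a reset that directly follows another reset. The wrapper below is a minimal sketch under that assumption; `SkipRedundantReset` is a hypothetical name, not something provided by gym, gym-ignition, or gym-bb. It only masks the symptom, and it changes semantics: a skipped reset does not re-randomize the model, so the caller gets the previous initial observation back.

```python
from typing import Any


class SkipRedundantReset:
    """Forward reset() to the wrapped env only if step() was called since the last reset.

    Hypothetical workaround wrapper; duck-typed, so it needs only reset() and step().
    """

    def __init__(self, env: Any) -> None:
        self.env = env
        self._stepped = True      # lets the very first reset() through
        self._last_obs = None

    def reset(self, **kwargs: Any) -> Any:
        if self._stepped:
            self._last_obs = self.env.reset(**kwargs)
            self._stepped = False
        # A redundant reset returns the cached observation (same object, not a copy).
        return self._last_obs

    def step(self, action: Any) -> Any:
        self._stepped = True
        return self.env.step(action)

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes not found on the wrapper itself,
        # e.g. action_space or seed on the wrapped environment.
        return getattr(self.env, name)
```

Wrapping the randomizer-wrapped environment as `env = SkipRedundantReset(env)` would make the reset-reset-step sequence behave like reset-step. If the junk observations still appear with the wrapper in place, the double reset is not the trigger.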
Sure, feel free to open this issue again if needed. Closing. |
Description:
When multiple resets happen before the initial step, the observation values become junk.
Example of bad observations:
Example of expected observations:
Steps to reproduce
Note: gym_bb is just our custom environment built on gym-ignition. The repo containing the code can be found here:
https://github.com/Baesian-Balancer/gym-bb
Additional context
Multiple resets before stepping makes observations junk
Environment