[Question] Using images to train DDPG+HER agent #287
Comments
Hello,
If you want to work with images directly (which I do not recommend anyway; it is better to extract an intermediate representation first, see the sketch right after the example below), you will need to update the `ObsDictWrapper` accordingly. Here is a quick example:
```python
import numpy as np
import gym
from gym import spaces

from stable_baselines3 import DDPG, HER


class CustomEnv(gym.GoalEnv):
    """Custom Environment that follows gym interface"""

    metadata = {"render.modes": ["human"]}

    def __init__(self):
        super(CustomEnv, self).__init__()
        self.action_space = spaces.Box(low=-1, high=1, shape=(2,), dtype=np.float32)
        N_CHANNELS = 1
        HEIGHT = 64
        WIDTH = 64
        obs_shape = (N_CHANNELS, HEIGHT, WIDTH)
        # obs_shape = (HEIGHT, WIDTH, N_CHANNELS)
        # Example for using image as input (can be channel-first or channel-last):
        self.observation_space = spaces.Dict(
            {
                "observation": spaces.Box(low=0, high=255, shape=obs_shape, dtype=np.uint8),
                "achieved_goal": spaces.Box(low=0, high=255, shape=obs_shape, dtype=np.uint8),
                "desired_goal": spaces.Box(low=0, high=255, shape=obs_shape, dtype=np.uint8),
            }
        )

    def step(self, action):
        # Dummy dynamics: random observation, zero reward, never done
        reward = 0.0
        done = False
        return self.observation_space.sample(), reward, done, {}

    def compute_reward(self, achieved_goal, desired_goal, info):
        # HER calls this with batches of goals, so return one reward per sample
        return np.zeros((len(achieved_goal),))

    def reset(self):
        return self.observation_space.sample()

    def render(self, mode="human"):
        pass


model = HER(
    "MlpPolicy",
    CustomEnv(),
    DDPG,
    verbose=1,
    buffer_size=1000,
    learning_starts=100,
    max_episode_length=100,
)
model.learn(50000)
```
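As mentioned above, the usually preferable route is to extract an intermediate representation first and run DDPG+HER on that instead of raw pixels. Here is a minimal sketch of that idea; `encode` is a hypothetical pretrained feature extractor (for instance the encoder of an autoencoder trained on environment images), not something SB3 provides:

```python
import numpy as np
import gym
from gym import spaces


class LatentGoalWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper: replaces image observations and goals with latent vectors."""

    def __init__(self, env, encode, latent_dim):
        super(LatentGoalWrapper, self).__init__(env)
        # encode: callable mapping an image of shape (C, H, W) to a (latent_dim,) float32 vector
        self.encode = encode
        latent_space = spaces.Box(low=-np.inf, high=np.inf, shape=(latent_dim,), dtype=np.float32)
        self.observation_space = spaces.Dict(
            {key: latent_space for key in ("observation", "achieved_goal", "desired_goal")}
        )

    def observation(self, obs):
        # Encode every entry of the goal dict
        return {key: self.encode(obs[key]) for key in obs}
```

Wrapping the env, e.g. `LatentGoalWrapper(CustomEnv(), encode=my_encoder, latent_dim=32)` (with `my_encoder` supplied by you), exposes 1D goal spaces that HER already handles; keep in mind that `compute_reward` must then also be meaningful in latent space (e.g. a distance between latent goals).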
The refactored HER is available here: #351
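For anyone landing here later: after that refactor, HER is no longer a model class but a replay buffer, and dict observation spaces (including images) are supported natively through the `"MultiInputPolicy"`. A sketch of the new-style usage, assuming SB3 1.x (`online_sampling` and `max_episode_length` were removed again in SB3 2.x):

```python
from stable_baselines3 import DDPG, HerReplayBuffer

env = CustomEnv()  # the image-based GoalEnv defined above
model = DDPG(
    "MultiInputPolicy",  # handles Dict observation spaces, using a CNN for image keys
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
        online_sampling=True,
        max_episode_length=100,
    ),
    buffer_size=1000,
    learning_starts=100,
    verbose=1,
)
model.learn(50000)
```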
Hi @araffin,
I’m trying to train a DDPG+HER agent that interacts with a custom environment and takes as input the RGB image of the environment.
From what I understand, in the previous version of Stable Baselines only 1D observation spaces were supported in HER (as also indicated in HERGoalEnvWrapper), which excludes image observations.
In this new version of Stable Baselines, I see no explicit assertion against using 2D spaces, but in ObsDictWrapper the observation and goal dimensions are taken from the first dimension of the shape only.
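(For illustration, here is a simplified re-creation of that sizing logic rather than the actual ObsDictWrapper code: taking `shape[0]` is only correct for 1D spaces.)

```python
import numpy as np
from gym import spaces

vec_space = spaces.Box(low=-1, high=1, shape=(5,), dtype=np.float32)
img_space = spaces.Box(low=0, high=255, shape=(1, 64, 64), dtype=np.uint8)

print(vec_space.shape[0])             # 5    -> the full size of a 1D observation
print(img_space.shape[0])             # 1    -> only the channel count
print(int(np.prod(img_space.shape)))  # 4096 -> what a flattened image would actually need
```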
Question
Is it possible to train a DDPG+HER agent from images using the implementation of stable baselines 3?