[Bug]: unable to learn MountainCarContinuous-v0 #2038

Closed
5 tasks done
tesla-cat opened this issue Nov 9, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@tesla-cat

🐛 Bug

  • ep_rew_mean barely improves when training PPO on MountainCarContinuous-v0 (from -51.2 to -46.4 over the first 8192 timesteps, with ep_len_mean stuck at 999)
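
A rough sanity check on what these numbers might mean. This assumes the reward described in the Gymnasium docs for MountainCarContinuous-v0 (-0.1 * action² per step, plus +100 on reaching the goal) and assumes no episode reached the goal, which ep_len_mean = 999 suggests but does not prove. The helper name `implied_mean_squared_action` is made up for illustration.

```python
# Assumed reward model (taken from the Gymnasium docs, not verified against this run):
#   reward per step = -0.1 * action**2, plus +100 once the goal is reached.
ACTION_COST = 0.1


def implied_mean_squared_action(ep_rew_mean: float, ep_len_mean: float) -> float:
    """Mean action**2 per step, assuming no episode collected the +100 goal bonus."""
    return -ep_rew_mean / (ACTION_COST * ep_len_mean)


# ep_rew_mean / ep_len_mean copied from the first and fourth rollout tables in the log.
first = implied_mean_squared_action(-51.2, 999)
fourth = implied_mean_squared_action(-46.4, 999)
print(f"{first:.3f} -> {fourth:.3f}")  # prints "0.513 -> 0.464"
```

If those assumptions hold, the small gain in ep_rew_mean would be consistent with the policy simply applying smaller actions (the logged std also shrinks from 0.916 to 0.709) rather than learning to reach the goal.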

To Reproduce

import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("MountainCarContinuous-v0", render_mode="human")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render()
    # VecEnv resets automatically
    # if done:
    #   obs = env.reset()

env.close()

Relevant log output / Error message

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 999      |
|    ep_rew_mean     | -51.2    |
| time/              |          |
|    fps             | 28       |
|    iterations      | 1        |
|    time_elapsed    | 72       |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 999         |
|    ep_rew_mean          | -49.8       |
| time/                   |             |
|    fps                  | 28          |
|    iterations           | 2           |
|    time_elapsed         | 143         |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.007122389 |
|    clip_fraction        | 0.0244      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.37       |
|    explained_variance   | -0.00725    |
|    learning_rate        | 0.0003      |
|    loss                 | 0.000856    |
|    n_updates            | 10          |
|    policy_gradient_loss | -0.0116     |
|    std                  | 0.916       |
|    value_loss           | 0.0665      |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 999          |
|    ep_rew_mean          | -48          |
| time/                   |              |
|    fps                  | 28           |
|    iterations           | 3            |
|    time_elapsed         | 213          |
|    total_timesteps      | 6144         |
| train/                  |              |
|    approx_kl            | 0.0068389685 |
|    clip_fraction        | 0.026        |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.29        |
|    explained_variance   | 0.0406       |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0138      |
|    n_updates            | 20           |
|    policy_gradient_loss | -0.0107      |
|    std                  | 0.844        |
|    value_loss           | 0.0323       |
------------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 999        |
|    ep_rew_mean          | -46.4      |
| time/                   |            |
|    fps                  | 28         |
|    iterations           | 4          |
|    time_elapsed         | 284        |
|    total_timesteps      | 8192       |
| train/                  |            |
|    approx_kl            | 0.00781654 |
|    clip_fraction        | 0.03       |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.2       |
|    explained_variance   | -0.129     |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0158    |
|    n_updates            | 30         |
|    policy_gradient_loss | -0.0134    |
|    std                  | 0.775      |
|    value_loss           | 0.0194     |
----------------------------------------
-----------------------------------------
|    clip_fraction        | 0.039       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.12       |
|    explained_variance   | -0.0127     |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0329     |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.0155     |
|    std                  | 0.709       |
|    value_loss           | 0.0156      |
-----------------------------------------
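
To put a number on "doesn't improve much", here is a small sketch that pulls ep_rew_mean and total_timesteps out of the tables above and computes the average gain per iteration. Only the four tables that include a rollout section are used; the `extract` helper and the `LOG` excerpt are illustrative, not part of stable_baselines3.

```python
import re

# Rows copied from the four complete tables in the log output above.
LOG = """
|    ep_rew_mean     | -51.2    |
|    total_timesteps | 2048     |
|    ep_rew_mean          | -49.8       |
|    total_timesteps      | 4096        |
|    ep_rew_mean          | -48          |
|    total_timesteps      | 6144         |
|    ep_rew_mean          | -46.4      |
|    total_timesteps      | 8192       |
"""


def extract(log: str, key: str) -> list[float]:
    """Return every numeric value logged under `key` in an SB3-style table."""
    pattern = re.compile(r"\|\s*" + re.escape(key) + r"\s*\|\s*(-?[\d.]+)\s*\|")
    return [float(value) for value in pattern.findall(log)]


rewards = extract(LOG, "ep_rew_mean")
steps = extract(LOG, "total_timesteps")

# Average change in ep_rew_mean per 2048-step rollout between first and last table.
gain_per_step = (rewards[-1] - rewards[0]) / (steps[-1] - steps[0])
gain_per_iteration = gain_per_step * 2048
print(f"gain per iteration: {gain_per_iteration:.2f}")
```

This only summarizes the trend in the logged means; it says nothing about whether any episode ever reaches the goal.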

System Info

No response

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.

tesla-cat added the bug label Nov 9, 2024