[Bug]: unable to learn MountainCarContinuous-v0 #2038

tesla-cat · 2024-11-09T10:05:54Z

🐛 Bug

With PPO on MountainCarContinuous-v0, ep_rew_mean barely improves: it stays around -50 and every episode runs the full 999 steps.
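For context on what these numbers may mean, here is a rough back-of-the-envelope sketch. It assumes the reward documented for this Gymnasium environment (a cost of 0.1 * action^2 per step, plus a bonus only when the goal is reached), that actions are clipped to [-1, 1] before reaching the env, and that the Gaussian policy has a mean near zero. The function names are hypothetical helpers, not part of any library.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mean_clipped_action_sq(std: float) -> float:
    """E[clip(a, -1, 1)^2] for a ~ N(0, std^2) (zero mean is an assumption)."""
    z = 1.0 / std
    inside = 2.0 * normal_cdf(z) - 1.0  # probability that |a| <= 1
    # Samples inside the bounds contribute std^2 * E[x^2 ; |x| <= z].
    unclipped = std * std * (inside - 2.0 * z * normal_pdf(z))
    # Samples outside the bounds are clipped to +/-1 and contribute exactly 1.
    clipped = 1.0 - inside
    return unclipped + clipped


def expected_return_without_goal(std: float, steps: int = 999, cost: float = 0.1) -> float:
    """Expected episode return if the car never reaches the goal."""
    return -cost * steps * mean_clipped_action_sq(std)


# std values taken from the training log in this report, plus the initial 1.0
for std in (1.0, 0.916, 0.844, 0.775, 0.709):
    print(f"std={std:.3f} -> return ~ {expected_return_without_goal(std):.1f}")
```

Under these assumptions a policy with std 1.0 that never reaches the goal would score about -51.6 per episode, close to the first logged ep_rew_mean, and the estimate becomes less negative as std shrinks. That would be consistent with the slow improvement coming from smaller actions rather than from the car reaching the flag, though this is only an estimate and not a confirmed diagnosis.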

To Reproduce

import gymnasium as gym
from stable_baselines3 import PPO

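# render_mode="human" draws a window on every env step, including during
# training, which slows learning down considerably
env = gym.make("MountainCarContinuous-v0", render_mode="human")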

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render()
    # VecEnv resets automatically
    # if done:
    #   obs = env.reset()

env.close()

Relevant log output / Error message

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 999      |
|    ep_rew_mean     | -51.2    |
| time/              |          |
|    fps             | 28       |
|    iterations      | 1        |
|    time_elapsed    | 72       |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 999         |
|    ep_rew_mean          | -49.8       |
| time/                   |             |
|    fps                  | 28          |
|    iterations           | 2           |
|    time_elapsed         | 143         |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.007122389 |
|    clip_fraction        | 0.0244      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.37       |
|    explained_variance   | -0.00725    |
|    learning_rate        | 0.0003      |
|    loss                 | 0.000856    |
|    n_updates            | 10          |
|    policy_gradient_loss | -0.0116     |
|    std                  | 0.916       |
|    value_loss           | 0.0665      |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 999          |
|    ep_rew_mean          | -48          |
| time/                   |              |
|    fps                  | 28           |
|    iterations           | 3            |
|    time_elapsed         | 213          |
|    total_timesteps      | 6144         |
| train/                  |              |
|    approx_kl            | 0.0068389685 |
|    clip_fraction        | 0.026        |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.29        |
|    explained_variance   | 0.0406       |
|    learning_rate        | 0.0003       |
|    loss                 | -0.0138      |
|    n_updates            | 20           |
|    policy_gradient_loss | -0.0107      |
|    std                  | 0.844        |
|    value_loss           | 0.0323       |
------------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 999        |
|    ep_rew_mean          | -46.4      |
| time/                   |            |
|    fps                  | 28         |
|    iterations           | 4          |
|    time_elapsed         | 284        |
|    total_timesteps      | 8192       |
| train/                  |            |
|    approx_kl            | 0.00781654 |
|    clip_fraction        | 0.03       |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.2       |
|    explained_variance   | -0.129     |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0158    |
|    n_updates            | 30         |
|    policy_gradient_loss | -0.0134    |
|    std                  | 0.775      |
|    value_loss           | 0.0194     |
----------------------------------------
|    clip_fraction        | 0.039       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.12       |
|    explained_variance   | -0.0127     |
|    learning_rate        | 0.0003      |
|    loss                 | -0.0329     |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.0155     |
|    std                  | 0.709       |
|    value_loss           | 0.0156      |
-----------------------------------------
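To make the trend in the tables above easier to read, the following is a minimal sketch that pulls numeric rows out of log text in this layout using only the standard library. `parse_tables` and `SAMPLE` are hypothetical names for illustration, and the sample is trimmed from the first two tables in this report.

```python
import re

# Matches rows such as "|    ep_rew_mean     | -51.2    |".
# Section rows like "| rollout/ |  |" do not match because of the slash.
ROW = re.compile(r"^\|\s+(\w+)\s+\|\s+(-?\d[\d.eE+-]*)\s+\|$")


def parse_tables(log_text: str) -> list[dict[str, float]]:
    """Return one {key: value} dict per dashed-border table in the log."""
    tables: list[dict[str, float]] = []
    current: dict[str, float] | None = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line and set(line) == {"-"}:
            # A border either closes the current table or opens a new one.
            if current:
                tables.append(current)
            current = {}
            continue
        match = ROW.match(line)
        if match and current is not None:
            current[match.group(1)] = float(match.group(2))
    if current:
        tables.append(current)
    return tables


SAMPLE = """
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 999      |
|    ep_rew_mean     | -51.2    |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_rew_mean          | -49.8       |
|    total_timesteps      | 4096        |
| train/                  |             |
|    std                  | 0.916       |
-----------------------------------------
"""

for table in parse_tables(SAMPLE):
    print(table.get("total_timesteps"), table.get("ep_rew_mean"), table.get("std"))
```

Rows missing from a table (for example `std` in the first iteration, before any update) simply come back as `None` through `dict.get`.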

System Info

No response

Checklist

My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
I have checked that there is no similar issue in the repo
I have read the documentation
I have provided a minimal and working example to reproduce the bug
I've used the markdown code blocks for both code and stack traces.


tesla-cat added the bug (Something isn't working) label Nov 9, 2024

tesla-cat closed this as completed Nov 9, 2024
