PPO and multi-dimensional action spaces #251
Whoops, a miss by me. I use a MultiThreadEnv for my own research experiments and used that to test against, so this one slipped by untested. I can absolutely have a look; it seems like it should be a rather quick fix.
It seems it was not as simple as I thought, and I might need some input on how best to solve it. Basically it comes down to this: if we do not use a MultiThreadEnv, we get

```
ERROR: MethodError: (::ActionTransformedEnv{typeof(identity), var"#14#18"{Float64, Float64}, PendulumEnv{ClosedInterval{Float64}, Float32, StableRNGs.LehmerRNG}})(::ReinforcementLearningZoo.EnrichedAction{Vector{Float32}, NamedTuple{(:action_log_prob,), Tuple{Float32}}}) is ambiguous. Candidates:
  (env::AbstractEnv)(action::ReinforcementLearningZoo.EnrichedAction) in ReinforcementLearningZoo at /home/ubuntu/.julia/dev/ReinforcementLearningZoo/src/patch.jl:20
  (env::ActionTransformedEnv)(action, args...; kwargs...) in ReinforcementLearningEnvironments at /home/ubuntu/.julia/dev/ReinforcementLearningEnvironments/src/environments/wrappers/ActionTransformedEnv.jl:37
  (env::AbstractEnvWrapper)(args...; kwargs...) in ReinforcementLearningEnvironments at /home/ubuntu/.julia/dev/ReinforcementLearningEnvironments/src/environments/wrappers/wrappers.jl:9
Possible fix, define
  (::ActionTransformedEnv)(::ReinforcementLearningZoo.EnrichedAction)
```

We would probably like to dispatch to the first candidate in the list, since that one just calls the env with the actual action. But I'm not sure how to force Julia to use that one in a nice way. Any suggestions are welcome; otherwise I will have a look at this at some later point and try to figure out how dispatch should work in those cases.
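One possible direction, sketched below with simplified stand-in types (not the actual ReinforcementLearning.jl definitions), is to define exactly the intersection method that the MethodError suggests and have it unwrap the EnrichedAction before the wrapper's action mapping is applied:

```julia
# Simplified stand-ins for the real types; only the dispatch structure matters.
abstract type AbstractEnv end
abstract type AbstractEnvWrapper <: AbstractEnv end

struct EnrichedAction{A,M}
    action::A
    meta::M
end

struct ActionTransformedEnv{F,E} <: AbstractEnvWrapper
    action_mapping::F
    env::E
end

struct ToyEnv <: AbstractEnv
    last::Base.RefValue{Float64}
end
(env::ToyEnv)(a::Float64) = (env.last[] = a; nothing)

# The two conflicting candidates from the MethodError, simplified:
(env::AbstractEnv)(a::EnrichedAction) = env(a.action)                     # patch.jl:20
(env::ActionTransformedEnv)(a, args...) = env.env(env.action_mapping(a))  # ActionTransformedEnv.jl:37

# The disambiguating method: more specific than both candidates, so it wins.
# It unwraps the EnrichedAction and re-dispatches with the plain action.
(env::ActionTransformedEnv)(a::EnrichedAction) = env(a.action)

env = ActionTransformedEnv(x -> 2x, ToyEnv(Ref(0.0)))
env(EnrichedAction(1.5, NamedTuple()))  # unwrap, then map: stores 3.0
```

The unwrapping method re-dispatches rather than calling the inner env directly, so the wrapper's own `action_mapping` still runs on the plain action.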
When applying PPO on general environments (as opposed to the ones wrapped in a MultiThreadEnv), we simply return the action instead of the EnrichedAction.
Oh, so we assume that training will always be done with a MultiThreadEnv?
Exactly.
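The convention above could be sketched as follows. This is a hypothetical illustration with stand-in types and a made-up `plan`/`sample_with_logprob` pair, not the library's actual API:

```julia
# Hypothetical stand-ins, only illustrating the dispatch convention.
abstract type AbstractEnv end
struct MultiThreadEnv <: AbstractEnv end   # stand-in for the real batched wrapper
struct PendulumLikeEnv <: AbstractEnv end  # stand-in for an arbitrary plain env

struct EnrichedAction{A,M}
    action::A
    meta::M
end

# Stand-in for the policy's forward pass; returns an action and its log-prob.
sample_with_logprob() = (0.5, -1.2)

# Attach the log-probability (needed later for the PPO update) only when
# acting on a MultiThreadEnv during training; for general environments,
# return the raw action so any env can consume it directly.
function plan(env::AbstractEnv)
    a, logp = sample_with_logprob()
    env isa MultiThreadEnv ? EnrichedAction(a, (action_log_prob = logp,)) : a
end
```

Under this sketch, `plan(PendulumLikeEnv())` yields the bare action `0.5`, while `plan(MultiThreadEnv())` yields an `EnrichedAction` carrying the log-probability, which is exactly why an unwrapped env hitting an `EnrichedAction` triggers the ambiguity above.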
A `PPOPolicy` always returns a scalar action, even if it is defined for multi-dimensional action spaces.

That's a bug in https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/blob/022c1fd433911dcaedf3ff38b4bfb6351c544497/src/algorithms/policy_gradient/ppo.jl#L176. Previously that code only supported a single-dimension action space.