PPO and multi-dimensional action spaces #251
Whoops, a miss by me. I use a MultiThreadEnv for my own research experiments and used that to test against, so this one slipped by untested. I can absolutely have a look; it seems like it should be a rather quick fix.
It seems it was not as simple as I thought, and I might need some input on how best to solve it. Basically it comes down to this: if we do not use a MultiThreadEnv, we get

```
ERROR: MethodError: (::ActionTransformedEnv{typeof(identity), var"#14#18"{Float64, Float64}, PendulumEnv{ClosedInterval{Float64}, Float32, StableRNGs.LehmerRNG}})(::ReinforcementLearningZoo.EnrichedAction{Vector{Float32}, NamedTuple{(:action_log_prob,), Tuple{Float32}}}) is ambiguous. Candidates:
  (env::AbstractEnv)(action::ReinforcementLearningZoo.EnrichedAction) in ReinforcementLearningZoo at /home/ubuntu/.julia/dev/ReinforcementLearningZoo/src/patch.jl:20
  (env::ActionTransformedEnv)(action, args...; kwargs...) in ReinforcementLearningEnvironments at /home/ubuntu/.julia/dev/ReinforcementLearningEnvironments/src/environments/wrappers/ActionTransformedEnv.jl:37
  (env::AbstractEnvWrapper)(args...; kwargs...) in ReinforcementLearningEnvironments at /home/ubuntu/.julia/dev/ReinforcementLearningEnvironments/src/environments/wrappers/wrappers.jl:9
Possible fix, define
  (::ActionTransformedEnv)(::ReinforcementLearningZoo.EnrichedAction)
```

We would probably like to dispatch to the first candidate in the list, since that one just calls the env with the actual action. But I'm not sure how to force Julia to use that one in a nice way. Any suggestions are welcome; otherwise I will have a look at this at some later point and try to figure out how dispatch should work in those cases.
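One possible direction, sketched below with simplified stand-in types (not the actual ReinforcementLearning.jl definitions), is to define exactly the intersection method that the MethodError suggests and have it unwrap the EnrichedAction before the wrapper's action mapping is applied:

```julia
# Simplified stand-ins for the real types; only the dispatch structure matters.
abstract type AbstractEnv end
abstract type AbstractEnvWrapper <: AbstractEnv end

struct EnrichedAction{A,M}
    action::A
    meta::M
end

struct ActionTransformedEnv{F,E} <: AbstractEnvWrapper
    action_mapping::F
    env::E
end

struct ToyEnv <: AbstractEnv
    last::Base.RefValue{Float64}
end
(env::ToyEnv)(a::Float64) = (env.last[] = a; nothing)

# The two conflicting candidates from the MethodError, simplified:
(env::AbstractEnv)(a::EnrichedAction) = env(a.action)                     # patch.jl:20
(env::ActionTransformedEnv)(a, args...) = env.env(env.action_mapping(a))  # ActionTransformedEnv.jl:37

# The disambiguating method: more specific than both candidates, so it wins.
# It unwraps the EnrichedAction and re-dispatches with the plain action.
(env::ActionTransformedEnv)(a::EnrichedAction) = env(a.action)

env = ActionTransformedEnv(x -> 2x, ToyEnv(Ref(0.0)))
env(EnrichedAction(1.5, NamedTuple()))  # unwrap, then map: stores 3.0
```

The unwrapping method re-dispatches rather than calling the inner env directly, so the wrapper's own `action_mapping` still runs on the plain action.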
When applying PPO on general environments (as opposed to the ones wrapped in a MultiThreadEnv), we simply return the action instead of the EnrichedAction.
Oh, so we assume that training will always be done with a MultiThreadEnv?
Exactly.
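The convention above could be sketched as follows. This is a hypothetical illustration with stand-in types and a made-up `plan`/`sample_with_logprob` pair, not the library's actual API:

```julia
# Hypothetical stand-ins, only illustrating the dispatch convention.
abstract type AbstractEnv end
struct MultiThreadEnv <: AbstractEnv end   # stand-in for the real batched wrapper
struct PendulumLikeEnv <: AbstractEnv end  # stand-in for an arbitrary plain env

struct EnrichedAction{A,M}
    action::A
    meta::M
end

# Stand-in for the policy's forward pass; returns an action and its log-prob.
sample_with_logprob() = (0.5, -1.2)

# Attach the log-probability (needed later for the PPO update) only when
# acting on a MultiThreadEnv during training; for general environments,
# return the raw action so any env can consume it directly.
function plan(env::AbstractEnv)
    a, logp = sample_with_logprob()
    env isa MultiThreadEnv ? EnrichedAction(a, (action_log_prob = logp,)) : a
end
```

Under this sketch, `plan(PendulumLikeEnv())` yields the bare action `0.5`, while `plan(MultiThreadEnv())` yields an `EnrichedAction` carrying the log-probability, which is exactly why an unwrapped env hitting an `EnrichedAction` triggers the ambiguity above.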
A `PPOPolicy` always returns a scalar action, even if it is defined for multi-dimensional action spaces.

That's a bug in https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/blob/022c1fd433911dcaedf3ff38b4bfb6351c544497/src/algorithms/policy_gradient/ppo.jl#L176. Previously that code only supported a single-dimension action space.