
PPO and multi-dimensional action spaces #251

Closed
luigiannelli opened this issue Apr 27, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@luigiannelli

PPOPolicy always returns a scalar, even if it is defined for multi-dimensional actions:

julia> policy(inner_env)
-0.6246452f0

That's a bug in https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/blob/022c1fd433911dcaedf3ff38b4bfb6351c544497/src/algorithms/policy_gradient/ppo.jl#L176
That code currently only supports single-dimensional action spaces.
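
For context, here is a minimal sketch of the difference (the variable names are illustrative assumptions, not the actual code at that line):

using Random

rng = Random.default_rng()

# Hypothetical actor outputs for a 2-dimensional continuous action space
# (these names are assumptions for illustration, not fields from ppo.jl).
μ    = [0.1f0, -0.3f0]    # per-dimension means
logσ = [-0.5f0, -0.5f0]   # per-dimension log standard deviations

# Single-dimension behaviour (roughly what the issue reports):
# sampling a lone Normal collapses the action to one Float32.
scalar_action = μ[1] + exp(logσ[1]) * randn(rng, Float32)

# Dimension-aware sampling: draw one value per action dimension.
vector_action = μ .+ exp.(logσ) .* randn(rng, Float32, length(μ))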

@findmyway findmyway added the bug Something isn't working label Apr 27, 2021
@findmyway
Member

Not sure whether @albheim would be interested in fixing this bug 🤔
If not, I may fix it after #252.

@albheim
Member

albheim commented Apr 27, 2021

Whoops, a miss by me. I use a MultiThreadEnv for my own research experiments and used that to test against, so this one slipped by untested. I can absolutely have a look; it seems like it should be a rather quick fix.

@albheim
Member

albheim commented Apr 27, 2021

Seems it was not as simple as I thought, and I might need some input on how to best solve it.

Basically it comes down to this: if we do not use a MultiThreadEnv with PPO, it dispatches to the normal _run method. PPO uses EnrichedAction, which did not work immediately in that loop. The first problem was the trajectory updates, but after changing update!(trajectory...) to dispatch on any AbstractEnv, that part worked. The larger problem is that when we run env(action), where env is some AbstractEnv or AbstractEnvWrapper and action is an EnrichedAction, there are multiple dispatch candidates that are similarly specific, so Julia doesn't know which one to use.

ERROR: MethodError: (::ActionTransformedEnv{typeof(identity), var"#14#18"{Float64, Float64}, PendulumEnv{ClosedInterval{Float64}, Float32, StableRNGs.LehmerRNG}})(::ReinforcementLearningZoo.EnrichedAction{Vector{Float32}, NamedTuple{(:action_log_prob,), Tuple{Float32}}}) is ambiguous. Candidates:
  (env::AbstractEnv)(action::ReinforcementLearningZoo.EnrichedAction) in ReinforcementLearningZoo at /home/ubuntu/.julia/dev/ReinforcementLearningZoo/src/patch.jl:20
  (env::ActionTransformedEnv)(action, args...; kwargs...) in ReinforcementLearningEnvironments at /home/ubuntu/.julia/dev/ReinforcementLearningEnvironments/src/environments/wrappers/ActionTransformedEnv.jl:37
  (env::AbstractEnvWrapper)(args...; kwargs...) in ReinforcementLearningEnvironments at /home/ubuntu/.julia/dev/ReinforcementLearningEnvironments/src/environments/wrappers/wrappers.jl:9
Possible fix, define
  (::ActionTransformedEnv)(::ReinforcementLearningZoo.EnrichedAction)

We would probably like to dispatch to the first one in the list, since that just calls the env with the actual action, but I'm not sure how one would go about forcing Julia to use that one in a nice way.
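
For what it's worth, the disambiguating method the error message suggests could look something like this sketch. It assumes EnrichedAction stores the underlying action in a field named action (that field name is an assumption); unwrapping it forwards the call to the existing ActionTransformedEnv method.

# Assumes the dev environment from this thread, where ActionTransformedEnv
# (ReinforcementLearningEnvironments) and EnrichedAction
# (ReinforcementLearningZoo) are already loaded.
function (env::ActionTransformedEnv)(a::ReinforcementLearningZoo.EnrichedAction, args...; kwargs...)
    # Unwrap the enriched action (field name is an assumption) and forward;
    # the inner call then hits the plain ActionTransformedEnv method.
    env(a.action, args...; kwargs...)
end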

Any suggestions are welcome; otherwise I will have a look at this at some later point and try to figure out how one should handle the dispatch in those cases.

@findmyway
Member

When applying PPO to general environments (including the inner ones wrapped in the MultiThreadEnv), we simply return the action instead of an EnrichedAction. EnrichedAction is only useful during training.
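
A rough sketch of that idea (the helper name, method bodies, and the EnrichedAction constructor here are assumptions, not the actual Zoo code):

# Assumes PPOPolicy, MultiThreadEnv, EnrichedAction, state, and AbstractEnv
# are in scope from the ReinforcementLearning packages discussed in this thread.

# Wrap the action with its log-probability only when acting in a
# MultiThreadEnv, which is assumed to be the training path.
function (p::PPOPolicy)(env::MultiThreadEnv)
    action, log_prob = sample_with_logprob(p, state(env))  # hypothetical helper
    EnrichedAction(action, (action_log_prob = log_prob,))
end

# For any other environment, return the plain action so downstream
# wrappers never see an EnrichedAction (evaluation / generic rollouts).
function (p::PPOPolicy)(env::AbstractEnv)
    action, _ = sample_with_logprob(p, state(env))  # hypothetical helper
    action
end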

@albheim
Member

albheim commented Apr 27, 2021

Oh, so we assume that training will always be done with MultiThreadEnv, and if called with other envs it is just for eval. Seems reasonable, then it should be much easier.

@findmyway
Member

Oh, so we assume that training will always be done with MultiThreadEnv, and if called with other envs it is just for eval. Seems reasonable, then it should be much easier.

Exactly
