Fix bug in multi action ppo #169

albheim · 2021-04-14T08:57:08Z

As mentioned in JuliaReinforcementLearning/ReinforcementLearning.jl#234, we saw some strange behaviour. It came from log_p'_a having a different size than log_p, (1, N) compared to (N,), so broadcasting created an (N, N) matrix for the ratio, which was then reduced down.
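
To illustrate the shape mismatch described above, here is a minimal sketch. The variable names and random data are hypothetical stand-ins, not the code in ppo.jl; it only shows how Julia broadcasting treats a (1, N) array against an (N,) vector, and how dropping the singleton dimension restores the element-wise ratio.

```julia
# Hypothetical stand-ins for the quantities in the loss; N is the batch size.
N = 4
log_p = randn(N)           # old log-probabilities, size (N,)
log_p_new = randn(1, N)    # new log-probabilities, e.g. after sum(...; dims=1), size (1, N)

# Broadcasting a (1, N) array against an (N,) vector yields an (N, N) matrix.
ratio_bad = exp.(log_p_new .- log_p)
@show size(ratio_bad)

# Dropping the singleton dimension gives one ratio per sample.
ratio_ok = exp.(vec(log_p_new) .- log_p)
@show size(ratio_ok)
```

The intended per-sample ratios are the diagonal of the (N, N) matrix; the off-diagonal entries mix log-probabilities from different samples, which is why any later reduction over the matrix gives a misleading value.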

I also found that the entropy loss was not defined correctly for multidimensional actions: it was missing a factor equal to the number of dimensions in the action space. It was also missing a division by 2 in one of the terms compared to how the entropy is defined on https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Differential_entropy
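
For reference, the differential entropy on that page, specialised to a diagonal covariance with standard deviations σ_1, ..., σ_k, is k/2 * (1 + log(2π)) + Σ_i log σ_i. The sketch below evaluates it per sample; the function name `diag_gaussian_entropy` and the (k, N) layout of `logσ` are assumptions made for illustration, not the code merged in this PR.

```julia
# Differential entropy of a diagonal Gaussian, one value per column (sample).
# Assumed layout: logσ has size (k, N), with k action dimensions and N samples.
function diag_gaussian_entropy(logσ::AbstractMatrix)
    k = size(logσ, 1)
    # k/2 * (1 + log(2π)) is the dimension-dependent constant;
    # the sum over rows adds log σ_i for each action dimension.
    return vec(k * (1 + log(2π)) / 2 .+ sum(logσ; dims=1))
end

logσ = zeros(2, 3)   # unit variance, k = 2, N = 3
H = diag_gaussian_entropy(logσ)
# Each entry equals 1 + log(2π), twice the entropy of a 1-D standard normal.
```

With k = 1 the expression reduces to the familiar 1/2 * (1 + log(2π)) + log σ, so the constant term scales with the number of action dimensions.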

findmyway

👍

src/algorithms/policy_gradient/ppo.jl

findmyway · 2021-04-14T09:25:26Z

Feel free to merge it first. The CI error is caused by a breaking change in GridWorlds.jl

Co-authored-by: Jun Tian <[email protected]>

albheim · 2021-04-14T09:28:09Z

Cool, will do that then. I might also try to add an environment with a multidimensional action space so we can have a test for the algorithms that handle that. But I'll leave that for later.

albheim · 2021-04-14T09:32:25Z

What type of merge do you usually use? Merge commit, squash and merge, or rebase and merge?

findmyway · 2021-04-14T09:58:31Z

In most cases, I'd squash and merge.

Remove dimension in log_pa, fix entropy for multi

38036ce

findmyway approved these changes Apr 14, 2021

src/algorithms/policy_gradient/ppo.jl (outdated, resolved)

Update src/algorithms/policy_gradient/ppo.jl

000a419

Co-authored-by: Jun Tian <[email protected]>

albheim merged commit 2f28cbc into master Apr 14, 2021

albheim deleted the albheim_ppo_multiaction_fix branch April 14, 2021 09:59
