Skip to content
This repository has been archived by the owner on May 6, 2021. It is now read-only.

Fix bug in multi action ppo #169

Merged
merged 2 commits into from
Apr 14, 2021
Merged

Fix bug in multi action ppo #169

merged 2 commits into from
Apr 14, 2021

Conversation

albheim
Copy link
Member

@albheim albheim commented Apr 14, 2021

As mentioned in JuliaReinforcementLearning/ReinforcementLearning.jl#234 we had some strange behaviour. This was from the log_p`_a having a different size than log_p, (1, N) compared to (N,), creating a matrix for the ratio which was then reduced down.

I also found that the entropy loss was not defined to work with multi dimensional actions correctly, it was missing a multiple of how many dimensions there were in the actions space. Then it was also missing a division by 2 in one of the terms compared to how the entropy is defined on https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Differential_entropy

Copy link
Member

@findmyway findmyway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@findmyway
Copy link
Member

Feel free to merge it first. The CI error is caused by a breaking change in GridWorlds.jl

@albheim
Copy link
Member Author

albheim commented Apr 14, 2021

Cool, will do that then. I might also try to add some environment that has a multidimensional actions space so we can have a test for the algorithms that handles that. But I'll leave that for some later time.

@albheim
Copy link
Member Author

albheim commented Apr 14, 2021

What type of merge do you usually use? Merge commit, squash and merge or rebase and merge?

@findmyway
Copy link
Member

In most cases, I'd squash and merge.

@albheim albheim merged commit 2f28cbc into master Apr 14, 2021
@albheim albheim deleted the albheim_ppo_multiaction_fix branch April 14, 2021 09:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants