Fix formulas in PPO hw #293

Merged
merged 2 commits into from
Sep 10, 2019

Conversation

mknbv
Collaborator

@mknbv mknbv commented Sep 6, 2019

Closes #248.

@mknbv mknbv requested a review from dniku September 6, 2019 18:15
@review-notebook-app

Check out this pull request on ReviewNB: https://app.reviewnb.com/yandexdataschool/Practical_RL/pull/293

You'll be able to see notebook diffs and discuss changes. Powered by ReviewNB.

@dniku
Collaborator

dniku commented Sep 6, 2019

We've started using this script to clean up notebook metadata so that PRs don't introduce irrelevant, random changes. Could you run that script on the notebook?

Collaborator

@dniku dniku left a comment

LGTM to both amendments:

  1. T - 1 → T for lengths of ranges from 0 to T - 1
  2. max → min, because that is the original formulation of the loss.
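
For reference, a minimal sketch of the clipped policy objective in its original min formulation, together with the range-length point from item 1. The names `ppo_clipped_objective`, `ratio`, `advantage`, and `epsilon` are illustrative assumptions and are not taken from the homework notebook.

```python
import numpy as np


def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective (to be maximized), averaged over samples.

    ratio     -- pi_new(a|s) / pi_old(a|s) for each sample (hypothetical input name)
    advantage -- advantage estimate for each sample (hypothetical input name)
    epsilon   -- clipping range; 0.2 is only an assumed default here
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)

    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage

    # The elementwise min keeps the more pessimistic of the two terms.
    return np.minimum(unclipped, clipped).mean()


# Item 1: a range of timesteps 0, 1, ..., T - 1 contains T elements, not T - 1.
T = 5
timesteps = np.arange(T)
assert len(timesteps) == T
```

If the notebook minimizes a loss rather than maximizing an objective, the loss would be the negative of this value; which sign convention the homework uses is not stated in this conversation.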

@dniku dniku merged commit fc053ff into master Sep 10, 2019
@dniku dniku deleted the ppo-hw-fix branch September 10, 2019 23:03
yhn112 pushed a commit that referenced this pull request Jan 24, 2020
Successfully merging this pull request may close these issues.

(Probably) mistake in PPO policy loss formula