Fix formulas in PPO hw (#293)
mknbv authored and yhn112 committed Jan 24, 2020
1 parent 085b5e7 commit 92a744d
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions week09_policy_II/ppo.ipynb
@@ -400,27 +400,27 @@
 "modifies the typical policy gradient loss in the following way:\n",
 "\n",
 "$$\n",
-"L_{\\pi} = \\frac{1}{T-1}\\sum_{l=0}^{T-1}\n",
+"L_{\\pi} = \\frac{1}{T}\\sum_{l=0}^{T-1}\n",
 "\\frac{\\pi_\\theta(a_{t+l}|s_{t+l})}{\\pi_\\theta^{\\text{old}}(a_{t+l}|s_{t+l})}\n",
 "A^{\\mathrm{GAE}(\\gamma,\\lambda)}_{t+l}\\\\\n",
-"L_{\\pi}^{\\text{clipped}} = \\frac{1}{T-1}\\sum_{l=0}^{T-1}\\mathrm{clip}\\left(\n",
+"L_{\\pi}^{\\text{clipped}} = \\frac{1}{T}\\sum_{l=0}^{T-1}\\mathrm{clip}\\left(\n",
 "\\frac{\\pi_\\theta(a_{t+l}|s_{t+l})}{\\pi_{\\theta^{\\text{old}}}(a_{t+l}|s_{t+l})}\n",
 "\\cdot A^{\\mathrm{GAE(\\gamma, \\lambda)}}_{t+l},\n",
 "1 - \\text{cliprange}, 1 + \\text{cliprange}\\right)\\\\\n",
-"L_{\\text{policy}} = \\max\\left(L_\\pi, L_{\\pi}^{\\text{clipped}}\\right).\n",
+"L_{\\text{policy}} = -\\min\\left(L_\\pi, L_{\\pi}^{\\text{clipped}}\\right).\n",
 "$$\n",
 "\n",
 "Additionally, the value loss is modified in the following way:\n",
 "\n",
 "$$\n",
-"L_V = \\frac{1}{T-1}\\sum_{l=0}^{T-1}(V_\\theta(s_{t+l}) - \\hat{V}(s_{t+l}))^2\\\\\n",
-"L_{V}^{\\text{clipped}} = \\frac{1}{T-1}\\sum_{l=0}^{T-1}\n",
+"L_V = \\frac{1}{T}\\sum_{l=0}^{T-1}(V_\\theta(s_{t+l}) - \\hat{V}(s_{t+l}))^2\\\\\n",
+"L_{V}^{\\text{clipped}} = \\frac{1}{T}\\sum_{l=0}^{T-1}\n",
 "V_{\\theta^{\\text{old}}}(s_{t+l}) +\n",
 "\\text{clip}\\left(\n",
 "V_\\theta(s_{t+l}) - V_{\\theta^\\text{old}}(s_{t+l}),\n",
 "-\\text{cliprange}, \\text{cliprange}\n",
 "\\right)\\\\\n",
-"L_{\\text{value}} = \\max\\left(L_V, L_V^{\\text{clipped}}\\right).\n",
+"L_{\\text{value}} = -\\min\\left(L_V, L_V^{\\text{clipped}}\\right).\n",
 "$$"
 ]
 },
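
For reference, the clipped objectives touched by this hunk can be sketched in plain Python. This is a minimal sketch, not the notebook's code. It assumes the commonly used per-sample form of PPO, in which the probability ratio itself is clipped (not its product with the advantage) and the min/max is taken inside the average over the T steps, so it differs in detail from the notation in the formulas. The names `clip`, `ppo_policy_loss`, `ppo_value_loss` and the default `cliprange=0.2` are illustrative assumptions.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def ppo_policy_loss(ratios, advantages, cliprange=0.2):
    """Negated clipped surrogate, averaged over T steps.

    ratios[l]     : pi_theta(a|s) / pi_theta_old(a|s) at step t+l
    advantages[l] : advantage estimate (e.g. GAE) at step t+l
    """
    T = len(ratios)
    total = 0.0
    for r, a in zip(ratios, advantages):
        unclipped = r * a
        clipped = clip(r, 1.0 - cliprange, 1.0 + cliprange) * a
        # Pessimistic bound: take the smaller of the two surrogates.
        total += min(unclipped, clipped)
    # Divide by T (not T-1) and negate, since the loss is minimized.
    return -total / T


def ppo_value_loss(values, old_values, targets, cliprange=0.2):
    """Clipped squared-error value loss, averaged over T steps.

    values[l]     : V_theta(s) at step t+l
    old_values[l] : V_theta_old(s) at step t+l
    targets[l]    : value target V_hat(s) at step t+l
    """
    T = len(values)
    total = 0.0
    for v, v_old, target in zip(values, old_values, targets):
        # Keep the new prediction within cliprange of the old one.
        v_clipped = v_old + clip(v - v_old, -cliprange, cliprange)
        # Pessimistic bound: take the larger of the two squared errors.
        total += max((v - target) ** 2, (v_clipped - target) ** 2)
    return total / T
```

With ratios `[1.5, 0.5]` and advantages `[1.0, -1.0]`, the first term is capped by the clipped ratio 1.2, while the second is dominated by the clipped ratio 0.8 times the negative advantage.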