Hello,

thanks for the code. While re-implementing the program, I found that there is a step that normalizes the value function vf here. It is implemented as

v_predict = v(s; \theta) * (1 - \gamma)

and the critic update is implemented as

min_\theta [v(s; \theta) * (1 - \gamma) - v_estimate]^2

Is there any reason to normalize the value function's output? I tested removing the normalization term and rescaling the learning rate (by 1 - \gamma), and there seems to be no problem in HalfCheetah-v2: it holds similar performance to the original version.

Best,
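For readers trying to reproduce this, here is a minimal sketch of the scaling being discussed (hypothetical function names, not the repo's actual code). Since a discounted sum of rewards bounded in [0, 1] is at most 1 / (1 - gamma), multiplying the prediction by (1 - gamma) keeps the critic's regression roughly in [0, 1]:

```python
GAMMA = 0.99

def discounted_return(rewards, gamma=GAMMA):
    """Monte-Carlo return sum_t gamma^t * r_t (an unnormalized value estimate)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def normalized_critic_loss(v_raw, v_estimate, gamma=GAMMA):
    """Squared error between the scaled prediction v(s) * (1 - gamma) and a
    normalized target, i.e. [v(s; theta) * (1 - gamma) - v_estimate]^2."""
    v_predict = v_raw * (1.0 - gamma)
    return (v_predict - v_estimate) ** 2

# With rewards bounded in [0, 1], the discounted return is at most
# 1 / (1 - gamma), so scaling by (1 - gamma) keeps targets roughly in [0, 1].
g = discounted_return([1.0] * 1000)   # close to 1 / (1 - GAMMA) = 100
print(g * (1.0 - GAMMA))              # scaled return, close to 1
```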
im-Kitsch changed the title from "Normalization of vf" to "Why Normalization of vf" on Jun 15, 2022.
The value scaling is mainly just a convention; I generally like to keep things normalized between 0 and 1. Training should work just as well without the normalization, but it might need some tuning of the other hyperparameters, like the step size.
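One way to see why only a step-size adjustment is needed (a sketch, writing $c = 1 - \gamma$ for the normalization factor): with the normalization, the critic loss is $L(\theta) = (c\, v_\theta(s) - \hat{y})^2$ for a normalized target $\hat{y}$, while dropping it means regressing onto the unnormalized target $\hat{y}/c$, i.e. $L'(\theta) = (v_\theta(s) - \hat{y}/c)^2$. Their gradients satisfy

$$\nabla_\theta L = 2c\,(c\, v_\theta(s) - \hat{y})\,\nabla_\theta v_\theta(s) = c^2\, \nabla_\theta L',$$

so removing the scaling multiplies the gradient by $1/(1-\gamma)^2$. Plain SGD would need the learning rate rescaled accordingly, while adaptive optimizers like Adam largely absorb a constant gradient scale, which fits the observation above that training still works after retuning.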