Possible error in critic update in SAC-AE algorithm #162
Hi @Cerphilly, thanks for pointing this out! I agree that this might be an error, though I'm not sure how much it affects results. Would you suggest something like the following?

```python
# Sum the two TD losses and take one gradient w.r.t. the shared encoder
# and both critics, so all gradients are evaluated at the same weights.
q_vars = (self._encoder.trainable_variables
          + self.qf1.trainable_variables
          + self.qf2.trainable_variables)
q_grad = tape.gradient(td_loss_q1 + td_loss_q2, q_vars)
self.qf_optimizer.apply_gradients(zip(q_grad, q_vars))
```

The code above simply sums the two TD losses and computes the gradient of that sum in a single step.
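A single gradient of the summed loss covers both critics and the shared encoder. This can be checked in a hypothetical, framework-free toy: a scalar encoder weight `e`, critic weights `w1`/`w2`, and one state `s` with TD target `y`. All of these names are invented for illustration, and the code is not the repository's. In this toy, the encoder's gradient of `td_loss_q1 + td_loss_q2` is the sum of both critics' contributions evaluated at the same parameters. A finite-difference check confirms it.

```python
# Hypothetical scalar toy (not the repository's code): shared encoder weight e,
# critic weights w1 and w2, a single state s and a single TD target y.

def losses(e, w1, w2, s=1.5, y=2.0):
    z = e * s                    # shared encoder output
    l1 = (w1 * z - y) ** 2       # TD loss of critic 1
    l2 = (w2 * z - y) ** 2       # TD loss of critic 2
    return l1, l2

def analytic_grads(e, w1, w2, s=1.5, y=2.0):
    """Gradients of L1 + L2 w.r.t. (e, w1, w2), all taken at the same parameters."""
    z = e * s
    r1 = 2 * (w1 * z - y)        # dL1/dq1
    r2 = 2 * (w2 * z - y)        # dL2/dq2
    g_e = r1 * w1 * s + r2 * w2 * s   # encoder receives both critics' contributions
    return g_e, r1 * z, r2 * z

def numeric_grad_e(e, w1, w2, h=1e-6):
    """Central finite difference of the summed loss w.r.t. the encoder weight."""
    plus = sum(losses(e + h, w1, w2))
    minus = sum(losses(e - h, w1, w2))
    return (plus - minus) / (2 * h)

g_e, g_w1, g_w2 = analytic_grads(0.8, 0.5, -0.3)
print(g_e, numeric_grad_e(0.8, 0.5, -0.3))
```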
Thanks for the quick response!
…and it seemed to achieve higher performance in RAD.
In the SAC-AE algorithm, critics 1 and 2 are updated as follows:
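However, the encoder is already updated with the Q1 gradient before the Q2 + encoder update is applied. As a result, td_loss_q2 and q2_grad are inconsistent: td_loss_q2 was computed with the encoder weights from before the Q1 step. Thus I believe q2_grad has to be calculated before qf1 and the encoder are optimized.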
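A minimal sketch shows why update order matters once the encoder is shared. It uses an invented scalar setup with plain SGD; `grads`, `joint_update`, and `sequential_update` are hypothetical names. `joint_update` evaluates both critic gradients at the same weights, as in the summed-loss fix. `sequential_update` evaluates the Q2 gradient only after the Q1 step has already moved the encoder. This illustrates order dependence only. It does not reproduce exactly what TensorFlow's `GradientTape` does in the original code.

```python
# Hypothetical scalar toy with plain SGD; not the SAC-AE / tf2rl implementation.
S, Y, LR = 1.5, 2.0, 0.1

def grads(e, w, s=S, y=Y):
    """Gradients of (w * e * s - y)**2 w.r.t. encoder weight e and critic weight w."""
    r = 2 * (w * e * s - y)
    return r * w * s, r * e * s

def joint_update(e, w1, w2, lr=LR):
    # Both critic gradients are taken at the same (e, w1, w2),
    # i.e. one gradient of td_loss_q1 + td_loss_q2.
    ge1, gw1 = grads(e, w1)
    ge2, gw2 = grads(e, w2)
    return e - lr * (ge1 + ge2), w1 - lr * gw1, w2 - lr * gw2

def sequential_update(e, w1, w2, lr=LR):
    # Step 1: update encoder and critic 1 with the Q1 gradient.
    ge1, gw1 = grads(e, w1)
    e, w1 = e - lr * ge1, w1 - lr * gw1
    # Step 2: the Q2 gradient is taken at the already-updated encoder weight.
    ge2, gw2 = grads(e, w2)
    return e - lr * ge2, w1, w2 - lr * gw2

print(joint_update(0.8, 0.5, -0.3))
print(sequential_update(0.8, 0.5, -0.3))
```

With these numbers, critic 1 ends up identical under both schemes. The encoder weight and critic 2 differ, however, because critic 2's gradient was taken at a different encoder weight.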