-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TRPO #40
Add TRPO #40
Conversation
WIP - Trust Region Policy Algorithm Currently the Hessian vector product is not working (see inline comments for more detail)
Adding no_grad block for the line search Additional assert in the conjugate solver to help debugging
- Adding ActorCriticPolicy.get_distribution - Using the Distribution object to compute the KL divergence - Checking for objective improvement in the line search - Moving magic numbers to instance variables
Improving numerical stability of the conjugate gradient algorithm Critic updates
Changes around the alpha of the line search Adding TRPO to __init__ files
Thanks for the PR. Once this is done, you should focus on documentation and test and I will run experiments on pybullet envs + atari games ;) |
- renaming cg_solver to conjugate_gradient_solver and renaming parameter Avp_fun to matrix_vector_dot_func + docstring - extra comments + better variable names in trpo.py - defining a method for the hessian vector product instead of an inline function - fix registering correct policies for TRPO and using correct policy base in constructor
- refactoring sb3_contrib.common.policies to reuse as much code as possible from sb3
- get_distribution will be added directly to the SB3 version of ActorCriticPolicy, this commit reflects this
Could you remove protection on your master branch so I can push changes? (next time, please use another branch ;)) |
Here are the results using the SB2 Hyper-parameters (I'll update the PR on zoo with the parameters used):
|
thanks =), could you also add here learning curve/results comparison with SB2? (plot scripts are included in the zoo, i can help with them if needed) |
Hi, Could you start adding the documentation page + more tests ? Btw, I'm thinking about renaming some variables (related to backtracking line search) so we are more consistent with other implementations, but this is just details... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks =)
@cyprienc I think it's time to move TRPO to SB3 =)! |
@araffin sure, will do. |
Description
This PR adds TRPO: https://arxiv.org/abs/1502.05477
It's still a work in progress (see TODO list below)
Context
closes #38
Types of changes
Checklist:
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)MuJoCo 1M Benchmark
Mujoco v2.1.0
v3 envs