Add Soft Actor-Critic (SAC) #326
Conversation
fix rtd ?
Nice work, I just left a few details/suggestions that are not directly related to the SAC implementation.
def policy(self, state):
    assert self.cont_policy is not None
    state = np.array([state])
    state = torch.FloatTensor(state).to(self.device)
Just a nit, but there are built-in PyTorch methods to convert a np array to a torch tensor (torch.as_tensor, torch.from_numpy). I think torch.as_tensor allows you to specify the device directly.
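As a minimal sketch of the suggested conversion (the state value and device below are placeholders, not the PR's actual variables):

import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
state = np.zeros(3, dtype=np.float32)  # placeholder observation

# torch.as_tensor accepts the target dtype and device directly,
# replacing the two-step torch.FloatTensor(...).to(device) pattern
state_t = torch.as_tensor(np.array([state]), dtype=torch.float32, device=device)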
Thanks for the suggestions Waris, I tried here to be consistent with the other implementations of Rlberry: FloatTensor was used in some of them, but so was torch.from_numpy. That's why I kept it. Is there any difference in terms of performance otherwise?
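For context on that question, here is a small illustration (not from the PR) of how the three conversions differ in copy behaviour, which is where a performance difference would typically come from:

import numpy as np
import torch

x = np.ones(4, dtype=np.float32)

a = torch.from_numpy(x)   # shares memory with x (no copy, CPU only)
b = torch.FloatTensor(x)  # always copies (and casts to float32)
c = torch.as_tensor(x)    # no copy here, since dtype and device already match

x[0] = 5.0
print(a[0].item(), b[0].item(), c[0].item())  # 5.0 1.0 5.0 -- a and c see the change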
""" | ||
|
||
# Convert the state to a torch.Tensor if it's not already | ||
state = torch.FloatTensor(state).to(self.device) |
Same suggestion here
Added SACAgent to docu
added next line
reverted
It seems that something is broken with the doc in the sac branch. When trying to change the api.rst, it breaks readthedocs compilation. So I reverted. But otherwise, algorithmically speaking, SAC looks good to me.
Description
This PR introduces SAC, following the original article, for continuous action spaces.
The current implementation was evaluated on Gym Pendulum and MuJoCo Hopper (v2), "solving" both environments.
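For readers unfamiliar with the algorithm, below is a minimal, self-contained sketch of the two losses that define SAC in the first referenced article. Network sizes, variable names, and the toy batch are illustrative only and do not reflect the code in this PR.

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, gamma, alpha = 3, 1, 0.99, 0.2

def make_q():
    # Q-network: takes a concatenated (state, action) pair, returns a scalar value
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

q1, q2 = make_q(), make_q()
q1_targ, q2_targ = make_q(), make_q()

# Squashed-Gaussian policy head: outputs mean and log-std of a tanh-Gaussian
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 2 * act_dim))

def sample_action(s):
    mean, log_std = policy(s).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
    u = dist.rsample()                    # reparameterized sample
    a = torch.tanh(u)                     # squash the action into (-1, 1)
    # log-probability with the tanh change-of-variables correction
    logp = dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)
    return a, logp.sum(-1, keepdim=True)

def q_val(q, s, a):
    return q(torch.cat([s, a], dim=-1))

# Toy transition batch
s, a = torch.randn(8, obs_dim), torch.rand(8, act_dim) * 2 - 1
r, s2, d = torch.randn(8, 1), torch.randn(8, obs_dim), torch.zeros(8, 1)

# Critic loss: soft Bellman backup using the minimum of the two target critics,
# with the entropy bonus -alpha * log pi(a'|s') inside the target
with torch.no_grad():
    a2, logp2 = sample_action(s2)
    target = torch.min(q_val(q1_targ, s2, a2), q_val(q2_targ, s2, a2)) - alpha * logp2
    y = r + gamma * (1 - d) * target
critic_loss = F.mse_loss(q_val(q1, s, a), y) + F.mse_loss(q_val(q2, s, a), y)

# Actor loss: maximize the entropy-regularized Q-value of on-policy actions
a_pi, logp_pi = sample_action(s)
actor_loss = (alpha * logp_pi - torch.min(q_val(q1, s, a_pi), q_val(q2, s, a_pi))).mean()

In the full algorithm these two losses are minimized alternately on minibatches drawn from a replay buffer, with the target critics updated by Polyak averaging; the third referenced article additionally tunes the entropy coefficient alpha automatically.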
Todo
Original articles:
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Composable Deep Reinforcement Learning for Robotic Manipulation
Soft Actor-Critic Algorithms and Applications
Reference resources:
haarnoja/sac
haarnoja/softqlearning
openai/spinningup
vwxyzjn/cleanrl
DLR-RM/stable-baselines3
Checklist
To check:
black --check examples rlberry *py
flake8 --select F401,F405,D410,D411,D412 --exclude=rlberry/check_packages.py --per-file-ignores="__init__.py:F401"