
Add Soft Actor-Critic (SAC) #326

Merged
merged 38 commits into from
Jul 24, 2023

Conversation

brahimdriss
Collaborator

@brahimdriss brahimdriss commented Jun 29, 2023

Description

This PR introduces SAC for continuous action spaces, following the original article.
The current implementation was evaluated on gym Pendulum and MuJoCo Hopper (v2), "solving" both environments.

Todo

  • Switch to gymnasium
  • Add seeding
  • Handle multiple fitting
  • Add benchmarks (MuJoCo and classic gym control)

Original articles:
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Composable Deep Reinforcement Learning for Robotic Manipulation
Soft Actor-Critic Algorithms and Applications
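For reference, the maximum-entropy objective optimized in these papers augments the expected return with a policy-entropy bonus, with a temperature coefficient α trading off reward and entropy:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```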

Reference resources:
haarnoja/sac
haarnoja/softqlearning
openai/spinningup
vwxyzjn/cleanrl
DLR-RM/stable-baselines3

Checklist

  • My code follows the style guideline
    To check:
    black --check examples rlberry *.py
    flake8 --select F401,F405,D410,D411,D412 --exclude=rlberry/check_packages.py --per-file-ignores="__init__.py:F401"
  • I have commented my code, particularly in hard-to-understand areas,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works,
  • New and existing unit tests pass locally with my changes,
  • I have updated the changelog if necessary,
  • I have set the label "ready for review" and the checks are all green.

@brahimdriss
Collaborator Author

brahimdriss commented Jun 30, 2023

Pendulum results on 10 runs. Not sure about the style of the plot, but it can still be changed later.

sac_pendulum

MuJoCo results are next.

@brahimdriss
Collaborator Author

brahimdriss commented Jul 3, 2023

MuJoCo results:

Hopper-v2: 1M timesteps - 5 runs
Training episodic return: 2247.09 ± 929.56
Walker-v2: 1M timesteps - 5 runs
Training episodic return: 4217.55 ± 417.02
HalfCheetah-v2: 1M timesteps - 5 runs
Training episodic return: 6393.02 ± 657.12

sac_hopper_v2
sac_walker_v2
sac_halfcheetah_v2

@brahimdriss brahimdriss requested a review from riccardodv July 4, 2023 14:51
@riccardodv riccardodv requested a review from AleShi94 July 4, 2023 14:57
Collaborator

@riiswa riiswa left a comment


Nice work, I just left a few details/suggestions that are not directly related to the SAC implementation.

def policy(self, state):
    assert self.cont_policy is not None
    state = np.array([state])
    state = torch.FloatTensor(state).to(self.device)
Collaborator


Just a nit, but there are built-in PyTorch methods to convert a NumPy array to a torch tensor (torch.as_tensor, torch.from_numpy). I think torch.as_tensor allows you to specify the device directly.
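A minimal sketch of this suggestion (the `state` variable is illustrative, mirroring the snippet above):

```python
import numpy as np
import torch

state = np.array([[0.1, -0.2, 0.3]], dtype=np.float32)

# torch.as_tensor avoids a copy when dtype/device already match,
# and accepts the target device in one call.
t = torch.as_tensor(state, dtype=torch.float32, device="cpu")

# torch.from_numpy shares memory with the NumPy array (CPU only);
# moving to another device requires a separate .to(device) call.
t2 = torch.from_numpy(state).to("cpu")
```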

Collaborator Author


Thanks for the suggestions Waris. I tried to stay consistent with the other implementations in rlberry; FloatTensor was used in some of them, but so was torch.from_numpy, which is why I kept it. Is there any difference in terms of performance otherwise?

"""

# Convert the state to a torch.Tensor if it's not already
state = torch.FloatTensor(state).to(self.device)
Collaborator


Same suggestion here

@KohlerHECTOR KohlerHECTOR added the Marathon label Jul 13, 2023
@KohlerHECTOR
Collaborator

It seems that something is broken with the docs in the sac branch: changing api.rst breaks the readthedocs compilation, so I reverted. Otherwise, algorithmically speaking, SAC looks good to me.

@brahimdriss brahimdriss changed the title [WIP] Add Soft Actor-Critic (SAC) Add Soft Actor-Critic (SAC) Jul 24, 2023
@brahimdriss brahimdriss merged commit 72d88d5 into rlberry-py:main Jul 24, 2023
@brahimdriss brahimdriss self-assigned this Jul 25, 2023
Labels
Marathon, ready for review
5 participants