
Add Soft Actor-Critic (SAC) #326

Merged
merged 38 commits into from
Jul 24, 2023

Conversation

brahimdriss
Collaborator

@brahimdriss brahimdriss commented Jun 29, 2023

Description

This PR introduces SAC for continuous action spaces, following the original article.
The current implementation was evaluated on gym Pendulum and MuJoCo Hopper (v2), "solving" both environments.

Todo

  • Switch to gymnasium
  • Add seeding
  • Handle multiple fitting
  • Add benchmarks (MuJoCo and classic gym control)

Original articles:
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Composable Deep Reinforcement Learning for Robotic Manipulation
Soft Actor-Critic Algorithms and Applications
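For reference, the maximum-entropy objective optimized in these papers augments the expected return with a policy-entropy bonus, with a temperature coefficient α trading off reward and entropy:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```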

Reference resources:
haarnoja/sac
haarnoja/softqlearning
openai/spinningup
vwxyzjn/cleanrl
DLR-RM/stable-baselines3

Checklist

  • My code follows the style guideline
    To check:
    black --check examples rlberry *.py
    flake8 --select F401,F405,D410,D411,D412 --exclude=rlberry/check_packages.py --per-file-ignores="__init__.py:F401"
  • I have commented my code, particularly in hard-to-understand areas,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works,
  • New and existing unit tests pass locally with my changes,
  • I have updated the changelog if necessary,
  • I have set the label "ready for review" and the checks are all green.

@brahimdriss
Collaborator Author

brahimdriss commented Jun 30, 2023

Pendulum results on 10 runs. Not sure about the style of the plot, but it can still be changed later.

sac_pendulum

MuJoCo results are next.

@brahimdriss
Collaborator Author

brahimdriss commented Jul 3, 2023

MuJoCo results:

Hopper-v2: 1M timesteps - 5 runs
Training episodic return: 2247.09 ± 929.56
Walker-v2: 1M timesteps - 5 runs
Training episodic return: 4217.55 ± 417.02
HalfCheetah-v2: 1M timesteps - 5 runs
Training episodic return: 6393.02 ± 657.12

sac_hopper_v2
sac_walker_v2
sac_halfcheetah_v2

@brahimdriss brahimdriss requested a review from riccardodv July 4, 2023 14:51
@riccardodv riccardodv requested a review from AleShi94 July 4, 2023 14:57
Collaborator

@riiswa riiswa left a comment


Nice work, I just left a few details/suggestions that are not directly related to the SAC implementation.

def policy(self, state):
    assert self.cont_policy is not None
    state = np.array([state])
    state = torch.FloatTensor(state).to(self.device)
Collaborator


Just a nit, but there are built-in PyTorch methods to convert a NumPy array to a torch tensor (torch.as_tensor, torch.from_numpy). I think torch.as_tensor allows you to specify the device directly.
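A minimal sketch of this suggestion (the `state` variable is illustrative, mirroring the snippet above):

```python
import numpy as np
import torch

state = np.array([[0.1, -0.2, 0.3]], dtype=np.float32)

# torch.as_tensor avoids a copy when dtype/device already match,
# and accepts the target device in one call.
t = torch.as_tensor(state, dtype=torch.float32, device="cpu")

# torch.from_numpy shares memory with the NumPy array (CPU only);
# moving to another device requires a separate .to(device) call.
t2 = torch.from_numpy(state).to("cpu")
```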

Collaborator Author


Thanks for the suggestions Waris. I tried to stay consistent with the other implementations in rlberry; FloatTensor was used in some of them, but so was torch.from_numpy, which is why I kept it. Is there any difference in terms of performance otherwise?

"""

# Convert the state to a torch.Tensor if it's not already
state = torch.FloatTensor(state).to(self.device)
Collaborator


Same suggestion here

@KohlerHECTOR KohlerHECTOR added the Marathon label Jul 13, 2023
@KohlerHECTOR
Collaborator

It seems that something is broken with the docs in the sac branch: changing api.rst breaks the readthedocs compilation, so I reverted. Otherwise, algorithmically speaking, SAC looks good to me.

@brahimdriss brahimdriss changed the title [WIP] Add Soft Actor-Critic (SAC) Add Soft Actor-Critic (SAC) Jul 24, 2023
@brahimdriss brahimdriss merged commit 72d88d5 into rlberry-py:main Jul 24, 2023
@brahimdriss brahimdriss self-assigned this Jul 25, 2023
Labels
Marathon, ready for review
5 participants