Feature/recurrent and multiple trainer MAPPO #326
Conversation
Thanks so much @DriesSmit 👐 This is really great! Just see my minor comments.
Thanks @DriesSmit! 👍 Did a quick smoke review and left minor comments. Happy with the benchmarking on this; it will be useful for comparing with the Jax systems.
Thanks @DriesSmit, great work on getting this in 🔥 👐 🔥
Just a minor comment on adding docstrings for fix_sampler.
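For illustration only, a Google-style docstring skeleton of the kind the comment asks for; the parameter names below are placeholders, not fix_sampler's real signature (see the PR diff for that):

```python
# Placeholder signature for illustration; the real fix_sampler is in the PR diff.
def fix_sampler(sampler, samples_per_insert):
    """Fix the sampler used when reading experience from the replay table.

    Args:
        sampler: placeholder name for the sampler configuration to adjust.
        samples_per_insert: placeholder name for the sample-to-insert ratio.

    Returns:
        The adjusted sampler configuration.
    """
    return sampler
```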
Commit 60eb054
Thanks Dries! 🔥
What?
Add recurrent training capabilities to MAPPO. Migrate MAPPO to use the centralised variable source, which enables training with multiple trainers.
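To make the multi-trainer point concrete, here is a minimal sketch of the centralised variable source pattern, not Mava's actual API; all names are illustrative. A single thread-safe store holds the network variables, each trainer pushes its updates to it, and executors pull the latest values from it:

```python
import threading


class VariableSource:
    """Central, thread-safe store for network variables (illustrative only)."""

    def __init__(self, variables):
        self._variables = dict(variables)
        self._lock = threading.Lock()

    def get_variables(self, names):
        # Executors (and trainers) pull the latest values from here.
        with self._lock:
            return {name: self._variables[name] for name in names}

    def set_variables(self, updates):
        # Each trainer pushes its updated variables back to the source,
        # so multiple trainers can contribute to the same set of networks.
        with self._lock:
            self._variables.update(updates)
```

Because all trainers read from and write to the same source, adding trainers is just a matter of launching more trainer processes against it.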
Why?
This update makes MAPPO usable in the recurrent training setting and with multiple trainers.
How?
Adapted the MAPPO trainer to work in the recurrent setting, added a recurrent executor to MAPPO, and migrated the old MAPPO code to the new scaling version of Mava.
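As a rough sketch of the recurrent-executor idea (class and method names here are illustrative, not the classes added in this PR): the executor threads the policy's RNN state through each episode, resetting it at episode boundaries, so the trainer can later unroll the policy over full sequences rather than single transitions:

```python
class RecurrentExecutor:
    """Carries the policy's recurrent state across steps (illustrative only)."""

    def __init__(self, policy, initial_state):
        # `policy` maps (observation, state) -> (action, new_state).
        self._policy = policy
        self._initial_state = initial_state
        self._state = initial_state

    def observe_first(self, observation):
        # Reset the recurrent state at the start of every episode.
        self._state = self._initial_state

    def select_action(self, observation):
        # Step the recurrent policy and keep the new state for the next step.
        action, self._state = self._policy(observation, self._state)
        return action
```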
Extra
.