Feat: sebulba rec ippo #1142

SimonDuToit · 2024-11-18T16:14:22Z

Sebulba implementation of recurrent IPPO.

OmaymaMahjoub

Overall the system looks correct and reasonable. Well done Simon! I just left a few minor requests :)

mava/configs/default/rec_ippo_sebulba.yaml

mava/systems/ppo/sebulba/rec_ippo.py

OmaymaMahjoub · 2024-12-11T12:51:56Z

mava/systems/ppo/sebulba/rec_ippo.py

@@ -0,0 +1,910 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.


Could you update the typings in Pipeline (mava/utils/sebulba.py) to be Union[PPOTransition, RNNPPOTransition]?

SimonDuToit · Jan 7, 2025

This causes errors in the pre-commit. For now I changed both Sebulba systems to use the MavaTransition type-var, but this is probably a temporary solution.

sash-a · Jan 9, 2025

Can you please make an issue for this? I think the best solution is to make a protocol with all the common things in a transition (actions, obs, done, reward). The challenge is that named tuples don't seem to work with protocols, so we'd likely need to switch to a flax/chex dataclass.
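To make the protocol idea concrete, here is a minimal sketch of what it might look like. It uses the standard-library dataclass as a stand-in for a flax/chex dataclass, and the names TransitionLike and total_reward, as well as the exact field names, are hypothetical and not taken from Mava. The two transition classes are simplified stand-ins, not the real definitions.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class TransitionLike(Protocol):
    """Hypothetical protocol: the fields every transition is assumed to share."""

    action: Any
    obs: Any
    done: Any
    reward: Any


@dataclass
class PPOTransition:
    # Simplified stand-in; the real class has more fields.
    action: Any
    obs: Any
    done: Any
    reward: Any
    value: Any


@dataclass
class RNNPPOTransition:
    # Simplified stand-in for the recurrent variant.
    action: Any
    obs: Any
    done: Any
    reward: Any
    value: Any
    hstates: Any  # extra field carried only by the recurrent system


def total_reward(transitions: Sequence[TransitionLike]) -> float:
    # Relies only on the shared fields, so both transition types are accepted.
    return float(sum(t.reward for t in transitions))


batch = [
    PPOTransition(action=0, obs=[0.0], done=False, reward=1.0, value=0.5),
    RNNPPOTransition(action=1, obs=[1.0], done=True, reward=2.0, value=0.1, hstates=None),
]
```

Code that is typed against the protocol, such as a pipeline, would then need no Union and no type-var, since any class exposing the four shared fields satisfies it structurally.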

mava/systems/ppo/sebulba/rec_ippo.py

sash-a

Looks great! Pretty much good to go except for a few minor style changes to bring it up to date with the latest PPO changes that went in at the end of last year.

mava/systems/ppo/sebulba/rec_ippo.py

sash-a

Final minor comments

sash-a · 2025-01-15T08:57:08Z

mava/systems/ppo/sebulba/rec_ippo.py

+        (config.arch.num_envs, num_agents), config.network.hidden_state_dim
+    )
+    hstates = HiddenStates(init_policy_hstate, init_critic_hstate)
+    hstates_tpu = tree.map(move_to_device, hstates)


device_put already tree maps

Suggested change

-    hstates_tpu = tree.map(move_to_device, hstates)
+    hstates_tpu = move_to_device(hstates)
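As a quick check of the point above, the sketch below compares the two forms on a small pytree. It assumes move_to_device is a thin wrapper around jax.device_put (its definition is not shown in this thread), and the HiddenStates field names and array shapes are placeholders. jax.tree_util.tree_map is used here in place of the tree.map alias from the diff.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class HiddenStates(NamedTuple):
    # Stand-in for the container in the PR; field names are assumed.
    policy_hidden_state: jax.Array
    critic_hidden_state: jax.Array


device = jax.devices()[0]


def move_to_device(x):
    # Assumption: the real helper simply forwards to jax.device_put.
    return jax.device_put(x, device)


hstates = HiddenStates(jnp.zeros((4, 3, 8)), jnp.ones((4, 3, 8)))

# Leaf by leaf, as in the original code.
mapped = jax.tree_util.tree_map(move_to_device, hstates)

# Whole pytree in one call, as in the suggestion.
direct = move_to_device(hstates)
```

Both calls return a HiddenStates with the same structure and values, so the explicit tree map adds nothing as long as the helper is built on device_put.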

sash-a · 2025-01-15T08:57:25Z

mava/systems/ppo/sebulba/rec_ippo.py

+                obs_tpu = tree.map(move_to_device, timestep.observation)
+                last_dones = tree.map(move_to_device, dones)


Suggested change

-                obs_tpu = tree.map(move_to_device, timestep.observation)
-                last_dones = tree.map(move_to_device, dones)
+                obs_tpu = move_to_device(timestep.observation)
+                last_dones = move_to_device(dones)

SimonDuToit added 2 commits November 18, 2024 18:09

recurrent ippo

1259b01

linting

dbf837c

SimonDuToit requested review from RuanJohn, sash-a, OmaymaMahjoub, WiemKhlifi and Louay-Ben-nessir as code owners November 18, 2024 16:14

pull-request-size bot added the size/XL label Nov 18, 2024

OmaymaMahjoub requested changes Dec 11, 2024

OmaymaMahjoub assigned SimonDuToit Dec 11, 2024

SimonDuToit and others added 4 commits January 7, 2025 14:17

Merge branch 'develop' into feat/sebulba_rec_ippo

a5fb284

cleanup and compatibility

d1cf014

unified transition type

c5c9a36

Merge branch 'develop' into feat/sebulba_rec_ippo

1fe33f6

sash-a requested changes Jan 9, 2025

SimonDuToit added 2 commits January 9, 2025 16:57

review suggestions

fc8c929

Merge branch 'feat/sebulba_rec_ippo' of github.com:instadeepai/Mava i…

0bd9984

…nto feat/sebulba_rec_ippo

sash-a reviewed Jan 15, 2025

more review suggestions

3cf9b2a
