Feat Sebulba recurrent IQL #1148
base: develop
Conversation
I've looked through everything except the system file and it looks good, Sebulba utils especially! Just some relatively minor style changes
mava/utils/config.py
Outdated
# PPO-specific check
if "num_minibatches" in config.system:
    assert num_eval_samples % config.system.num_minibatches == 0, (
        f"Number of training samples per evaluator ({num_eval_samples}) "
        f"must be divisible by num_minibatches ({config.system.num_minibatches})."
    )
A thought on this: maybe we can split these up into multiple methods, e.g. check_num_updates, check_num_envs, etc. Then have a check_sebulba_config_ppo, check_anakin_config_ppo and a check_sebulba_config_iql which use the relevant methods?
I split it into base_sebulba_checks and ppo_sebulba_checks. Any more splits feel excessive 🤔
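For reference, a rough sketch of what that split could look like (the config keys and check bodies here are illustrative assumptions, not the actual PR code):

from omegaconf import DictConfig

def check_num_updates(config: DictConfig) -> None:
    # Evaluation must divide the total number of updates evenly.
    assert config.arch.num_updates % config.arch.num_evaluation == 0, (
        "num_updates must be divisible by num_evaluation."
    )

def check_num_envs(config: DictConfig) -> None:
    # Environments must split evenly across the actor devices.
    assert config.arch.num_envs % len(config.arch.actor_device_ids) == 0, (
        "num_envs must be divisible by the number of actor devices."
    )

def base_sebulba_checks(config: DictConfig) -> None:
    check_num_updates(config)
    check_num_envs(config)

def ppo_sebulba_checks(config: DictConfig, num_eval_samples: int) -> None:
    base_sebulba_checks(config)
    # PPO-specific check, as in the diff above.
    assert num_eval_samples % config.system.num_minibatches == 0, (
        f"Number of training samples per evaluator ({num_eval_samples}) "
        f"must be divisible by num_minibatches ({config.system.num_minibatches})."
    )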
# TODO: remove the PPO dependencies when we make Sebulba for other systems
This is a good point though, maybe there's something we can do about it 🤔
Maybe a protocol that has action, obs and reward? Not sure if there are any other common attributes.
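For illustration, something like this could work as a shared interface (the class name and the chex.Array annotation are my assumptions; only the attribute names come from the comment above):

from typing import Protocol

from chex import Array

class HasTransitionFields(Protocol):
    # Attributes common to the PPO and IQL transition types.
    obs: Array
    action: Array
    reward: Array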
mava/utils/sebulba.py
Outdated
def __init__(
    self, samples_per_insert: float, min_size_to_sample: int, min_diff: float, max_diff: float
):
Can we please add a good docstring here 🙏
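Something along these lines as a starting point? The semantics described below are my assumption (modelled on a Reverb-style samples-per-insert rate limiter), so please adjust to match the actual behaviour:

def __init__(
    self, samples_per_insert: float, min_size_to_sample: int, min_diff: float, max_diff: float
):
    """Rate limiter that keeps sampling and inserting roughly in sync.

    Args:
        samples_per_insert: target number of times each inserted item is sampled.
        min_size_to_sample: minimum number of inserts before sampling may start.
        min_diff: lower bound on (inserts * samples_per_insert - samples); sampling
            waits while the difference is below this bound.
        max_diff: upper bound on the same difference; inserting waits while the
            difference is above it.
    """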
terminated = np.repeat(
    terminated[..., np.newaxis], repeats=self.num_agents, axis=-1
)  # (B,) --> (B, N)
Does this already happen for smax and lbf?
Great work here! Really minor changes required. Happy to merge this pending some benchmarks
next_timestep = env.step(cpu_action)

# Prepare the transition
terminal = (1 - timestep.discount[..., 0, jnp.newaxis]).astype(bool)
Are you sure we want to remove the agent dim here?
The dones flag is removed here and added back in the ScannedRNN. We could either modify the RNN to handle the dones flag with or without the agent dimension, or standardize it by keeping the agent dimension across all scripts. 🤔
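To make the two options concrete, a quick sketch with assumed shapes (B envs, N agents); this is illustrative only, not the PR's code:

import jax.numpy as jnp

batch, num_agents, hidden_dim = 4, 3, 8
done = jnp.array([True, False, False, True])        # (B,), no agent dim
hidden = jnp.ones((batch, num_agents, hidden_dim))
init_hidden = jnp.zeros_like(hidden)

# Option A: keep done as (B,) and broadcast it inside the RNN when resetting.
hidden_a = jnp.where(done[:, jnp.newaxis, jnp.newaxis], init_hidden, hidden)

# Option B: standardize on a per-agent flag of shape (B, N) across all scripts.
done_per_agent = jnp.repeat(done[:, jnp.newaxis], num_agents, axis=-1)
hidden_b = jnp.where(done_per_agent[..., jnp.newaxis], init_hidden, hidden)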
    target: Array,
) -> Tuple[Array, Metrics]:
    # axes switched here to scan over time
    hidden_state, obs_term_or_trunc = prep_inputs_to_scannedrnn(obs, term_or_trunc)
A general comment: I think this would be a lot easier to read if we used done to mean term_or_trunc, which I think is a reasonable thing. Would have to make the change in anakin also though :/
""" | ||
|
||
eps = jnp.maximum( | ||
config.system.eps_min, 1 - (t / config.system.eps_decay) * (1 - config.system.eps_min) |
Would be nice if we could set a different decay per actor, although I think that's out of scope for this PR. Maybe if you could make an issue to add some of the Ape-X DQN features, that would be great.
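Agreed that it's out of scope here, but for the issue: a sketch of the Ape-X style fixed per-actor epsilon (0.4 and 7 are the paper's defaults; how this would be wired into the config is left open):

import jax.numpy as jnp

# Each actor i gets a fixed epsilon: eps_i = base_eps ** (1 + i / (N - 1) * alpha),
# spanning base_eps for actor 0 down to much greedier values for the last actor
# (Ape-X DQN, Horgan et al. 2018).
num_actors = 8
base_eps, alpha = 0.4, 7.0
actor_ids = jnp.arange(num_actors)
actor_eps = base_eps ** (1 + actor_ids / (num_actors - 1) * alpha)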
What?
A recurrent IQL implementation using the Sebulba architecture.
Why?
Off-policy Sebulba base and non-JAX envs in Mava.
How?
Mixed the Sebulba structure from PPO with the learner code from Anakin IQL.