instadeepai · arnupretorius · Feb 11, 2022 · Jan 19, 2022 · Jan 19, 2022 · Jan 20, 2022
@@ -12,10 +12,10 @@ A clear and concise description of what the bug is.
 
 ### To Reproduce
 Steps to reproduce the behavior:
-1. 
-2. 
-3. 
-4. 
+1.
+2.
+3.
+4.
 
 ### Expected behavior
 A clear and concise description of what you expected to happen.

@@ -6,7 +6,7 @@ ARG record
 # Ensure no installs try launch interactive screen
 ARG DEBIAN_FRONTEND=noninteractive
 # Update packages
-RUN apt-get update -y && apt-get install -y python3-pip && apt-get install -y python3-venv
+RUN apt-get update --fix-missing -y && apt-get install -y python3-pip && apt-get install -y python3-venv
 # Update python path
 RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 10 &&\
     rm -rf /root/.cache && apt-get clean

@@ -15,7 +15,7 @@ To read more about the motivation behind Mava, please see our [blog post][blog],
 
 <hr>
 
-👋 **UPDATE**: The team has been hard at work over the past few months to improve Mava's systems performance, stability and robustness. These efforts include extensively benchmarking system implementations, fixing bugs and profiling performance and speed. The culmination of this work will be reflected in our next stable release. However, during this period, we have learned a lot about what works and what doesn't. In particular, our current base system design allows for a decent amount of flexibility but quickly becomes difficult to maintain with growing signatures and system constructors as additional modules get added. Our class designs are also overly reliant on wrappers and inheritance which do not scale as well as we would like with increases in system complexity. Furthermore, our original motivation for choosing Tensorflow 2 (TF2) as our deep learning backend was to align with Acme's large repository of RL abstractions and tools for TF2. These were very useful for initially building our systems. But since then, we have found TF2 less performant and flexible than we desire given alternative frameworks. Acme has also affirmed their support of Jax underlying much of the DeepMind RL ecosystem. Therefore, in the coming months, following our stable release, **we plan to rollout a more modular and flexible build system specifically for Jax-based systems.** Please note that all TF2-based systems using the old build system will be maintained during the rollout. However, once a stable Jax release has been made with the new build system, Mava will only support a single DL backend, namely Jax, and we will begin to deprecate all TF2 systems and building support. That said, we will make sure to communicate clearly and often during the migration from TF2 to Jax. 
+👋 **UPDATE**: The team has been hard at work over the past few months to improve Mava's systems performance, stability and robustness. These efforts include extensively benchmarking system implementations, fixing bugs and profiling performance and speed. The culmination of this work will be reflected in our next stable release. During this period, we have also learned a lot about what works and what doesn't. In particular, our current base system design allows for a decent amount of flexibility, but it quickly becomes difficult to maintain as signatures and system constructors grow with each added module. Our class designs also rely too heavily on wrappers and inheritance, which do not scale as well as we would like as system complexity increases. Furthermore, our original motivation for choosing TensorFlow 2 (TF2) as our deep learning backend was to align with Acme's large repository of RL abstractions and tools for TF2. These were very useful for initially building our systems. Since then, however, we have found TF2 less performant and less flexible than we would like compared with alternative frameworks. Acme has also affirmed its support for Jax, which underlies much of the DeepMind RL ecosystem. Therefore, in the coming months, following our stable release, **we plan to roll out a more modular and flexible build system specifically for Jax-based systems.** Please note that all TF2-based systems using the old build system will be maintained during the rollout. However, once a stable Jax release has been made with the new build system, Mava will support only a single DL backend, namely Jax, and we will begin to deprecate all TF2 systems and build support. That said, we will make sure to communicate clearly and often during the migration from TF2 to Jax.
 
 <hr>
 
@@ -65,12 +65,12 @@ For details on how to add your own environment, see [here](https://github.com/in
 
 | **Name**         | **Recurrent**      | **Continuous** | **Discrete**  | **Centralised training**  | **Multi Processing**   |
 | ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------- |
-| MADQN   | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| MADQN   | :heavy_check_mark: | :x: | :heavy_check_mark: | :x: | :heavy_check_mark: |
 | MADDPG  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:       | :heavy_check_mark:  | :heavy_check_mark: |
 | MAD4PG   | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark:  | :heavy_check_mark: |
 | MAPPO   | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| VDN   | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| QMIX   | :x: | :x: | :heavy_check_mark:                | :heavy_check_mark: | :heavy_check_mark: |
+| VDN   | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| QMIX   | :heavy_check_mark: | :x: | :heavy_check_mark:                | :heavy_check_mark: | :heavy_check_mark: |
 
 As we develop Mava further, we aim to have all systems well tested on a wide variety of environments.
 

@@ -428,4 +428,4 @@
 MTAw
 ">
   Your browser does not support the video tag.
-</video>
+</video>
@@ -1272,4 +1272,4 @@
 dAAAACWpdG9vAAAAHWRhdGEAAAABAAAAAExhdmY1OC4yOS4xMDA=
 ">
   Your browser does not support the video tag.
-</video>
+</video>
@@ -62,29 +62,16 @@ We also include a number of systems running on discrete action space environment
       - *Feedforward*
         - [decentralised][debug_madqn_ff_dec], [decentralised lr scheduling][debug_madqn_ff_dec_lr_schedule] (***using lr schedule***), [decentralised custom lr scheduling][debug_madqn_ff_dec_custom_lr_schedule] (***using custom lr schedule***) and [decentralised custom epsilon decay scheduling][debug_madqn_ff_dec_custom_eps_schedule] (***using configurable epsilon scheduling***).
       - *Recurrent*
-        - [decentralised][debug_madqn_rec_dec] and [decentralised with coms][debug_madqn_rec_dec_coms] (***using a system with communication***).
-
-  -   **QMIX**:
-      a QMIX system running on the discrete action space simple_spread MPE environment.
-      - *Feedforward* [decentralised][debug_qmix_ff_dec].
+        - [decentralised][debug_madqn_rec_dec].
 
   -   **VDN**:
       a VDN system running on the discrete action space simple_spread MPE environment.
-      - *Feedforward* [decentralised][debug_vdn_ff_dec].
-
-  -   **DIAL**:
-      a DIAL system running on the discrete action space simple_spread MPE environment.
-      - *Recurrent* [decentralised][debug_dial_rec_dec].
-
-### Debugging Environment - Switch
--    **DIAL**:
-    a DIAL system running on the discrete custom SwitchGame environment.
-     - *Recurrent* [decentralised][debug_switch_dial_rec_dec].
+      - *Recurrent* [centralised][debug_vdn_rec_cen].
 
 ### PettingZoo - Multi-Agent Atari
 -   **MADQN**:
    a MADQN system running on the two-player competitive Atari Pong environment.
-    - *Feedforward* [decentralised][pz_madqn_pong_ff_dec].
+    - *Recurrent* [decentralised][pz_madqn_pong_rec_dec].
 
 ### PettingZoo - Multi-Agent Particle Environment
   -   **MADDPG**:
@@ -101,15 +88,15 @@ We also include a number of systems running on discrete action space environment
     - *Feedforward*
         - [decentralised][smac_madqn_ff_dec].
     - *Recurrent*
-        - [decentralised with custom agent networks][smac_madqn_rec_dec_custom_agents] (***using custom agent networks***).
+        - [decentralised][smac_madqn_rec_dec].
 
 -   **QMIX**:
     a QMIX system running on the SMAC environment.
-    - *Feedforward* [decentralised][smac_qmix_ff_dec].
+    - *Recurrent* [centralised][smac_qmix_rec_cen].
 
 -   **VDN**:
     a VDN system running on the SMAC environment.
-    - *Feedforward* [decentralised][smac_vdn_ff_dec] and [decentralised record agents][smac_vdn_ff_dec_record].
+    - *Recurrent* [centralised][smac_vdn_rec_cen].
 
 ### OpenSpiel - Tic Tac Toe
   -   **MADQN**:
@@ -159,33 +146,19 @@ We also include a number of systems running on discrete action space environment
 [debug_madqn_ff_dec_custom_lr_schedule]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_madqn_custom_lr_schedule.py
 [debug_madqn_ff_dec_custom_eps_schedule]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_madqn_configurable_epsilon.py
 [debug_madqn_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/decentralised/run_madqn.py
-[debug_madqn_rec_dec_coms]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/decentralised/run_madqn_with_coms.py
-
-[debug_qmix_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_qmix.py
-
-[debug_vdn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_vdn.py
-
-[debug_dial_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/decentralised/run_dial.py
 
+[debug_vdn_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/centralised/run_vdn.py
 
-[debug_switch_dial_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/switch/recurrent/decentralised/run_dial.py
-
-
-[pz_madqn_pong_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/atari/pong/feedforward/decentralised/run_madqn.py
+[pz_madqn_pong_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/atari/pong/recurrent/centralised/run_madqn.py
 
 [pz_maddpg_mpe_ssl_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/mpe/simple_speaker_listener/feedforward/decentralised/run_maddpg.py
 
 [pz_maddpg_mpe_ss_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/mpe/simple_spread/feedforward/decentralised/run_maddpg.py
 
+[smac_madqn_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/decentralised/run_madqn.py
 
-[smac_madqn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_madqn.py
-
-[smac_madqn_rec_dec_custom_agents]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/decentralised/run_madqn.py
-
-[smac_qmix_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_qmix.py
-
-[smac_vdn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_vdn.py
+[smac_qmix_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/centralised/run_qmix.py
 
-[smac_vdn_ff_dec_record]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_vdn_record.py
+[smac_vdn_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/centralised/run_vdn.py
 
 [openspiel_madqn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/openspiel/tic_tac_toe/feedforward/decentralised/run_madqn.py
@@ -84,7 +84,6 @@ def main(_: Any) -> None:
         exploration_scheduler_fn=LinearExplorationScheduler(
             epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
         ),
-        importance_sampling_exponent=0.2,
         optimizer=snt.optimizers.Adam(learning_rate=1e-4),
         checkpoint_subpath=checkpoint_dir,
     ).build()

@@ -108,7 +108,6 @@ def main(_: Any) -> None:
         logger_factory=logger_factory,
         num_executors=2,
         exploration_scheduler_fn=exploration_scheduler_fn,
-        importance_sampling_exponent=0.2,
         optimizer=snt.optimizers.Adam(learning_rate=1e-4),
         checkpoint_subpath=checkpoint_dir,
     ).build()

@@ -107,10 +107,9 @@ def main(_: Any) -> None:
         exploration_scheduler_fn=LinearExplorationScheduler(
             epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
         ),
-        importance_sampling_exponent=0.2,
         optimizer=snt.optimizers.Adam(learning_rate=lr_start),
         checkpoint_subpath=checkpoint_dir,
-        learning_rate_scheduler_fn=learning_rate_scheduler_fn,
+        learning_rate_scheduler_fn=learning_rate_scheduler_fn,  # type: ignore
     ).build()
 
     # Ensure only trainer runs on gpu, while other processes run on cpu.

@@ -97,7 +97,6 @@ def main(_: Any) -> None:
         exploration_scheduler_fn=LinearExplorationScheduler(
             epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
         ),
-        importance_sampling_exponent=0.2,
         optimizer=snt.optimizers.Adam(learning_rate=lr),
         checkpoint_subpath=checkpoint_dir,
         learning_rate_scheduler_fn=learning_rate_scheduler_fn,
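
The hunks above drop `importance_sampling_exponent=0.2` from several MADQN examples. In prioritised experience replay, this exponent (usually written beta) controls how strongly sampled transitions are re-weighted to correct the bias introduced by non-uniform sampling. The sketch below is the textbook formulation, w_i = (N * P(i))^(-beta), normalised by the largest weight in the batch. It is an assumption for illustration, not Mava's or Reverb's code, and the function name `importance_sampling_weights` is hypothetical.

```python
import numpy as np


def importance_sampling_weights(
    priorities: np.ndarray, sampled: np.ndarray, beta: float
) -> np.ndarray:
    """Hypothetical helper: textbook PER importance-sampling weights.

    priorities: priority of every item in the buffer (already raised to alpha).
    sampled: indices of the items drawn for the current batch.
    beta: the importance-sampling exponent (0.2 in the removed lines).
    """
    probs = priorities / priorities.sum()     # sampling probability P(i) of each item
    n = len(priorities)                        # buffer size N
    weights = (n * probs[sampled]) ** (-beta)  # (N * P(i))^(-beta) for the sampled items
    return weights / weights.max()             # normalise so the largest weight is 1
```

With beta = 0 every weight is 1, which is equivalent to applying no correction; beta = 1 fully compensates for the non-uniform sampling probabilities.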

@@ -13,7 +13,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-"""Example running QMIX on debug MPE environments."""
+
+"""Example running MADQN on debug MPE environments."""
 import functools
 from datetime import datetime
 from typing import Any
@@ -22,9 +23,9 @@
 import sonnet as snt
 from absl import app, flags
 
-from mava.components.tf.modules.exploration import LinearExplorationTimestepScheduler
-from mava.systems.tf import qmix
-from mava.utils import lp_utils
+from mava.components.tf.modules.exploration import LinearExplorationScheduler
+from mava.systems.tf import madqn
+from mava.utils import enums, lp_utils
 from mava.utils.environments import debugging_utils
 from mava.utils.loggers import logger_utils
 
@@ -49,18 +50,18 @@
 
 
 def main(_: Any) -> None:
+
     # Environment.
     environment_factory = functools.partial(
         debugging_utils.make_environment,
         env_name=FLAGS.env_name,
         action_space=FLAGS.action_space,
-        return_state_info=True,
     )
 
     # Networks.
-    network_factory = lp_utils.partial_kwargs(qmix.make_default_networks)
+    network_factory = lp_utils.partial_kwargs(madqn.make_default_networks)
 
     # Checkpointer appends "Checkpoints" to checkpoint_dir.
     checkpoint_dir = f"{FLAGS.base_dir}/{FLAGS.mava_id}"
 
     # Log every [log_every] seconds.
@@ -74,17 +75,19 @@ def main(_: Any) -> None:
         time_delta=log_every,
     )
 
     # Distributed program.
-    program = qmix.QMIX(
+    program = madqn.MADQN(
         environment_factory=environment_factory,
         network_factory=network_factory,
         logger_factory=logger_factory,
         num_executors=1,
-        exploration_scheduler_fn=LinearExplorationTimestepScheduler(
-            epsilon_start=1.0, epsilon_min=0.05, epsilon_decay_steps=20000
+        exploration_scheduler_fn=LinearExplorationScheduler(
+            epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
         ),
-        max_replay_size=1000000,
-        optimizer=snt.optimizers.RMSProp(learning_rate=1e-4),
+        shared_weights=False,
+        trainer_networks=enums.Trainer.one_trainer_per_network,
+        network_sampling_setup=enums.NetworkSampler.fixed_agent_networks,
+        optimizer=snt.optimizers.Adam(learning_rate=1e-4),
         checkpoint_subpath=checkpoint_dir,
     ).build()
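
The last hunk replaces `LinearExplorationTimestepScheduler(epsilon_start=1.0, epsilon_min=0.05, epsilon_decay_steps=20000)` with `LinearExplorationScheduler(epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4)`. The two specify the decay differently: one uses a fixed amount per step, the other a total number of steps. The sketch below compares them under two assumptions that the diff does not confirm. First, it assumes the per-step scheduler subtracts `epsilon_decay` once per step and clips at `epsilon_min`. Second, it assumes the timestep scheduler interpolates linearly over `epsilon_decay_steps`. The helper names `per_step_epsilon` and `timestep_epsilon` are hypothetical and are not Mava's API.

```python
def per_step_epsilon(
    step: int, epsilon_start: float, epsilon_min: float, epsilon_decay: float
) -> float:
    """Assumed per-step schedule: subtract `epsilon_decay` each step, floor at `epsilon_min`."""
    return max(epsilon_min, epsilon_start - epsilon_decay * step)


def timestep_epsilon(
    step: int, epsilon_start: float, epsilon_min: float, epsilon_decay_steps: int
) -> float:
    """Assumed timestep schedule: interpolate from start to min over `epsilon_decay_steps`."""
    fraction = min(step / epsilon_decay_steps, 1.0)
    return epsilon_start + fraction * (epsilon_min - epsilon_start)


if __name__ == "__main__":
    # Settings taken from the added (per-step) and removed (timestep) lines of the diff.
    for step in (0, 1_000, 1_900, 10_000, 20_000):
        new = per_step_epsilon(step, 1.0, 0.05, 5e-4)
        old = timestep_epsilon(step, 1.0, 0.05, 20_000)
        print(f"step={step:>6}  per-step={new:.3f}  timestep={old:.3f}")
```

Under these assumptions, the new setting reaches its floor of 0.05 after (1.0 - 0.05) / 5e-4 = 1,900 steps, whereas the old one takes 20,000 steps, so exploration is annealed roughly ten times faster. What counts as a "step" for each scheduler (environment step, episode or executor update) is not stated in the diff.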