Fix/madqn #362

Merged on Feb 11, 2022 (64 commits)
Commits
dde7821
First commit.
jcformanek Jan 19, 2022
f453987
Fix Recurrent MADQN
jcformanek Jan 19, 2022
207f5f2
Working Rec MADQN on SMAC.
jcformanek Jan 20, 2022
6d62a0b
Start Env Wrappers.
jcformanek Jan 20, 2022
78408cc
Agent ID wrapper.
jcformanek Jan 20, 2022
7d2be2e
Added previous agent actions to wrapper
RuanJohn Jan 20, 2022
23b438a
Working QMIX.
jcformanek Jan 21, 2022
ee7b6fd
Small fixes in trainer.
jcformanek Jan 21, 2022
85cc19e
More small fixes in trainer.
jcformanek Jan 21, 2022
737b020
Benchmarking ready.
jcformanek Jan 24, 2022
ef8592c
Remove flatland import.
jcformanek Jan 24, 2022
bb2ad11
Small fix
jcformanek Jan 24, 2022
7435936
Small fixes and clean-up.
jcformanek Jan 25, 2022
9dded6a
Code clean up.
jcformanek Jan 26, 2022
0c9de48
Fix docstrings.
jcformanek Jan 26, 2022
83ee695
Fixed error caused by the typing fixes.
jcformanek Jan 26, 2022
dcbac08
Small fix.
jcformanek Jan 26, 2022
e70277b
Merge develop.
jcformanek Jan 26, 2022
de31b9c
Docstring coverage.
jcformanek Jan 26, 2022
805d0a7
Small change.
jcformanek Jan 26, 2022
fa70c50
Remove DIAL tests.
jcformanek Jan 26, 2022
38ace52
Fix test.
jcformanek Jan 27, 2022
4f60993
Formatting fixes.
jcformanek Jan 27, 2022
9adb204
Small docstring fix.
jcformanek Jan 27, 2022
1048527
Typing errors.
jcformanek Jan 27, 2022
57aba3a
More typing errors.
jcformanek Jan 27, 2022
94c2c83
Typo.
jcformanek Jan 27, 2022
0de21ff
Type ignore in test.
jcformanek Jan 27, 2022
edad908
Docstrings.
jcformanek Jan 27, 2022
30661d1
Flatland wrapper import error.
jcformanek Jan 27, 2022
476b1b5
Fix mypy issues.
jcformanek Jan 27, 2022
97bae04
Fixed tf.function bug. Big system speed-up.
jcformanek Jan 28, 2022
018665a
Doc strings.
jcformanek Jan 28, 2022
3216e41
Added tests for VDN and QMIX. Fixed MADQN test.
RuanJohn Jan 28, 2022
b9b52bc
Merge branch 'fix/madqn' of github.com:instadeepai/Mava into fix/madqn
jcformanek Jan 28, 2022
48f902a
Fixes.
jcformanek Jan 28, 2022
fe3ddaa
Fix old comments.
jcformanek Jan 31, 2022
1a8a5b3
Fix docstring in examples.
jcformanek Feb 2, 2022
26016ce
Fix docstrings in MADQN system.
jcformanek Feb 2, 2022
55e7694
Fix docstrings in Value Decomposition system.
jcformanek Feb 2, 2022
219a277
Fix docstrings in wrappers and utils.
jcformanek Feb 2, 2022
9adb7d0
Add Value Decomposition README.
jcformanek Feb 2, 2022
f66be7f
Fix imports when users have not installed SMAC or Flatland.
jcformanek Feb 2, 2022
afb5eab
Small typo in RAEDME.
jcformanek Feb 2, 2022
18c156a
Fix mixer docstrings.
jcformanek Feb 2, 2022
4bc0ae3
Merge branch 'develop' into fix/madqn
jcformanek Feb 2, 2022
f59545c
Fix import error in test.
jcformanek Feb 2, 2022
7dd5c20
More import fixes for flatland and smac.
jcformanek Feb 2, 2022
51a94ab
Reformating error.
jcformanek Feb 2, 2022
c5bab20
fix: Updated dockerfile for missing updates.
KaleabTessera Feb 2, 2022
0991973
Merge branch 'develop' into fix/madqn
jcformanek Feb 2, 2022
8be1abc
Merge branch 'develop' into fix/madqn
jcformanek Feb 3, 2022
d77b2db
Added random seed back to madqn system and executors
RuanJohn Feb 7, 2022
33c5ae5
Merge branch 'develop' into fix/madqn
jcformanek Feb 7, 2022
f4e193d
fix: main README system implementation table
jcformanek Feb 7, 2022
746722e
fix: mad4pg docstrings
jcformanek Feb 7, 2022
b77f08c
fix: examples README
jcformanek Feb 7, 2022
9db9baf
fix: evaluator interval 2 -> 2000 on SMAC
jcformanek Feb 7, 2022
e3991c7
Fixed import issues in env_preprocess_wrappers
RuanJohn Feb 8, 2022
16e76cf
Merge branch 'fix/madqn' of https://github.com/instadeepai/Mava into …
RuanJohn Feb 8, 2022
c9d8a0e
fix: EpsilonTimestepSchedulers in scaling MADQN
jcformanek Feb 9, 2022
ce614b8
fix: lower evaluator interval
jcformanek Feb 9, 2022
b220e49
Merge branch 'develop' of github.com:instadeepai/Mava into fix/madqn
jcformanek Feb 11, 2022
b678c48
fix: evaluator interval on SMAC examples should be ever 2000 steps
jcformanek Feb 11, 2022
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@ A clear and concise description of what the bug is.

### To Reproduce
Steps to reproduce the behavior:
-1.
-2.
-3.
-4.
+1.
+2.
+3.
+4.

### Expected behavior
A clear and concise description of what you expected to happen.
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ ARG record
# Ensure no installs try launch interactive screen
ARG DEBIAN_FRONTEND=noninteractive
# Update packages
-RUN apt-get update -y && apt-get install -y python3-pip && apt-get install -y python3-venv
+RUN apt-get update --fix-missing -y && apt-get install -y python3-pip && apt-get install -y python3-venv
# Update python path
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 10 &&\
rm -rf /root/.cache && apt-get clean
Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ To read more about the motivation behind Mava, please see our [blog post][blog],

<hr>

-👋 **UPDATE**: The team has been hard at work over the past few months to improve Mava's systems performance, stability and robustness. These efforts include extensively benchmarking system implementations, fixing bugs and profiling performance and speed. The culmination of this work will be reflected in our next stable release. However, during this period, we have learned a lot about what works and what doesn't. In particular, our current base system design allows for a decent amount of flexibility but quickly becomes difficult to maintain with growing signatures and system constructors as additional modules get added. Our class designs are also overly reliant on wrappers and inheritance which do not scale as well as we would like with increases in system complexity. Furthermore, our original motivation for choosing Tensorflow 2 (TF2) as our deep learning backend was to align with Acme's large repository of RL abstractions and tools for TF2. These were very useful for initially building our systems. But since then, we have found TF2 less performant and flexible than we desire given alternative frameworks. Acme has also affirmed their support of Jax underlying much of the DeepMind RL ecosystem. Therefore, in the coming months, following our stable release, **we plan to rollout a more modular and flexible build system specifically for Jax-based systems.** Please note that all TF2-based systems using the old build system will be maintained during the rollout. However, once a stable Jax release has been made with the new build system, Mava will only support a single DL backend, namely Jax, and we will begin to deprecate all TF2 systems and building support. That said, we will make sure to communicate clearly and often during the migration from TF2 to Jax.
+👋 **UPDATE**: The team has been hard at work over the past few months to improve Mava's systems performance, stability and robustness. These efforts include extensively benchmarking system implementations, fixing bugs and profiling performance and speed. The culmination of this work will be reflected in our next stable release. However, during this period, we have learned a lot about what works and what doesn't. In particular, our current base system design allows for a decent amount of flexibility but quickly becomes difficult to maintain with growing signatures and system constructors as additional modules get added. Our class designs are also overly reliant on wrappers and inheritance which do not scale as well as we would like with increases in system complexity. Furthermore, our original motivation for choosing Tensorflow 2 (TF2) as our deep learning backend was to align with Acme's large repository of RL abstractions and tools for TF2. These were very useful for initially building our systems. But since then, we have found TF2 less performant and flexible than we desire given alternative frameworks. Acme has also affirmed their support of Jax underlying much of the DeepMind RL ecosystem. Therefore, in the coming months, following our stable release, **we plan to rollout a more modular and flexible build system specifically for Jax-based systems.** Please note that all TF2-based systems using the old build system will be maintained during the rollout. However, once a stable Jax release has been made with the new build system, Mava will only support a single DL backend, namely Jax, and we will begin to deprecate all TF2 systems and building support. That said, we will make sure to communicate clearly and often during the migration from TF2 to Jax.

<hr>

Expand Down Expand Up @@ -65,12 +65,12 @@ For details on how to add your own environment, see [here](https://github.com/in

| **Name** | **Recurrent** | **Continuous** | **Discrete** | **Centralised training** | **Multi Processing** |
| ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------- |
-| MADQN | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| MADQN | :heavy_check_mark: | :x: | :heavy_check_mark: | :x: | :heavy_check_mark: |
| MADDPG | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| MAD4PG | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| MAPPO | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| VDN | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| QMIX | :x: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| VDN | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| QMIX | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

As we develop Mava further, we aim to have all systems well tested on a wide variety of environments.

Expand Down
2 changes: 1 addition & 1 deletion docs/images/focus_fire.html
Original file line number Diff line number Diff line change
Expand Up @@ -428,4 +428,4 @@
MTAw
">
Your browser does not support the video tag.
-</video>
+</video>
2 changes: 1 addition & 1 deletion docs/images/runaway.html
Original file line number Diff line number Diff line change
Expand Up @@ -1272,4 +1272,4 @@
dAAAACWpdG9vAAAAHWRhdGEAAAABAAAAAExhdmY1OC4yOS4xMDA=
">
Your browser does not support the video tag.
-</video>
+</video>
49 changes: 11 additions & 38 deletions examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,29 +62,16 @@ We also include a number of systems running on discrete action space environment
- *Feedforward*
- [decentralised][debug_madqn_ff_dec], [decentralised lr scheduling][debug_madqn_ff_dec_lr_schedule] (***using lr schedule***), [decentralised custom lr scheduling][debug_madqn_ff_dec_custom_lr_schedule] (***using custom lr schedule***) and [decentralised custom epsilon decay scheduling][debug_madqn_ff_dec_custom_eps_schedule] (***using configurable epsilon scheduling***).
- *Recurrent*
- [decentralised][debug_madqn_rec_dec] and [decentralised with coms][debug_madqn_rec_dec_coms] (***using a system with communication***).

- **QMIX**:
a QMIX system running on the discrete action space simple_spread MPE environment.
- *Feedforward* [decentralised][debug_qmix_ff_dec].
- [decentralised][debug_madqn_rec_dec].

- **VDN**:
a VDN system running on the discrete action space simple_spread MPE environment.
- *Feedforward* [decentralised][debug_vdn_ff_dec].

- **DIAL**:
a DIAL system running on the discrete action space simple_spread MPE environment.
- *Recurrent* [decentralised][debug_dial_rec_dec].

### Debugging Environment - Switch
- **DIAL**:
a DIAL system running on the discrete custom SwitchGame environment.
- *Recurrent* [decentralised][debug_switch_dial_rec_dec].
- *Recurrent* [centralised][debug_vdn_rec_cen].

### PettingZoo - Multi-Agent Atari
- **MADQN**:
a MADQN system running on the two-player competitive Atari Pong environment.
- *Feedforward* [decentralised][pz_madqn_pong_ff_dec].
- *Recurrent* [decentralised][pz_madqn_pong_ff_dec].

### PettingZoo - Multi-Agent Particle Environment
- **MADDPG**:
Expand All @@ -101,15 +88,15 @@ We also include a number of systems running on discrete action space environment
- *Feedforward*
- [decentralised][smac_madqn_ff_dec].
- *Recurrent*
- [decentralised with custom agent networks][smac_madqn_rec_dec_custom_agents] (***using custom agent networks***).
- [decentralised][smac_madqn_rec_dec].

- **QMIX**:
a QMIX system running on the SMAC environment.
- *Feedforward* [decentralised][smac_qmix_ff_dec].
- *Recurrent* [centralised][smac_qmix_rec_cen].

- **VDN**:
a VDN system running on the SMAC environment.
- *Feedforward* [decentralised][smac_vdn_ff_dec] and [decentralised record agents][smac_vdn_ff_dec_record].
- *Recurrent* [centralised][smac_vdn_rec_cen].

### OpenSpiel - Tic Tac Toe
- **MADQN**:
Expand Down Expand Up @@ -159,33 +146,19 @@ We also include a number of systems running on discrete action space environment
[debug_madqn_ff_dec_custom_lr_schedule]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_madqn_custom_lr_schedule.py
[debug_madqn_ff_dec_custom_eps_schedule]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_madqn_configurable_epsilon.py
[debug_madqn_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/decentralised/run_madqn.py
[debug_madqn_rec_dec_coms]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/decentralised/run_madqn_with_coms.py

[debug_qmix_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_qmix.py

[debug_vdn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/feedforward/decentralised/run_vdn.py

[debug_dial_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/decentralised/run_dial.py

[debug_vdn_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/simple_spread/recurrent/centralised/run_vdn.py

[debug_switch_dial_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/debugging/switch/recurrent/decentralised/run_dial.py


[pz_madqn_pong_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/atari/pong/feedforward/decentralised/run_madqn.py
[pz_madqn_pong_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/atari/pong/recurrent/centralised/run_madqn.py

[pz_maddpg_mpe_ssl_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/mpe/simple_speaker_listener/feedforward/decentralised/run_maddpg.py

[pz_maddpg_mpe_ss_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/mpe/simple_spread/feedforward/decentralised/run_maddpg.py

[smac_madqn_rec_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/decentralised/run_madqn.py

[smac_madqn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_madqn.py

[smac_madqn_rec_dec_custom_agents]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/decentralised/run_madqn.py

[smac_qmix_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_qmix.py

[smac_vdn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_vdn.py
[smac_qmix_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/centralised/run_qmix.py

[smac_vdn_ff_dec_record]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/feedforward/decentralised/run_vdn_record.py
[smac_vdn_rec_cen]: https://github.com/instadeepai/Mava/blob/develop/examples/smac/recurrent/centralised/run_vdn.py

[openspiel_madqn_ff_dec]: https://github.com/instadeepai/Mava/blob/develop/examples/openspiel/tic_tac_toe/feedforward/decentralised/run_madqn.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,6 @@ def main(_: Any) -> None:
exploration_scheduler_fn=LinearExplorationScheduler(
epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
),
-importance_sampling_exponent=0.2,
optimizer=snt.optimizers.Adam(learning_rate=1e-4),
checkpoint_subpath=checkpoint_dir,
).build()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,6 @@ def main(_: Any) -> None:
logger_factory=logger_factory,
num_executors=2,
exploration_scheduler_fn=exploration_scheduler_fn,
-importance_sampling_exponent=0.2,
optimizer=snt.optimizers.Adam(learning_rate=1e-4),
checkpoint_subpath=checkpoint_dir,
).build()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -107,10 +107,9 @@ def main(_: Any) -> None:
exploration_scheduler_fn=LinearExplorationScheduler(
epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
),
-importance_sampling_exponent=0.2,
optimizer=snt.optimizers.Adam(learning_rate=lr_start),
checkpoint_subpath=checkpoint_dir,
-learning_rate_scheduler_fn=learning_rate_scheduler_fn,
+learning_rate_scheduler_fn=learning_rate_scheduler_fn, # type: ignore
).build()

# Ensure only trainer runs on gpu, while other processes run on cpu.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,6 @@ def main(_: Any) -> None:
exploration_scheduler_fn=LinearExplorationScheduler(
epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
),
-importance_sampling_exponent=0.2,
optimizer=snt.optimizers.Adam(learning_rate=lr),
checkpoint_subpath=checkpoint_dir,
learning_rate_scheduler_fn=learning_rate_scheduler_fn,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-"""Example running QMIX on debug MPE environments."""
+
+"""Example running MADQN on debug MPE environments."""
import functools
from datetime import datetime
from typing import Any
Expand All @@ -22,9 +23,9 @@
import sonnet as snt
from absl import app, flags

-from mava.components.tf.modules.exploration import LinearExplorationTimestepScheduler
-from mava.systems.tf import qmix
-from mava.utils import lp_utils
+from mava.components.tf.modules.exploration import LinearExplorationScheduler
+from mava.systems.tf import madqn
+from mava.utils import enums, lp_utils
from mava.utils.environments import debugging_utils
from mava.utils.loggers import logger_utils

Expand All @@ -49,18 +50,18 @@


def main(_: Any) -> None:

# Environment.
environment_factory = functools.partial(
debugging_utils.make_environment,
env_name=FLAGS.env_name,
action_space=FLAGS.action_space,
-return_state_info=True,
)

# Networks.
-network_factory = lp_utils.partial_kwargs(qmix.make_default_networks)
+network_factory = lp_utils.partial_kwargs(madqn.make_default_networks)

-# Checkpointer appends "Checkpoints" to checkpoint_dir.
+# Checkpointer appends "Checkpoints" to checkpoint_dir
checkpoint_dir = f"{FLAGS.base_dir}/{FLAGS.mava_id}"

# Log every [log_every] seconds.
Expand All @@ -74,17 +75,19 @@ def main(_: Any) -> None:
time_delta=log_every,
)

-# Distributed program.
-program = qmix.QMIX(
+# distributed program
+program = madqn.MADQN(
environment_factory=environment_factory,
network_factory=network_factory,
logger_factory=logger_factory,
num_executors=1,
-exploration_scheduler_fn=LinearExplorationTimestepScheduler(
-epsilon_start=1.0, epsilon_min=0.05, epsilon_decay_steps=20000
+exploration_scheduler_fn=LinearExplorationScheduler(
+epsilon_start=1.0, epsilon_min=0.05, epsilon_decay=5e-4
),
-max_replay_size=1000000,
-optimizer=snt.optimizers.RMSProp(learning_rate=1e-4),
+shared_weights=False,
+trainer_networks=enums.Trainer.one_trainer_per_network,
+network_sampling_setup=enums.NetworkSampler.fixed_agent_networks,
+optimizer=snt.optimizers.Adam(learning_rate=1e-4),
checkpoint_subpath=checkpoint_dir,
).build()
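
Note on the scheduler swap in this hunk: the two exploration schedulers take different decay parameters (`epsilon_decay_steps=20000` versus `epsilon_decay=5e-4`), so the change also alters how quickly epsilon reaches `epsilon_min`. The sketch below is a standalone illustration only, not Mava's implementation. The class names `PerCallLinearEpsilon` and `PerTimestepLinearEpsilon` are hypothetical, and the decay rules are assumptions: a fixed amount subtracted per call for `epsilon_decay`, and a linear ramp over a fixed number of steps for `epsilon_decay_steps`.

```python
from dataclasses import dataclass


@dataclass
class PerCallLinearEpsilon:
    """Hypothetical stand-in: subtracts a fixed epsilon_decay on every call."""

    epsilon_start: float = 1.0
    epsilon_min: float = 0.05
    epsilon_decay: float = 5e-4

    def value(self, num_calls: int) -> float:
        # Assumed rule: epsilon drops by epsilon_decay per call, floored at epsilon_min.
        return max(self.epsilon_min, self.epsilon_start - self.epsilon_decay * num_calls)


@dataclass
class PerTimestepLinearEpsilon:
    """Hypothetical stand-in: ramps linearly to epsilon_min over epsilon_decay_steps."""

    epsilon_start: float = 1.0
    epsilon_min: float = 0.05
    epsilon_decay_steps: int = 20000

    def value(self, timestep: int) -> float:
        # Assumed rule: interpolate from start to min, then hold at min.
        fraction = min(timestep / self.epsilon_decay_steps, 1.0)
        return self.epsilon_start + fraction * (self.epsilon_min - self.epsilon_start)


if __name__ == "__main__":
    per_call = PerCallLinearEpsilon()
    per_step = PerTimestepLinearEpsilon()
    for step in (0, 1000, 1900, 10000, 20000):
        print(step, round(per_call.value(step), 4), round(per_step.value(step), 4))
```

Under these assumptions, the values in the diff give (1.0 - 0.05) / 5e-4 = 1900 calls to reach the floor for the per-call rule, compared with 20000 steps for the timestep-based rule.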

Expand Down

This file was deleted.
