
Feat/happo implementation #1151

Open · wants to merge 7 commits into develop
Conversation

ch33nchan

HAPPO Algorithm Implementation in Mava

Overview

This implementation introduces the HAPPO (Heterogeneous-Agent Proximal Policy Optimization) algorithm into the Mava repository. HAPPO is an extension of the PPO algorithm designed for multi-agent reinforcement learning, featuring a sequential update scheme and a centralized critic. This implementation ensures compatibility with the existing structure and components of the Mava repository.
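To make the centralized-critic idea concrete, here is a minimal sketch of a value estimate conditioned on the global state plus all agents' actions. This is a toy linear critic for illustration only, not the network used in this PR; all names are hypothetical.

```python
def centralized_critic(global_state, joint_actions, weights):
    """Toy centralized critic: a linear value estimate over the
    concatenation of the global state and every agent's action.
    The key point is that the input covers ALL agents, not just one."""
    inputs = list(global_state) + list(joint_actions)
    return sum(w * x for w, x in zip(weights, inputs))
```

A decentralized critic would instead see only one agent's local observation; conditioning on the joint information is what lets the critic give consistent value targets during the sequential per-agent updates.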

Changes and Implementation

1. HAPPO Algorithm Implementation

File: mava/algorithms/happo.py

Description:

  • Created the HAPPO class inheriting from the base Algorithm class.
  • Initialized actor and critic networks, as well as the optimizer.
  • Implemented the update method to perform sequential updates for each agent's policy using the clipped surrogate objective.

Key Points:

  • Sequential Updates: Each agent's policy is updated sequentially to ensure stability and convergence.
  • Centralized Critic: The critic network estimates the value function using global state and actions of all agents.
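The sequential scheme can be sketched as follows. This is an illustrative, dependency-free version of HAPPO's core loop, not the PR's actual code: each agent's advantage is scaled by the compounding probability-ratio factor of the agents already updated, which is what distinguishes HAPPO from independent per-agent PPO. `AgentPolicy` and all parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    old_prob: float  # probability of the taken action under the old policy
    new_prob: float  # probability under the policy being updated

def clipped_surrogate(ratio: float, advantage: float, clip_eps: float) -> float:
    """PPO clipped surrogate objective for a single sample."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return min(unclipped, clipped)

def happo_sequential_losses(agents, advantage, clip_eps=0.2):
    """Compute per-agent losses in sequence. Each agent sees the shared
    advantage scaled by the compounding factor m, the product of the
    probability ratios of agents updated earlier in the sweep."""
    m = 1.0  # compounding factor from already-updated agents
    losses = []
    for agent in agents:
        ratio = agent.new_prob / agent.old_prob
        losses.append(-clipped_surrogate(ratio, m * advantage, clip_eps))
        m *= ratio  # fold this agent's ratio into the factor
    return losses
```

Because `m` changes after every agent, the order of updates matters; in practice the agent order is often randomized between epochs.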

2. Configuration

File: mava/configs/happo_config.py

Description:

  • Created the HAPPOConfig class inheriting from the FFIPPOConfig class found in mava.configs.system.ppo.ff_ippo.
  • Defined configuration parameters specific to the HAPPO algorithm, including learning rate, network configuration, and number of agents.

Key Points:

  • Inheritance: Inherits from an existing configuration class to ensure consistency and reuse of existing configurations.
  • Parameters: Includes parameters such as clip_param, num_agents, and lr.
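The configuration shape might look like the sketch below. The base class here is a stand-in (the real base is FFIPPOConfig in mava.configs.system.ppo.ff_ippo, which is not reproduced here); only the parameter names `lr`, `clip_param`, and `num_agents` come from this PR's description, and the default values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class BaseSystemConfig:
    """Stand-in for the inherited base config (hypothetical fields)."""
    total_timesteps: int = 1_000_000
    seed: int = 42

@dataclass
class HAPPOConfig(BaseSystemConfig):
    """HAPPO-specific parameters layered on top of the base config."""
    lr: float = 3e-4        # learning rate
    clip_param: float = 0.2  # PPO/HAPPO clipping epsilon
    num_agents: int = 3
```

Inheriting from the shared base means generic fields (timesteps, seed, etc.) stay defined in one place, and only algorithm-specific knobs live in the subclass.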

3. Training Script

File: scripts/train_happo.py

Description:

  • Created a training script to initialize the HAPPO configuration, environment, algorithm, and trainer.
  • The script runs the training loop using the Trainer class.

Key Points:

  • Initialization: A single entry point constructs the config, environment, algorithm, and trainer.
  • Training Loop: The loop itself is delegated to the Trainer class.
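The script's shape can be sketched like this. Everything below is a self-contained stand-in (the real script wires up HAPPOConfig, make_env, and the HAPPO class instead of these dummies); it only illustrates the init-then-delegate structure.

```python
class DummyEnv:
    """Stand-in environment (the real script calls make_env)."""
    def reset(self):
        return 0.0  # initial observation

    def step(self, action):
        return 0.0, 1.0, True  # next_obs, reward, done

class Trainer:
    """Minimal trainer mirroring the script's shape: it owns the loop."""
    def __init__(self, env, num_updates):
        self.env = env
        self.num_updates = num_updates
        self.updates_done = 0

    def train(self):
        for _ in range(self.num_updates):
            obs = self.env.reset()
            done = False
            while not done:
                obs, reward, done = self.env.step(action=0)
            self.updates_done += 1  # one rollout + update per iteration
        return self.updates_done

def main(num_updates=3):
    # 1) build config/env, 2) build trainer, 3) hand control to the loop
    env = DummyEnv()
    trainer = Trainer(env, num_updates)
    return trainer.train()
```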

4. Integration with Existing Components

Files:

  • mava/utils/make_env.py
  • mava/networks/__init__.py
  • mava/utils/logger.py

Description:

  • Ensured compatibility with the HAPPO algorithm's environment, network, and logging requirements.
  • Updated the environment creation function to handle HAPPO-specific configurations.
  • Ensured necessary network components are imported.
  • Updated the logger to handle HAPPO-specific logging requirements.

Key Points:

  • Environment Creation: make_env.py now handles HAPPO-specific configurations.
  • Network Imports: The required network components are exported from mava/networks/__init__.py.
  • Logger: The logger handles HAPPO-specific logging requirements.
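One common way to keep an env-creation function extensible for a new system is a registry-style dispatch, sketched below. This is a hypothetical pattern, not the actual contents of mava/utils/make_env.py; the builder names and the `global_state` flag are illustrative (a centralized critic typically needs the environment to expose global state).

```python
def make_happo_env(config):
    # hypothetical HAPPO-specific setup: expose global state for the critic
    return {"name": config["env_name"], "global_state": True}

def make_default_env(config):
    return {"name": config["env_name"], "global_state": False}

# systems with special requirements register a builder here
ENV_BUILDERS = {"happo": make_happo_env}

def make_env(system_name, config):
    """Dispatch to a system-specific builder, falling back to the default."""
    builder = ENV_BUILDERS.get(system_name, make_default_env)
    return builder(config)
```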

@CLAassistant

CLAassistant commented Dec 27, 2024

CLA assistant check
All committers have signed the CLA.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 27, 2024
@sash-a
Contributor

sash-a commented Dec 27, 2024

Hi @ch33nchan, thanks for the contribution. Just a heads up: the team is on holiday till early Jan, so we won't be able to review this until then.

Just a note though: if you would like to contribute this, please make sure to stay in line with Mava's style of doing things. HAPPO should not look too different from our current MAPPO implementation, e.g. keep the same code structure and place things in the relevant existing folders 🙏

@sash-a
Contributor

sash-a commented Jan 7, 2025

Hi @ch33nchan are you able to update this to be more in line with our current implementations?

@sash-a
Contributor

sash-a commented Jan 15, 2025

Hi again @ch33nchan, I see you're creating new folders (mava/algorithms) and creating classes for the agent, which isn't Mava's convention. Are you able to modify your implementation to base it on Mava's PPO in mava/systems/ppo/anakin/ff_ippo.py?

3 participants