
Feat/happo implementation #1151

Open · wants to merge 7 commits into develop
Conversation

ch33nchan

HAPPO Algorithm Implementation in Mava

Overview

This implementation introduces the HAPPO (Heterogeneous-Agent Proximal Policy Optimization) algorithm into the Mava repository. HAPPO is an extension of the PPO algorithm designed for multi-agent reinforcement learning, featuring a sequential update scheme and a centralized critic. This implementation ensures compatibility with the existing structure and components of the Mava repository.
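To make the centralized-critic idea concrete, here is a minimal sketch of a value estimate conditioned on the global state plus all agents' actions. This is a toy linear critic for illustration only, not the network used in this PR; all names are hypothetical.

```python
def centralized_critic(global_state, joint_actions, weights):
    """Toy centralized critic: a linear value estimate over the
    concatenation of the global state and every agent's action.
    The key point is that the input covers ALL agents, not just one."""
    inputs = list(global_state) + list(joint_actions)
    return sum(w * x for w, x in zip(weights, inputs))
```

A decentralized critic would instead see only one agent's local observation; conditioning on the joint information is what lets the critic give consistent value targets during the sequential per-agent updates.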

Changes and Implementation

1. HAPPO Algorithm Implementation

File: mava/algorithms/happo.py

Description:

  • Created the HAPPO class inheriting from the base Algorithm class.
  • Initialized actor and critic networks, as well as the optimizer.
  • Implemented the update method to perform sequential updates for each agent's policy using the clipped surrogate objective.

Key Points:

  • Sequential Updates: Each agent's policy is updated sequentially to ensure stability and convergence.
  • Centralized Critic: The critic network estimates the value function using global state and actions of all agents.
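The sequential scheme can be sketched as follows. This is an illustrative, dependency-free version of HAPPO's core loop, not the PR's actual code: each agent's advantage is scaled by the compounding probability-ratio factor of the agents already updated, which is what distinguishes HAPPO from independent per-agent PPO. `AgentPolicy` and all parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    old_prob: float  # probability of the taken action under the old policy
    new_prob: float  # probability under the policy being updated

def clipped_surrogate(ratio: float, advantage: float, clip_eps: float) -> float:
    """PPO clipped surrogate objective for a single sample."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return min(unclipped, clipped)

def happo_sequential_losses(agents, advantage, clip_eps=0.2):
    """Compute per-agent losses in sequence. Each agent sees the shared
    advantage scaled by the compounding factor m, the product of the
    probability ratios of agents updated earlier in the sweep."""
    m = 1.0  # compounding factor from already-updated agents
    losses = []
    for agent in agents:
        ratio = agent.new_prob / agent.old_prob
        losses.append(-clipped_surrogate(ratio, m * advantage, clip_eps))
        m *= ratio  # fold this agent's ratio into the factor
    return losses
```

Because `m` changes after every agent, the order of updates matters; in practice the agent order is often randomized between epochs.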

2. Configuration

File: mava/configs/happo_config.py

Description:

  • Created the HAPPOConfig class inheriting from the FFIPPOConfig class found in mava.configs.system.ppo.ff_ippo.
  • Defined configuration parameters specific to the HAPPO algorithm, including learning rate, network configuration, and number of agents.

Key Points:

  • Inheritance: Inherits from an existing configuration class to ensure consistency and reuse of existing configurations.
  • Parameters: Includes parameters such as clip_param, num_agents, and lr.
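The configuration shape might look like the sketch below. The base class here is a stand-in (the real base is FFIPPOConfig in mava.configs.system.ppo.ff_ippo, which is not reproduced here); only the parameter names `lr`, `clip_param`, and `num_agents` come from this PR's description, and the default values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class BaseSystemConfig:
    """Stand-in for the inherited base config (hypothetical fields)."""
    total_timesteps: int = 1_000_000
    seed: int = 42

@dataclass
class HAPPOConfig(BaseSystemConfig):
    """HAPPO-specific parameters layered on top of the base config."""
    lr: float = 3e-4        # learning rate
    clip_param: float = 0.2  # PPO/HAPPO clipping epsilon
    num_agents: int = 3
```

Inheriting from the shared base means generic fields (timesteps, seed, etc.) stay defined in one place, and only algorithm-specific knobs live in the subclass.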

3. Training Script

File: scripts/train_happo.py

Description:

  • Created a training script to initialize the HAPPO configuration, environment, algorithm, and trainer.
  • The script runs the training loop using the Trainer class.

Key Points:

  • Initialization: A single entry point constructs the config, environment, algorithm, and trainer.
  • Training Loop: The loop itself is delegated to the Trainer class.
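The script's shape can be sketched like this. Everything below is a self-contained stand-in (the real script wires up HAPPOConfig, make_env, and the HAPPO class instead of these dummies); it only illustrates the init-then-delegate structure.

```python
class DummyEnv:
    """Stand-in environment (the real script calls make_env)."""
    def reset(self):
        return 0.0  # initial observation

    def step(self, action):
        return 0.0, 1.0, True  # next_obs, reward, done

class Trainer:
    """Minimal trainer mirroring the script's shape: it owns the loop."""
    def __init__(self, env, num_updates):
        self.env = env
        self.num_updates = num_updates
        self.updates_done = 0

    def train(self):
        for _ in range(self.num_updates):
            obs = self.env.reset()
            done = False
            while not done:
                obs, reward, done = self.env.step(action=0)
            self.updates_done += 1  # one rollout + update per iteration
        return self.updates_done

def main(num_updates=3):
    # 1) build config/env, 2) build trainer, 3) hand control to the loop
    env = DummyEnv()
    trainer = Trainer(env, num_updates)
    return trainer.train()
```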

4. Integration with Existing Components

Files:

  • mava/utils/make_env.py
  • mava/networks/__init__.py
  • mava/utils/logger.py

Description:

  • Ensured compatibility with the HAPPO algorithm's environment, network, and logging requirements.
  • Updated the environment creation function to handle HAPPO-specific configurations.
  • Ensured necessary network components are imported.
  • Updated the logger to handle HAPPO-specific logging requirements.

Key Points:

  • Environment Creation: make_env.py now handles HAPPO-specific configurations.
  • Network Imports: The required network components are exported from mava/networks/__init__.py.
  • Logger: The logger handles HAPPO-specific logging requirements.
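One common way to keep an env-creation function extensible for a new system is a registry-style dispatch, sketched below. This is a hypothetical pattern, not the actual contents of mava/utils/make_env.py; the builder names and the `global_state` flag are illustrative (a centralized critic typically needs the environment to expose global state).

```python
def make_happo_env(config):
    # hypothetical HAPPO-specific setup: expose global state for the critic
    return {"name": config["env_name"], "global_state": True}

def make_default_env(config):
    return {"name": config["env_name"], "global_state": False}

# systems with special requirements register a builder here
ENV_BUILDERS = {"happo": make_happo_env}

def make_env(system_name, config):
    """Dispatch to a system-specific builder, falling back to the default."""
    builder = ENV_BUILDERS.get(system_name, make_default_env)
    return builder(config)
```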

@CLAassistant

CLAassistant commented Dec 27, 2024

CLA assistant check
All committers have signed the CLA.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 27, 2024
@sash-a
Contributor

sash-a commented Dec 27, 2024

Hi @ch33nchan, thanks for the contribution. Just a heads up: the team is on holiday till early Jan, so we won't be able to review this until then.

Just a note though: if you would like to contribute this, please make sure to stay in line with Mava's style of doing things. HAPPO should not look too different from our current MAPPO implementation, e.g. keep the same code structure and place things in the relevant existing folders 🙏

@sash-a
Contributor

sash-a commented Jan 7, 2025

Hi @ch33nchan are you able to update this to be more in line with our current implementations?

@sash-a
Contributor

sash-a commented Jan 15, 2025

Hi again @ch33nchan, I see you're creating new folders (mava/algorithms) and creating classes for the agent, which isn't Mava's convention. Are you able to modify your implementation to base it on Mava's PPO in mava/systems/ppo/anakin/ff_ippo.py?

3 participants