MPE Continuous Action support #419

Merged · 21 commits · Jul 16, 2021
10 changes: 9 additions & 1 deletion docs/mpe.md
@@ -11,7 +11,7 @@ pip install pettingzoo[mpe]

Multi Particle Environments (MPE) are a set of communication-oriented environments where particle agents can (sometimes) move, communicate, see each other, push each other around, and interact with fixed landmarks.

These environments are from [OpenAI's MPE](https://github.com/openai/multiagent-particle-envs) codebase, with several minor fixes, mostly related to making the action space discrete, making the rewards consistent and cleaning up the observation space of certain environments.
These environments are from [OpenAI's MPE](https://github.com/openai/multiagent-particle-envs) codebase, with several minor fixes, mostly related to making the action space discrete by default, making the rewards consistent and cleaning up the observation space of certain environments.

### Types of Environments

@@ -43,8 +43,16 @@ If an agent cannot see or observe the communication of a second agent, then the

### Action Space

Note: [OpenAI's MPE](https://github.com/openai/multiagent-particle-envs) uses continuous action spaces by default.

Discrete action space (Default):

The action space is a discrete action space representing the combinations of movements and communications an agent can perform. Agents that can move can choose between the 4 cardinal directions or do nothing. Agents that can communicate choose between 2 and 10 environment-dependent communication options, which broadcast a message to all agents that can hear it.

Continuous action space (Set by continuous_actions=True):

The action space is a continuous action space representing the movements and communication an agent can perform. Agents that can move can input a velocity between 0.0 and 1.0 in each of the four cardinal directions, where opposing velocities (e.g. left and right) are summed together. Agents that can communicate can output a continuous value over each communication channel in the environment that they have access to.
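
As a quick illustration of the two modes, here is a minimal sketch (the agent name `agent_0` and the `action_spaces` dict are assumptions about the PettingZoo API at the time of this PR, not part of the diff):

```
from pettingzoo.mpe import simple_v2

# Default: discrete actions, one of 5 choices (no_action + 4 cardinal directions).
env = simple_v2.env(continuous_actions=False)
env.reset()
print(env.action_spaces["agent_0"])  # expected: Discrete(5)

# Continuous actions: a Box(0.0, 1.0, (5,)) vector (per the table above), where
# opposing movement entries (e.g. left and right) are summed into a net velocity.
env = simple_v2.env(continuous_actions=True)
env.reset()
print(env.action_spaces["agent_0"])  # expected: Box(0.0, 1.0, (5,))
```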

### Rendering

Rendering displays the scene in a window that automatically grows if agents wander beyond its border. Communication is rendered at the bottom of the scene. The `render()` method also returns the pixel map of the rendered area.
8 changes: 5 additions & 3 deletions docs/mpe/simple.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple"
agents: "1"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5,))"
observation-shape: "(4)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_v2"
@@ -22,10 +22,12 @@ Observation space: `[self_vel, landmark_rel_position]`
### Arguments

```
simple_v2.env(max_cycles=25)
simple_v2.env(max_cycles=25, continuous_actions=False)
```



`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
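
A usage sketch under the AEC loop of PettingZoo releases from this period (the `env.last()` signature, `action_spaces` dict, and `step(None)` convention are assumptions, not part of this diff):

```
from pettingzoo.mpe import simple_v2

env = simple_v2.env(max_cycles=25, continuous_actions=True)
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    # Sample a random Box(0.0, 1.0, (5,)) action, or pass None once the agent is done.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()
```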

8 changes: 5 additions & 3 deletions docs/mpe/simple_adversary.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Adversary"
agents: "3"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5))"
observation-shape: "(8),(10)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_adversary_v2"
@@ -28,11 +28,13 @@ Adversary action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_adversary_v2.env(N=2, max_cycles=25)
simple_adversary_v2.env(N=2, max_cycles=25, continuous_actions=False)
```



`N`: number of good agents and landmarks

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
8 changes: 5 additions & 3 deletions docs/mpe/simple_crypto.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Crypto"
agents: "2"
manual-control: "No"
action-shape: "(4)"
action-values: "Discrete(4)"
action-values: "Discrete(4)/Box(0.0, 1.0, (4))"
observation-shape: "(4),(8)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_crypto_v2"
@@ -35,9 +35,11 @@ For Bob and Eve, their communication is checked to be the 1 bit of information t
### Arguments

```
simple_crypto_v2.env(max_cycles=25)
simple_crypto_v2.env(max_cycles=25, continuous_actions=False)
```



`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
6 changes: 3 additions & 3 deletions docs/mpe/simple_push.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Push"
agents: "2"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5,))"
observation-shape: "(8),(19)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_push_v2"
@@ -28,7 +28,7 @@ Adversary action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_push_v2.env(max_cycles=25)
simple_push_v2.env(max_cycles=25, continuous_actions=False)
```


13 changes: 9 additions & 4 deletions docs/mpe/simple_reference.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Reference"
agents: "2"
manual-control: "No"
action-shape: "(50)"
action-values: "Discrete(50)"
action-values: "Discrete(50)/Box(0.0, 1.0, (15))"
observation-shape: "(21)"
observation-values: "(-inf,inf)"
average-total-reward: "-57.1"
@@ -22,19 +22,24 @@ Locally, the agents are rewarded by their distance to their target landmark. Glo

Agent observation space: `[self_vel, all_landmark_rel_positions, landmark_ids, goal_id, communication]`

Agent action space: `[say_0, say_1, say_2, say_3, say_4, say_5, say_6, say_7, say_8, say_9] X [no_action, move_left, move_right, move_down, move_up]`
Agent discrete action space: `[say_0, say_1, say_2, say_3, say_4, say_5, say_6, say_7, say_8, say_9] X [no_action, move_left, move_right, move_down, move_up]`

Where X is the Cartesian product (giving a total action space of 50).

Agent continuous action space: `[no_action, move_left, move_right, move_down, move_up, say_0, say_1, say_2, say_3, say_4, say_5, say_6, say_7, say_8, say_9]`
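
A sketch of how the two spaces relate; only the sizes come from the definitions above, and the exact index ordering of the discrete encoding is an assumption:

```
# Discrete: one index in [0, 50) encodes a (communication, movement) pair,
# since 10 say options x 5 movement options = 50.
action = 37
comm, move = divmod(action, 5)  # assumed factorization
assert 0 <= comm < 10 and 0 <= move < 5

# Continuous: a Box(0.0, 1.0, (15,)) vector, 5 movement entries followed by
# 10 communication channel values, in the order listed above.
```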

### Arguments


```
simple_reference_v2.env(local_ratio=0.5, max_cycles=25)
simple_reference_v2.env(local_ratio=0.5, max_cycles=25, continuous_actions=False)
```



`local_ratio`: Weight applied to local reward and global reward. Global reward weight will always be 1 - local reward weight.

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous

8 changes: 5 additions & 3 deletions docs/mpe/simple_speaker_listener.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Speaker Listener"
agents: "2"
manual-control: "No"
action-shape: "(3),(5)"
action-values: "Discrete(3),(5)"
action-values: "Discrete(3),(5)/Box(0.0, 1.0, (3)), Box(0.0, 1.0, (5))"
observation-shape: "(3),(11)"
observation-values: "(-inf,inf)"
average-total-reward: "-80.9"
@@ -29,9 +29,11 @@ Listener action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_speaker_listener_v2.env(max_cycles=25)
simple_speaker_listener_v2.env(max_cycles=25, continuous_actions=False)
```



`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
8 changes: 5 additions & 3 deletions docs/mpe/simple_spread.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Spread"
agents: "3"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (5))"
observation-shape: "(18)"
observation-values: "(-inf,inf)"
average-total-reward: "-115.6"
Expand All @@ -27,7 +27,7 @@ Agent action space: `[no_action, move_left, move_right, move_down, move_up]`
### Arguments

```
simple_spread_v2.env(N=3, local_ratio=0.5, max_cycles=25)
simple_spread_v2.env(N=3, local_ratio=0.5, max_cycles=25, continuous_actions=False)
```


@@ -37,3 +37,5 @@ simple_spread_v2.env(N=3, local_ratio=0.5, max_cycles=25)
`local_ratio`: Weight applied to local reward and global reward. Global reward weight will always be 1 - local reward weight.

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous
8 changes: 5 additions & 3 deletions docs/mpe/simple_tag.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple Tag"
agents: "4"
manual-control: "No"
action-shape: "(5)"
action-values: "Discrete(5)"
action-values: "Discrete(5)/Box(0.0, 1.0, (50))"
observation-shape: "(14),(16)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_tag_v2"
@@ -34,7 +34,7 @@ Agent and adversary action space: `[no_action, move_left, move_right, move_down,
### Arguments

```
simple_tag_v2.env(num_good=1, num_adversaries=3, num_obstacles=2 , max_cycles=25)
simple_tag_v2.env(num_good=1, num_adversaries=3, num_obstacles=2, max_cycles=25, continuous_actions=False)
```


@@ -47,3 +47,5 @@ simple_tag_v2.env(num_good=1, num_adversaries=3, num_obstacles=2 , max_cycles=25

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous

11 changes: 7 additions & 4 deletions docs/mpe/simple_world_comm.md
@@ -1,10 +1,10 @@
---
actions: "Discrete"
actions: "Discrete/Continuous"
title: "Simple World Comm"
agents: "6"
manual-control: "No"
action-shape: "(5),(20)"
action-values: "Discrete(5),(20)"
action-values: "Discrete(5),(20)/Box(0.0, 1.0, (5)), Box(0.0, 1.0, (9))"
observation-shape: "(28),(34)"
observation-values: "(-inf,inf)"
import: "from pettingzoo.mpe import simple_world_comm_v2"
@@ -31,16 +31,17 @@ Good agent action space: `[no_action, move_left, move_right, move_down, move_up]

Normal adversary action space: `[no_action, move_left, move_right, move_down, move_up]`

Adversary leader observation space: `[say_0, say_1, say_2, say_3] X [no_action, move_left, move_right, move_down, move_up]`
Adversary leader discrete action space: `[say_0, say_1, say_2, say_3] X [no_action, move_left, move_right, move_down, move_up]`

Where X is the Cartesian product (giving a total action space of 20).

Adversary leader continuous action space: `[no_action, move_left, move_right, move_down, move_up, say_0, say_1, say_2, say_3]`
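
The leader's spaces factor the same way as in Simple Reference, just with 4 communication options; a sketch using only the sizes from the definitions above (the index ordering is again an assumption):

```
# Discrete: Discrete(20) = 4 say options x 5 movement options.
leader_action = 13                      # any index in [0, 20)
comm, move = divmod(leader_action, 5)   # assumed factorization, as in Simple Reference

# Continuous: Box(0.0, 1.0, (9,)) = 5 movement entries + 4 communication channels,
# in the order listed above.
```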

### Arguments

```
simple_world_comm.env(num_good=2, num_adversaries=4, num_obstacles=1,
num_food=2, max_cycles=25, num_forests=2)
num_food=2, max_cycles=25, num_forests=2, continuous_actions=False)
```


@@ -57,3 +57,5 @@ simple_world_comm.env(num_good=2, num_adversaries=4, num_obstacles=1,

`num_forests`: number of forests that can hide agents inside from being seen

`continuous_actions`: Whether agent action spaces are discrete (default) or continuous

2 changes: 1 addition & 1 deletion pettingzoo/mpe/_mpe_utils/rendering.py
@@ -283,7 +283,7 @@ def set_text(self, text):
self.label = pyglet.text.Label(text,
font_name=font,
color=(0, 0, 0, 255),
font_size=25,
font_size=20,
x=0, y=self.idx * 40 + 20,
anchor_x="left", anchor_y="bottom")
