OpenAI Gym wrapper + bump engine version (#52)

* wip * Update gym.py * . * wip * wip * Update Dockerfile.gym.dev * wip * Update gym.py * fwd model close * Update dev_gym.py * Update forward_model.py * wip * Update agent.py * Update gym.py * Update gym.py * Update gym.py * . * . * Update gym.py * wip * wip * Update gym.py * wip * wip * Update dev_gym.py * wip * wip * wip * wip * Update gym.py * wip * Update gym.py * Update gym.py * Update gym.py * . * Bump websockets version * Update gym.py * Update gym.py * wip * wip * gym * close * Update README.md * Update README.md * Update README.md
CoderOneHQ · Dec 6, 2021 · be04d86
1 parent dfeb54b
commit be04d86
Showing 12 changed files with 193 additions and 28 deletions.
diff --git a/README.md b/README.md
@@ -23,14 +23,15 @@ docker-compose up --abort-on-container-exit --force-recreate
 
 # Starter kits
 
-| Kit            | Link                                                                      | Description                                        | Up-to-date? | Contributed by                          |
-| -------------- | ------------------------------------------------------------------------- | -------------------------------------------------- | ----------- | --------------------------------------- |
-| Python3        | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/python3)    | Basic Python3 starter                              | ✅          | Coder One                               |
-| Python3-fwd    | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/python3)    | Includes example for using forward model simulator | ❌          | Coder One                               |
-| TypeScript     | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/typescript) | Basic TypeScript starter                           | ❌          | Coder One                               |
-| TypeScript-fwd | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/typescript) | Includes example for using forward model simulator | ❌          | Coder One                               |
-| Go             | [Link](https://github.com/CoderOneHQ/bomberland/tree/master/go)           | Basic Go starter                                   | ✅          | [dtitov](https://github.com/dtitov)     |
-| C++            | [Link](https://github.com/CoderOneHQ/bomberland/tree/master/C%2B%2B)      | Basic C++ starter                                  | ✅          | [jfbogusz](https://github.com/jfbogusz) |
+| Kit                 | Link                                                                      | Description                                        | Up-to-date? | Contributed by                          |
+| ------------------- | ------------------------------------------------------------------------- | -------------------------------------------------- | ----------- | --------------------------------------- |
+| Python3             | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/python3)    | Basic Python3 starter                              | ✅          | Coder One                               |
+| Python3-fwd         | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/python3)    | Includes example for using forward model simulator | ✅          | Coder One                               |
+| Python3-gym-wrapper | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/python3)    | OpenAI Gym wrapper                                 | ✅          | Coder One                               |
+| TypeScript          | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/typescript) | Basic TypeScript starter                           | ❌          | Coder One                               |
+| TypeScript-fwd      | [Link](https://github.com/CoderOneHQ/starter-kits/tree/master/typescript) | Includes example for using forward model simulator | ❌          | Coder One                               |
+| Go                  | [Link](https://github.com/CoderOneHQ/bomberland/tree/master/go)           | Basic Go starter                                   | ✅          | [dtitov](https://github.com/dtitov)     |
+| C++                 | [Link](https://github.com/CoderOneHQ/bomberland/tree/master/C%2B%2B)      | Basic C++ starter                                  | ✅          | [jfbogusz](https://github.com/jfbogusz) |
 
 # Contributing
 
@@ -42,11 +43,12 @@ For any help, please contact us directly on [Discord](https://discord.gg/NkfgvRN
 
 # Release Notes
 
-| Ver. | Changes                                                                                                                                                                                                                                               | Date                                                                  | Binary        |
-| ---- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- | ----------- |
-| 1523 | Forward model bug fixes + unit move blocking on moving to same cell + reset game with a set world and prng seed (See: [Docs](https://www.gocoder.one/docs/api-reference#reset-game)) | 29 Nov 2021  | [Link](https://github.com/CoderOneHQ/bomberland/releases/tag/build-1523) |
-| 1065 | Added `UNITS_PER_AGENT` environment flag (See: [Docs](https://gocoder.one/docs/api-reference#%EF%B8%8F-environment-flags))  | 9 Oct 2021 | - |
-| 974 | Added functionality: <ul><li>Reset the game without restarting engine/containers</li><li>Evaluate next state by the game engine given a state + list of actions</li></ul> See: [Docs](https://gocoder.one/docs/api-reference#-administrator-api) | 18 Sep 2021 | [Link](https://github.com/CoderOneHQ/bomberland/releases/tag/build-974) |
+| Ver. | Changes                                                                                                                                                                                                                                          | Date          | Binary                                                                   |
+| ---- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------- | ------------------------------------------------------------------------ |
+| 1555 | Changes to support open ai gym wrapper                                                                                                                                                                                                           | 6th Dev 2021  | [Link](https://github.com/CoderOneHQ/bomberland/releases/tag/build-1523) |
+| 1523 | Forward model bug fixes + unit move blocking on moving to same cell + reset game with a set world and prng seed (See: [Docs](https://www.gocoder.one/docs/api-reference#reset-game))                                                             | 29th Nov 2021 | [Link](https://github.com/CoderOneHQ/bomberland/releases/tag/build-1523) |
+| 1065 | Added `UNITS_PER_AGENT` environment flag (See: [Docs](https://gocoder.one/docs/api-reference#%EF%B8%8F-environment-flags))                                                                                                                       | 9th Oct 2021  | -                                                                        |
+| 974  | Added functionality: <ul><li>Reset the game without restarting engine/containers</li><li>Evaluate next state by the game engine given a state + list of actions</li></ul> See: [Docs](https://gocoder.one/docs/api-reference#-administrator-api) | 18th Sep 2021 | [Link](https://github.com/CoderOneHQ/bomberland/releases/tag/build-974)  |
 
 # Discussion and Questions
 

diff --git a/base-compose.yml b/base-compose.yml
@@ -1,7 +1,7 @@
 version: "3"
 services:
     game-server:
-        image: coderone.azurecr.io/game-server:1523
+        image: coderone.azurecr.io/game-server:1555
         volumes:
             - ./logs:/app/logs
 
@@ -33,6 +33,12 @@ services:
             dockerfile: Dockerfile.fwd.dev
         volumes:
             - ./python3:/app
+    python3-gym-dev:
+        build:
+            context: python3
+            dockerfile: Dockerfile.gym.dev
+        volumes:
+            - ./python3:/app
 
     typescript-agent:
         build:

diff --git a/open-ai-gym-wrapper-compose.yml b/open-ai-gym-wrapper-compose.yml
@@ -0,0 +1,26 @@
+version: "3"
+services:
+    gym:
+        extends:
+            file: base-compose.yml
+            service: python3-gym-dev
+        environment:
+            - FWD_MODEL_CONNECTION_STRING=ws://fwd-server:6969/?role=admin
+        depends_on:
+            - fwd-server
+        networks:
+            - coderone-open-ai-gym-wrapper
+
+    fwd-server:
+        extends:
+            file: base-compose.yml
+            service: game-server
+        environment:
+            - TELEMETRY_ENABLED=0
+            - PORT=6969
+            - WORLD_SEED=1234
+            - PRNG_SEED=1234
+        networks:
+            - coderone-open-ai-gym-wrapper
+networks:
+    coderone-open-ai-gym-wrapper: null
diff --git a/python3/Dockerfile.gym.dev b/python3/Dockerfile.gym.dev
@@ -0,0 +1,6 @@
+FROM python:3.8-bullseye
+
+COPY ./requirements.txt /app/requirements.txt
+WORKDIR /app
+RUN python -m pip install -r requirements.txt
+ENTRYPOINT PYTHONUNBUFFERED=1 python dev_gym.py
diff --git a/python3/README.md b/python3/README.md
@@ -0,0 +1,7 @@
+# Overview
+
+`agent.py` - random agent
+
+`agent_fwd.py` - random agent that connects to forward model
+
+`dev_gym.py` - example that drives the [OpenAI Gym](https://gym.openai.com/) wrapper defined in `gym.py`
diff --git a/python3/agent.py b/python3/agent.py
@@ -1,3 +1,4 @@
+from typing import Union
 from game_state import GameState
 import asyncio
 import random
@@ -8,11 +9,12 @@
 
 actions = ["up", "down", "left", "right", "bomb", "detonate"]
 
+
 class Agent():
     def __init__(self):
         self._client = GameState(uri)
 
-        ### any initialization code can go here
+        # any initialization code can go here
         self._client.set_game_tick_callback(self._on_game_tick)
 
         loop = asyncio.get_event_loop()
@@ -23,7 +25,7 @@ def __init__(self):
         loop.run_until_complete(asyncio.wait(tasks))
 
     # returns coordinates of the first bomb placed by a unit
-    def _get_bomb_to_detonate(self, unit) -> [int, int] or None:
+    def _get_bomb_to_detonate(self, unit) -> Union[list, None]:
         entities = self._client._state.get("entities")
         bombs = list(filter(lambda entity: entity.get(
             "unit_id") == unit and entity.get("type") == "b", entities))
@@ -56,8 +58,10 @@ async def _on_game_tick(self, tick_number, game_state):
             else:
                 print(f"Unhandled action: {action} for unit {unit_id}")
 
+
 def main():
     Agent()
 
+
 if __name__ == "__main__":
     main()
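For helpers such as `_get_bomb_to_detonate`, which return either an `[x, y]` pair or nothing, one precise spelling of the return type is `Optional[List[int]]`. The sketch below is illustrative only: `first_bomb_coordinates` and the sample entity are assumptions, not code from this commit. It also shows a common pitfall: an annotation written as `Union[int, int] or None` evaluates to plain `int`, so it does not describe an optional coordinate pair.

```python
from typing import List, Optional, Union

# Union[int, int] collapses to int, and `int or None` evaluates to int
# because a class object is truthy.
collapsed = Union[int, int] or None
print(collapsed is int)  # True


# Hypothetical helper, named here for illustration only.
def first_bomb_coordinates(bombs: List[dict]) -> Optional[List[int]]:
    """Return [x, y] of the first bomb in the list, or None if it is empty."""
    if not bombs:
        return None
    bomb = bombs[0]
    return [bomb.get("x"), bomb.get("y")]


print(first_bomb_coordinates([{"x": 1, "y": 2, "type": "b"}]))  # [1, 2]
print(first_bomb_coordinates([]))  # None
```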
diff --git a/python3/agent_fwd.py b/python3/agent_fwd.py
@@ -1,7 +1,7 @@
+from typing import Union
 from forward_model import ForwardModel
 from game_state import GameState
 import asyncio
-import copy
 import os
 import random
 
@@ -27,7 +27,6 @@ def connect(self):
         loop = asyncio.get_event_loop()
 
         client_connection = loop.run_until_complete(self._client.connect())
-        client_fwd_connection = None
 
         client_fwd_connection = loop.run_until_complete(
             self._client_fwd.connect())
@@ -38,7 +37,7 @@ def connect(self):
             self._client_fwd._handle_messages(client_fwd_connection))
         loop.run_forever()
 
-    def _get_bomb_to_detonate(self, game_state) -> [int, int] or None:
+    def _get_bomb_to_detonate(self, game_state) -> Union[list, None]:
         agent_number = game_state.get("connection").get("agent_number")
         entities = self._client._state.get("entities")
         bombs = list(filter(lambda entity: entity.get(

diff --git a/python3/dev_gym.py b/python3/dev_gym.py
@@ -0,0 +1,34 @@
+import asyncio
+from typing import Dict
+from gym import Gym
+import os
+
+fwd_model_uri = os.environ.get(
+    "FWD_MODEL_CONNECTION_STRING") or "ws://127.0.0.1:6969/?role=admin"
+
+mock_6x6_state: Dict = {"agents": {"a": {"agent_id": "a", "unit_ids": ["c", "e", "g"]}, "b": {"agent_id": "b", "unit_ids": ["d", "f", "h"]}}, "unit_state": {"c": {"coordinates": [0, 1], "hp": 3, "inventory": {"bombs": 3}, "blast_diameter": 3, "unit_id": "c", "agent_id": "a", "invulnerability": 0}, "d": {"coordinates": [5, 1], "hp": 3, "inventory": {"bombs": 3}, "blast_diameter": 3, "unit_id": "d", "agent_id": "b", "invulnerability": 0}, "e": {"coordinates": [3, 3], "hp": 3, "inventory": {"bombs": 3}, "blast_diameter": 3, "unit_id": "e", "agent_id": "a", "invulnerability": 0}, "f": {"coordinates": [2, 3], "hp": 3, "inventory": {"bombs": 3}, "blast_diameter": 3, "unit_id": "f", "agent_id": "b", "invulnerability": 0}, "g": {"coordinates": [2, 4], "hp": 3, "inventory": {"bombs": 3}, "blast_diameter": 3, "unit_id": "g", "agent_id": "a", "invulnerability": 0}, "h": {"coordinates": [3, 4], "hp": 3, "inventory": {"bombs": 3}, "blast_diameter": 3, "unit_id": "h", "agent_id": "b", "invulnerability": 0}}, "entities": [
+    {"created": 0, "x": 0, "y": 3, "type": "m"}, {"created": 0, "x": 5, "y": 3, "type": "m"}, {"created": 0, "x": 4, "y": 3, "type": "m"}, {"created": 0, "x": 1, "y": 3, "type": "m"}, {"created": 0, "x": 3, "y": 5, "type": "m"}, {"created": 0, "x": 2, "y": 5, "type": "m"}, {"created": 0, "x": 5, "y": 4, "type": "m"}, {"created": 0, "x": 0, "y": 4, "type": "m"}, {"created": 0, "x": 1, "y": 1, "type": "w", "hp": 1}, {"created": 0, "x": 4, "y": 1, "type": "w", "hp": 1}, {"created": 0, "x": 3, "y": 0, "type": "w", "hp": 1}, {"created": 0, "x": 2, "y": 0, "type": "w", "hp": 1}, {"created": 0, "x": 5, "y": 5, "type": "w", "hp": 1}, {"created": 0, "x": 0, "y": 5, "type": "w", "hp": 1}, {"created": 0, "x": 4, "y": 0, "type": "w", "hp": 1}, {"created": 0, "x": 1, "y": 0, "type": "w", "hp": 1}, {"created": 0, "x": 5, "y": 0, "type": "w", "hp": 1}, {"created": 0, "x": 0, "y": 0, "type": "w", "hp": 1}], "world": {"width": 6, "height": 6}, "tick": 0, "config": {"tick_rate_hz": 10, "game_duration_ticks": 300, "fire_spawn_interval_ticks": 2}}
+
+
+def calculate_reward(state: Dict):
+    # custom reward function
+    return 1
+
+
+async def main():
+    gym = Gym(fwd_model_uri)
+    await gym.connect()
+    env = gym.make("bomberland-open-ai-gym", mock_6x6_state)
+    for i_ in range(1000):
+        actions = []
+        observation, done, info = await env.step(actions)
+        reward = calculate_reward(observation)
+
+        print(f"reward: {reward} done: {done} info: {info}")
+        if done:
+            await env.reset()
+    await gym.close()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
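`calculate_reward` in `dev_gym.py` is a placeholder that always returns 1. As a minimal sketch of what a custom reward might look like, the function below scores a state by the HP difference between one agent's units and everyone else's. It assumes the observation returned by `env.step` has the same `agents` / `unit_state` layout as `mock_6x6_state`; the name `hp_difference_reward` and the `agent_id` parameter are hypothetical.

```python
from typing import Dict


def hp_difference_reward(state: Dict, agent_id: str) -> int:
    """Total HP of agent_id's units minus total HP of all other units."""
    own_units = set(state["agents"][agent_id]["unit_ids"])
    reward = 0
    for unit_id, unit in state["unit_state"].items():
        if unit_id in own_units:
            reward += unit["hp"]
        else:
            reward -= unit["hp"]
    return reward


# Tiny state using only the fields the reward reads (assumed layout).
example_state = {
    "agents": {"a": {"agent_id": "a", "unit_ids": ["c"]},
               "b": {"agent_id": "b", "unit_ids": ["d"]}},
    "unit_state": {"c": {"hp": 3, "agent_id": "a"},
                   "d": {"hp": 1, "agent_id": "b"}},
}
print(hp_difference_reward(example_state, "a"))  # 2
```

A reward like this is zero-sum between the two agents, which may or may not suit a given training setup.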
diff --git a/python3/forward_model.py b/python3/forward_model.py
@@ -1,19 +1,23 @@
-import asyncio
+from typing import Dict, List
 import websockets
 import json
-import copy
 
 
 class ForwardModel:
     def __init__(self, connection_string: str):
         self._connection_string = connection_string
         self._next_state_callback = None
+        self.connection = None
+
+    async def close(self):
+        if self.connection is not None:
+            await self.connection.close()
 
     def set_next_state_callback(self, next_state_callback):
         self._next_state_callback = next_state_callback
 
     async def connect(self):
-        self.connection = await websockets.client.connect(self._connection_string)
+        self.connection = await websockets.connect(self._connection_string)
         if self.connection.open:
             return self.connection
 
@@ -36,6 +40,9 @@ async def _on_data(self, data):
         elif data_type == "next_game_state":
             payload = data.get("payload")
             await self._on_next_state(payload)
+        elif data_type == "game_state":
+            # no-op
+            return
         else:
             print(f"unknown packet \"{data_type}\": {data}")
 
@@ -60,7 +67,7 @@ async def _on_next_state(self, payload):
     next_state call since payloads can come back in any order
     It should ideally be unique
     """
-    async def send_next_state(self, sequence_id, game_state, actions):
+    async def send_next_state(self, sequence_id: int, game_state: Dict, actions: List[Dict]):
         game_state.pop("connection", None)
         packet = {"actions": actions,
                   "type": "evaluate_next_state", "state": game_state, "sequence_id": sequence_id}

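`send_next_state` removes the `connection` key from the `game_state` dict it is given, which mutates the caller's object. The sketch below builds the same `evaluate_next_state` packet from a shallow copy instead. The function name `build_evaluate_next_state_packet` is hypothetical, and serializing the packet with `json.dumps` is an assumption, since the sending code is not part of this hunk.

```python
import json
from typing import Dict, List


def build_evaluate_next_state_packet(sequence_id: int, game_state: Dict,
                                     actions: List[Dict]) -> str:
    """Serialize an evaluate_next_state request without mutating game_state."""
    # Copy every top-level key except "connection", the key send_next_state drops.
    state = {key: value for key, value in game_state.items()
             if key != "connection"}
    packet = {
        "actions": actions,
        "type": "evaluate_next_state",
        "state": state,
        "sequence_id": sequence_id,
    }
    return json.dumps(packet)


original = {"tick": 0, "connection": {"agent_number": 0}}
encoded = build_evaluate_next_state_packet(7, original, [])
print(json.loads(encoded)["sequence_id"])  # 7
print("connection" in original)  # True: the caller's dict is untouched
```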
diff --git a/python3/game_state.py b/python3/game_state.py
@@ -1,7 +1,10 @@
 import asyncio
+from typing import Union
 import websockets
 import json
 
+from websockets.client import WebSocketClientProtocol
+
 _move_set = set(("up", "down", "left", "right"))
 
 
@@ -15,7 +18,7 @@ def set_game_tick_callback(self, generate_agent_action_callback):
         self._tick_callback = generate_agent_action_callback
 
     async def connect(self):
-        self.connection = await websockets.client.connect(self._connection_string)
+        self.connection = await websockets.connect(self._connection_string)
         if self.connection.open:
             return self.connection
 
@@ -32,10 +35,11 @@ async def send_bomb(self, unit_id: str):
         await self._send(packet)
 
     async def send_detonate(self, x, y, unit_id: str):
-        packet = {"type": "detonate", "coordinates": [x, y], "unit_id": unit_id}
+        packet = {"type": "detonate", "coordinates": [
+            x, y], "unit_id": unit_id}
         await self._send(packet)
 
-    async def _handle_messages(self, connection: str):
+    async def _handle_messages(self, connection: WebSocketClientProtocol):
         while True:
             try:
                 raw_data = await connection.recv()
@@ -138,7 +142,7 @@ def _on_unit_action(self, action_packet):
         else:
             print(f"Unhandled agent action received: {action_type}")
 
-    def _get_new_unit_coordinates(self, coordinates, move_action) -> [int, int]:
+    def _get_new_unit_coordinates(self, coordinates, move_action) -> list:
         [x, y] = coordinates
         if move_action == "up":
             return [x, y+1]

diff --git a/python3/gym.py b/python3/gym.py
@@ -0,0 +1,70 @@
+import asyncio
+import json
+from typing import Callable, Dict, List
+
+import websockets
+from forward_model import ForwardModel
+
+
+class GymEnv():
+    def __init__(self, fwd_model: ForwardModel, channel: int, initial_state: Dict, send_next_state: Callable[[Dict, List[Dict],  int], Dict]):
+        self._state = initial_state
+        self._initial_state = initial_state
+        self._fwd = fwd_model
+        self._channel = channel
+        self._send = send_next_state
+
+    async def reset(self):
+        self._state = self._initial_state
+        print("Resetting")
+
+    async def step(self, actions):
+        state = await self._send(self._state, actions, self._channel)
+        self._state = state.get("next_state")
+        return [state.get("next_state"), state.get("is_complete"), state.get("tick_result").get("events")]
+
+
+class Gym():
+    def __init__(self, fwd_model_uri: str):
+        self._client_fwd = ForwardModel(fwd_model_uri)
+        self._channel_counter = 0
+        self._channel_is_busy_status: Dict[int, bool] = {}
+        self._channel_buffer: Dict[int, Dict] = {}
+        self._client_fwd.set_next_state_callback(self._on_next_game_state)
+        self._environments: Dict[str, GymEnv] = {}
+
+    async def connect(self):
+        # open the websocket connection to the forward model
+        client_fwd_connection = await self._client_fwd.connect()
+
+        # process incoming packets in a background task
+        loop = asyncio.get_event_loop()
+        loop.create_task(
+            self._client_fwd._handle_messages(client_fwd_connection))
+
+    async def close(self):
+        await self._client_fwd.close()
+
+    async def _on_next_game_state(self, state):
+        channel = state.get("sequence_id")
+        self._channel_is_busy_status[channel] = False
+        self._channel_buffer[channel] = state
+
+    def make(self, name: str, initial_state: Dict) -> GymEnv:
+        if self._environments.get(name) is not None:
+            raise Exception(
+                f"environment \"{name}\" has already been instantiated")
+        self._environments[name] = GymEnv(
+            self._client_fwd, self._channel_counter,  initial_state, self._send_next_state)
+        self._channel_counter += 1
+        return self._environments[name]
+
+    async def _send_next_state(self, state, actions, channel: int):
+        self._channel_is_busy_status[channel] = True
+        await self._client_fwd.send_next_state(channel, state, actions)
+        while self._channel_is_busy_status[channel]:
+            # sleeping yields to the event loop so the message handler task can receive the reply
+            await asyncio.sleep(0.0001)
+        result = self._channel_buffer[channel]
+        del self._channel_buffer[channel]
+        return result
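`_send_next_state` waits for a reply by polling a busy flag with a short sleep. One alternative, shown here only as a sketch and not as the wrapper's implementation, is to keep one `asyncio.Future` per channel and resolve it from the next-state callback. Awaiting the future also yields to the event loop, so the message handler task still gets to run. `PendingResults`, `expect`, and `resolve` are hypothetical names, and the delayed call stands in for a reply from the forward model.

```python
import asyncio
from typing import Dict


class PendingResults:
    """Hypothetical helper: one asyncio.Future per in-flight channel."""

    def __init__(self) -> None:
        self._pending: Dict[int, asyncio.Future] = {}

    def expect(self, channel: int) -> asyncio.Future:
        # Register before sending so a fast reply cannot be missed.
        future = asyncio.get_running_loop().create_future()
        self._pending[channel] = future
        return future

    def resolve(self, channel: int, result: Dict) -> None:
        # Would be called from the next-state callback when a reply arrives.
        future = self._pending.pop(channel, None)
        if future is not None and not future.done():
            future.set_result(result)


async def demo() -> Dict:
    pending = PendingResults()
    future = pending.expect(0)
    # Stand-in for the forward model replying a little later.
    asyncio.get_running_loop().call_later(
        0.01, pending.resolve, 0, {"sequence_id": 0, "is_complete": False})
    return await future


result = asyncio.run(demo())
print(result["sequence_id"])  # 0
```

In this layout the channel number still plays the role of `sequence_id`, so the one-request-per-channel assumption of the original code is unchanged.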
diff --git a/python3/requirements.txt b/python3/requirements.txt
@@ -1,2 +1,2 @@
 asyncio==3.4.3
-websockets==8.1
+websockets==10.1