Gymnasium Integration #789

Merged 29 commits on Feb 3, 2023
Changes from 21 commits

Commits
10cc606
Replaced Gym with Gymnasium
Markus28 Dec 29, 2022
4e827c1
Updated docs
Markus28 Dec 29, 2022
eaeb4cc
Added type ignores for gym wrappers
Markus28 Jan 3, 2023
125069a
Changed envpool from gym to gymnasium
Markus28 Jan 3, 2023
201f0cd
Require envpool>=0.7.0
Markus28 Jan 3, 2023
e1d0c0e
Fixed MyTestEnv, removed __init__ that is unnecessary due to Gymnasium
Markus28 Jan 3, 2023
5f0833a
Removed dead code that was necessary for old step API
Markus28 Jan 3, 2023
bea1559
Removed type hints about old step API
Markus28 Jan 3, 2023
3fc8c78
Removed check for whether environments return info upon reset
Markus28 Jan 3, 2023
9f2f104
Increase required version of PZ, fix some CI issues
Markus28 Jan 4, 2023
1ea1b90
Added dummy info to reset of FiniteVectorEnv
Markus28 Jan 4, 2023
9c9a70b
Fix log method to take terminated and truncated, update PettingZooEnv…
Markus28 Jan 4, 2023
28e5cb1
Made some code more explicit (removed hack for compatibility with old…
Markus28 Jan 4, 2023
6be6f41
Fix FiniteVectorEnv
Markus28 Jan 4, 2023
803f543
Fixed some type hints
Markus28 Jan 4, 2023
54e922f
Try to fix FiniteVectorEnv
Markus28 Jan 5, 2023
d9a1feb
Skip NNI test, remove commented out code
Markus28 Jan 5, 2023
54515f7
Fixed type errors
Markus28 Jan 5, 2023
bfca5a9
Disclaimer in README
Markus28 Jan 5, 2023
bc379b0
Put type ignore in the right places
Markus28 Jan 5, 2023
266c64a
Also allow OpenAI gym environments, fixed documentation
Jan 20, 2023
565c2e9
Also allow PettingZooEnv in vector environment, fixed type hint
Markus28 Jan 20, 2023
1402f1e
Fixed import of PettingZooEnv
Markus28 Jan 21, 2023
2a5ff31
Fixed type hinting, updated README
Markus28 Jan 21, 2023
040d2af
Updated documentation about ReplayBuffer
Markus28 Jan 21, 2023
28475f4
Fixed gymnasium version, added shimmy to dev requirements
Jan 23, 2023
2be8591
Added test for conversion of OpenAI Gym environments
Jan 23, 2023
654f9d8
fix spelling
Trinkle23897 Jan 26, 2023
363250d
Merge branch 'master' into gymnasium_integration
Trinkle23897 Feb 3, 2023
39 changes: 23 additions & 16 deletions README.md
@@ -6,6 +6,12 @@

[![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/) [![Conda](https://img.shields.io/conda/vn/conda-forge/tianshou)](https://github.com/conda-forge/tianshou-feedstock) [![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/master) [![Read the Docs](https://img.shields.io/readthedocs/tianshou-docs-zh-cn?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://tianshou.readthedocs.io/zh/master/) [![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)](https://github.com/thu-ml/tianshou/actions) [![codecov](https://img.shields.io/codecov/c/gh/thu-ml/tianshou)](https://codecov.io/gh/thu-ml/tianshou) [![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues) [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) [![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network) [![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)

> ⚠️ **Transition to Gymnasium**: The maintainers of OpenAI Gym have recently released [Gymnasium](http://github.com/Farama-Foundation/Gymnasium),
> which is where future maintenance of OpenAI Gym will be taking place.
> Tianshou no longer supports OpenAI Gym environments and has transitioned to Gymnasium.
> If you would like to run Tianshou with legacy environments, you may use [Shimmy](https://github.com/Farama-Foundation/Shimmy)
> as a compatibility layer.
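
For legacy environments, the Shimmy route can be as small as wrapping the old environment in a compatibility class. A minimal sketch (assuming Shimmy exports `GymV21CompatibilityV0` for pre-0.26 Gym environments — check the Shimmy documentation for the class matching your Gym version):

```python
import gym  # legacy OpenAI Gym
from shimmy import GymV21CompatibilityV0  # assumed export; see Shimmy docs

legacy_env = gym.make("CartPole-v0")
env = GymV21CompatibilityV0(env=legacy_env)  # now follows the Gymnasium API

obs, info = env.reset(seed=0)  # Gymnasium reset returns (obs, info)
obs, rew, terminated, truncated, info = env.step(env.action_space.sample())
```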

**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly APIs, or run slowly, Tianshou provides a fast, modularized framework and pythonic API for building deep reinforcement learning agents with the fewest lines of code. The supported interface algorithms currently include:

- [Deep Q-Network (DQN)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
@@ -105,21 +111,21 @@ The example scripts are under [test/](https://github.com/thu-ml/tianshou/blob/ma

### Comprehensive Functionality

| RL Platform | GitHub Stars | # of Alg. <sup>(1)</sup> | Custom Env | Batch Training | RNN Support | Nested Observation | Backend |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ | --------------------------- | --------------------------------- | ------------------ | ------------------ | ---------- |
| [Baselines](https://github.com/openai/baselines) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | 9 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines](https://github.com/hill-a/stable-baselines) | [![GitHub stars](https://img.shields.io/github/stars/hill-a/stable-baselines)](https://github.com/hill-a/stable-baselines/stargazers) | 11 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) | [![GitHub stars](https://img.shields.io/github/stars/DLR-RM/stable-baselines3)](https://github.com/DLR-RM/stable-baselines3/stargazers) | 7<sup> (3)</sup> | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :heavy_check_mark: | PyTorch |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | 16 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/PyTorch |
| [SpinningUp](https://github.com/openai/spinningup) | [![GitHub stars](https://img.shields.io/github/stars/openai/spinningup)](https://github.com/openai/spinningup/stargazers) | 6 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :x: | PyTorch |
| [Dopamine](https://github.com/google/dopamine) | [![GitHub stars](https://img.shields.io/github/stars/google/dopamine)](https://github.com/google/dopamine/stargazers) | 7 | :x: | :x: | :x: | :x: | TF/JAX |
| [ACME](https://github.com/deepmind/acme) | [![GitHub stars](https://img.shields.io/github/stars/deepmind/acme)](https://github.com/deepmind/acme/stargazers) | 14 | :heavy_check_mark: (dm_env) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/JAX |
| [keras-rl](https://github.com/keras-rl/keras-rl) | [![GitHub stars](https://img.shields.io/github/stars/keras-rl/keras-rl)](https://github.com/keras-rl/keras-rl/stargazers) | 7 | :heavy_check_mark: (gym) | :x: | :x: | :x: | Keras |
| [rlpyt](https://github.com/astooke/rlpyt) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) | 11 | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| [ChainerRL](https://github.com/chainer/chainerrl) | [![GitHub stars](https://img.shields.io/github/stars/chainer/chainerrl)](https://github.com/chainer/chainerrl/stargazers) | 18 | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :x: | Chainer |
| [Sample Factory](https://github.com/alex-petrenko/sample-factory) | [![GitHub stars](https://img.shields.io/github/stars/alex-petrenko/sample-factory)](https://github.com/alex-petrenko/sample-factory/stargazers) | 1<sup> (4)</sup> | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| | | | | | | | |
| [Tianshou](https://github.com/thu-ml/tianshou) | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | 20 | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| RL Platform | GitHub Stars | # of Alg. <sup>(1)</sup> | Custom Env | Batch Training | RNN Support | Nested Observation | Backend |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ |--------------------------------| --------------------------------- | ------------------ | ------------------ | ---------- |
| [Baselines](https://github.com/openai/baselines) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | 9 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines](https://github.com/hill-a/stable-baselines) | [![GitHub stars](https://img.shields.io/github/stars/hill-a/stable-baselines)](https://github.com/hill-a/stable-baselines/stargazers) | 11 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) | [![GitHub stars](https://img.shields.io/github/stars/DLR-RM/stable-baselines3)](https://github.com/DLR-RM/stable-baselines3/stargazers) | 7<sup> (3)</sup> | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :heavy_check_mark: | PyTorch |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | 16 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/PyTorch |
| [SpinningUp](https://github.com/openai/spinningup) | [![GitHub stars](https://img.shields.io/github/stars/openai/spinningup)](https://github.com/openai/spinningup/stargazers) | 6 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :x: | PyTorch |
| [Dopamine](https://github.com/google/dopamine) | [![GitHub stars](https://img.shields.io/github/stars/google/dopamine)](https://github.com/google/dopamine/stargazers) | 7 | :x: | :x: | :x: | :x: | TF/JAX |
| [ACME](https://github.com/deepmind/acme) | [![GitHub stars](https://img.shields.io/github/stars/deepmind/acme)](https://github.com/deepmind/acme/stargazers) | 14 | :heavy_check_mark: (dm_env) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/JAX |
| [keras-rl](https://github.com/keras-rl/keras-rl) | [![GitHub stars](https://img.shields.io/github/stars/keras-rl/keras-rl)](https://github.com/keras-rl/keras-rl/stargazers) | 7 | :heavy_check_mark: (gym) | :x: | :x: | :x: | Keras |
| [rlpyt](https://github.com/astooke/rlpyt) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) | 11 | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| [ChainerRL](https://github.com/chainer/chainerrl) | [![GitHub stars](https://img.shields.io/github/stars/chainer/chainerrl)](https://github.com/chainer/chainerrl/stargazers) | 18 | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :x: | Chainer |
| [Sample Factory](https://github.com/alex-petrenko/sample-factory) | [![GitHub stars](https://img.shields.io/github/stars/alex-petrenko/sample-factory)](https://github.com/alex-petrenko/sample-factory/stargazers) | 1<sup> (4)</sup> | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| | | | | | | | |
| [Tianshou](https://github.com/thu-ml/tianshou) | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | 20 | :heavy_check_mark: (Gymnasium) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |

<sup>(1): access date: 2021-08-08</sup>

@@ -175,7 +181,8 @@ This is an example of Deep Q Network. You can also run the full script at [test/
First, import some relevant packages:

```python
import gym, torch, numpy as np, torch.nn as nn
import gymnasium as gym
import torch, numpy as np, torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
import tianshou as ts
```
4 changes: 2 additions & 2 deletions docs/tutorials/cheatsheet.rst
@@ -4,7 +4,7 @@ Cheat Sheet
This page shows some code snippets of how to use Tianshou to develop new
algorithms / apply algorithms to new scenarios.

By the way, some of these issues can be resolved by using a ``gym.Wrapper``.
By the way, some of these issues can be resolved by using a ``gymnasium.Wrapper``.
It can serve as a universal solution for policy-environment interaction. But
you can also use the batch processor :ref:`preprocess_fn` or the vectorized
environment wrapper :class:`~tianshou.env.VectorEnvWrapper`.
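
For illustration, a minimal ``gymnasium.Wrapper`` that rescales rewards could
look like the sketch below (``ScaleReward`` is a hypothetical example, not part
of Tianshou):
::

    import gymnasium as gym

    class ScaleReward(gym.Wrapper):
        """Multiply every reward by a constant factor."""

        def __init__(self, env: gym.Env, scale: float = 0.1):
            super().__init__(env)
            self.scale = scale

        def step(self, action):
            obs, rew, terminated, truncated, info = self.env.step(action)
            return obs, rew * self.scale, terminated, truncated, info

    env = ScaleReward(gym.make("CartPole-v1"), scale=0.1)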
@@ -159,7 +159,7 @@ toy_text and classic_control environments. For more information, please refer to
# install envpool: pip3 install envpool

import envpool
envs = envpool.make_gym("CartPole-v0", num_envs=10)
envs = envpool.make_gymnasium("CartPole-v0", num_envs=10)
collector = Collector(policy, envs, buffer)

Here are some other `examples <https://github.com/sail-sg/envpool/tree/master/examples/tianshou_examples>`_.
8 changes: 4 additions & 4 deletions docs/tutorials/dqn.rst
@@ -35,10 +35,10 @@ Here is the overall system:
Make an Environment
-------------------

First of all, you have to make an environment for your agent to interact with. You can use ``gym.make(environment_name)`` to make an environment for your agent. For environment interfaces, we follow the convention of `OpenAI Gym <https://github.com/openai/gym>`_. In your Python code, simply import Tianshou and make the environment:
First of all, you have to make an environment for your agent to interact with. You can use ``gym.make(environment_name)`` to make an environment for your agent. For environment interfaces, we follow the convention of `Gymnasium <https://github.com/Farama-Foundation/Gymnasium>`_. In your Python code, simply import Tianshou and make the environment:
::

import gym
import gymnasium as gym
import tianshou as ts

env = gym.make('CartPole-v0')
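
Gymnasium's ``reset`` returns ``(obs, info)`` and ``step`` returns a five-tuple
``(obs, reward, terminated, truncated, info)`` instead of the old four-tuple; a
quick sanity check of the new API:
::

    obs, info = env.reset(seed=42)
    obs, rew, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated  # recover the old "done" flag if needed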
@@ -84,8 +84,8 @@ You can also try the super-fast vectorized environment `EnvPool <https://github.
::

import envpool
train_envs = envpool.make_gym("CartPole-v0", num_envs=10)
test_envs = envpool.make_gym("CartPole-v0", num_envs=100)
train_envs = envpool.make_gymnasium("CartPole-v0", num_envs=10)
test_envs = envpool.make_gymnasium("CartPole-v0", num_envs=100)

For the demonstration, here we use the second code-block.

6 changes: 3 additions & 3 deletions docs/tutorials/tictactoe.rst
@@ -62,15 +62,15 @@ The observation variable ``obs`` returned from the environment is a ``dict``, with

.. note::

There is no special formulation of ``mask`` either in discrete action space or in continuous action space. You can also use some action spaces like ``gym.spaces.Discrete`` or ``gym.spaces.Box`` to represent the available action space. Currently, we use a boolean array.
There is no special formulation of ``mask`` either in discrete action space or in continuous action space. You can also use some action spaces like ``gymnasium.spaces.Discrete`` or ``gymnasium.spaces.Box`` to represent the available action space. Currently, we use a boolean array.
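
As a sketch of how such a mask can be represented (hypothetical numbers; the
real environment fills the mask from the board state):
::

    import numpy as np
    from gymnasium import spaces

    action_space = spaces.Discrete(9)  # one action per Tic-Tac-Toe cell
    mask = np.zeros(9, dtype=bool)
    mask[[0, 4, 8]] = True             # e.g. only cells 0, 4 and 8 are still empty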

Let's play two steps to have an intuitive understanding of the environment.

::

>>> import numpy as np
>>> action = 0 # action is either an integer, or an np.ndarray with one element
>>> obs, reward, done, info = env.step(action) # the env.step follows the api of OpenAI Gym
>>> obs, reward, done, info = env.step(action) # the env.step follows the api of Gymnasium
>>> print(obs) # notice the change in the observation
{'agent_id': 'player_2', 'obs': array([[[0, 1],
[0, 0],
@@ -185,7 +185,7 @@ So let's start to train our Tic-Tac-Toe agent! First, import some required modules
from copy import deepcopy
from typing import Optional, Tuple

import gym
import gymnasium as gym
import numpy as np
import torch
from pettingzoo.classic import tictactoe_v3
6 changes: 3 additions & 3 deletions examples/atari/atari_wrapper.py
@@ -5,7 +5,7 @@
from collections import deque

import cv2
import gym
import gymnasium as gym
import numpy as np

from tianshou.env import ShmemVectorEnv
@@ -324,15 +324,15 @@ def make_atari_env(task, seed, training_num, test_num, **kwargs):
"please set `x = x / 255.0` inside CNN network's forward function."
)
# parameter conversion
train_envs = env = envpool.make_gym(
train_envs = env = envpool.make_gymnasium(
task.replace("NoFrameskip-v4", "-v5"),
num_envs=training_num,
seed=seed,
episodic_life=True,
reward_clip=True,
stack_num=kwargs.get("frame_stack", 4),
)
test_envs = envpool.make_gym(
test_envs = envpool.make_gymnasium(
task.replace("NoFrameskip-v4", "-v5"),
num_envs=test_num,
seed=seed,
2 changes: 1 addition & 1 deletion examples/box2d/acrobot_dualdqn.py
@@ -2,7 +2,7 @@
import os
import pprint

import gym
import gymnasium as gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
2 changes: 1 addition & 1 deletion examples/box2d/bipedal_bdq.py
@@ -3,7 +3,7 @@
import os
import pprint

import gym
import gymnasium as gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
2 changes: 1 addition & 1 deletion examples/box2d/bipedal_hardcore_sac.py
@@ -2,7 +2,7 @@
import os
import pprint

import gym
import gymnasium as gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
2 changes: 1 addition & 1 deletion examples/box2d/lunarlander_dqn.py
@@ -2,7 +2,7 @@
import os
import pprint

import gym
import gymnasium as gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
2 changes: 1 addition & 1 deletion examples/box2d/mcc_sac.py
@@ -2,7 +2,7 @@
import os
import pprint

import gym
import gymnasium as gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
2 changes: 1 addition & 1 deletion examples/inverse/irl_gail.py
@@ -6,7 +6,7 @@
import pprint

import d4rl
import gym
import gymnasium as gym
import numpy as np
import torch
from torch import nn
2 changes: 1 addition & 1 deletion examples/mujoco/fetch_her_ddpg.py
@@ -6,7 +6,7 @@
import os
import pprint

import gym
import gymnasium as gym
import numpy as np
import torch
import wandb