polish(xjx): 1.0 (opendilab#143)
* polish(xjx): change doc structure, add intro

* Replace head image

* polish(xjx): system, basic, middleware (opendilab#117)

* Add middleware

* Add system design

* Add middleware spec

* Change search background

* Add quick start

* Use pytorch theme

* Adjusting grammar and errors

* New logo

* doc(hansbug): add guides for unittest, visualization and code style. (opendilab#127)

* doc(hansbug): add 3 new pages

* doc(hansbug): add code style page

* dev(hansbug): add plantuml's documentation

* dev(hansbug): add note

* dev(hansbug): align the image to center

* dev(hansbug): add graphviz's documentation

* dev(hansbug): add documentation for draw.io

* dev(hansbug): fix problem on draw.io

* dev(hansbug): add introduction for snakeviz

* dev(hansbug): add introduction for snakeviz

* dev(hansbug): add former parts of unittest

* fix(hansbug): do some fix

* dev(hansbug): add writing guide for unittest

* fix(hansbug): fix bug of

* dev(hansbug): add running guide for unittest

* fix(hansbug): fix the last code block

* dev(hansbug): append features to visualization

* dev(hansbug): add code style guide

* dev(hansbug): move the docs to new path

* fix(hansbug): fix the problems in chinese pages

* fix(hansbug): use english tutorials

* feature(pu): add config_spec_zh, basic_rl_zh, exploration_rl_zh, imitation_learning_zh (opendilab#126)

* feature(pu): add config_spec_zh

* feature(pu): add config_spec_zh, basic_rl_zh, exploration_rl_zh, imitation_learning_zh

* polish(pu):polish index

* polish(pu): polish style

* polish(zlx): 24-cooperation (opendilab#122)

* polish(zlx): Init 24-cooperation(git + issue/pr)

* polish(zlx): Add git_guide and issue_pr

* fix(zlx): fix comments by xjx

* feature(zlx): Add en version of 24-cooperation

* polish(zlx): fix comments by xjx

* polish(zjow): polish and revise quickstart and installation. (opendilab#121)

* Polish Quickstart.

* Minor change.

* Minor change.

* polish(zlx): 13. envs (opendilab#118)

* polish(zlx): Move env to new place. Polish index and images

* polish(zlx): Modify image scale

* polish(zlx): Add space in zh version

* polish(zlx): Move env to new place. Polish index and images

* polish(zlx): Modify image scale

* fixbug(zlx): en index indent

* polish(zlx): polish format via make live

* polish(zlx): fix comments by xjx

Co-authored-by: zhaoliangxuan <[email protected]>

* doc(nyz): add distributed rl overview (opendilab#133)

* doc(nyz): add distributed rl overview

* polish(nyz): polish footnote and note

* doc(davide): transfer 12 policies (opendilab#120)

* filled index en and ch

* Update index.rst

* added dqn_zh in index

* doc(zms): 11_dizoo: add zh + en version of index (opendilab#130)

* 1st zh doc

* change

* change links

* add note

* draft version of en dizoo

* change a bit

* final version

* Update index.rst

* polish(nyz): add missing images and polish doc

* doc(lxl): 02_algo: add offline rl zh (opendilab#125)

* polish(lxl): fix grammar and typos

* resolve conflicts when changing branches

* doc(lxl): add 02_algo/offline_rl_zh draft

* add offline rl doc

* polish offline rl doc

* polish offline rl

* polish offline rl: reformat reference

* polish offline rl: fix typo

* doc(jrn): add 02_algo model_based_rl_zh (opendilab#128)

* doc(jrn): add 02_algo mbrl

* doc(jrn): add 02_algo mbrl

* doc(jrn): modify 02_algo mbrl

* doc(jrn): modify 02_algo mbrl

* doc(jrn): polish 02_algo mbrl zh

* modify(jrn): polish source/02_algo/model_based_rl_zh.rst

* polish model_based_rl_zh.rst again

* polish model_based_rl_zh.rst again

* doc(jrn): add 02_algo mbrl

* doc(jrn): add 02_algo mbrl

* doc(jrn): modify 02_algo mbrl

* doc(jrn): modify 02_algo mbrl

* doc(jrn): polish 02_algo mbrl zh

* modify(jrn): polish source/02_algo/model_based_rl_zh.rst

* polish model_based_rl_zh.rst again

* polish(zlx): 24-cooperation (opendilab#122)

* polish(zlx): Init 24-cooperation(git + issue/pr)

* polish(zlx): Add git_guide and issue_pr

* fix(zlx): fix comments by xjx

* feature(zlx): Add en version of 24-cooperation

* polish(zlx): fix comments by xjx

* polish model_based_rl_zh.rst again

* polish(zjow): polish and revise quickstart and installation. (opendilab#121)

* Polish Quickstart.

* Minor change.

* Minor change.

* polish(nyz): add offline rl and gtrxl images

* doc(pu):  add exploration overview and footnote for exploration_rl_zh (opendilab#134)

* feature(pu): add config_spec_zh

* feature(pu): add config_spec_zh, basic_rl_zh, exploration_rl_zh, imitation_learning_zh

* polish(pu):polish index

* polish(pu): polish style

* polish(pu): polish style

* polish(pu): add exploration overview and footnote

* fix(pu): fix wrongly changed file

* polish(pu): add information-theory-based exploration part

* translate(zlx): Integrate past translation PRs (opendilab#135)

* translate(nyp): diayn zh

* doc(py): ngu zh

* translate(gh): smac zh

* translate(gh): icm zh

* translate(cy): cartpole & gym-hybrid en

* translate(xzy): minigrid & pendulum en

* translate(zyc&yf): bipedalwalker & lunarlander en

* translate(hs): mujoco & procgen en

* translate(hs): r2d3 en

* polish(zlx): remove .. _ in 13_envs

* polish(zlx): polish format

* doc(wyh): algo02 MARL docs (opendilab#129)

* doc(wyh):marl

* doc(wyh):marl polish

* translation(lxl): add offline_rl_en & polish offline_rl_zh (opendilab#136)

* fix(nyz): fix offline rl author typo

* polish(zlx): polish mujoco & r2d3 by hs, which are ignored before (opendilab#138)

* doc(zms): add comments to "framework/middleware" (opendilab#137)

* add the refs to comments of "framework/middleware"

* change maxdepth of framework/index.rst from 4 to 2

* polish(lxl): polish offline_rl_zh, fix typos and grammar (opendilab#139)

* add offline_rl_en

* reorganize the description of Future & Outlooks

* polish

* doc(nyp): add best practice zh for our doc 1.0 (opendilab#132)

* doc(wzl): add pettingzoo.zh doc (opendilab#124)

* add pettingzoo_zh.rst

* update pettingzoo_zh.rst

* fix(hs):fix install atari_env error (opendilab#116)

* fix install atari_env error

* Update atari.rst

* Update atari_zh.rst

* add best practice zh for doc 1.0

* add rnn translation

* finish rnn; fix some translations (wrappers)

* modify regarding the comments

* change wrt comment

* modify unroll_len/sequence_len key

* fix multi-discrete action space

Co-authored-by: zerlinwang <[email protected]>
Co-authored-by: norman <[email protected]>
Co-authored-by: nieyunpeng <[email protected]>

* Cleanup old resources

* Space

* Fix offline rl

Co-authored-by: Hankson Bradley <[email protected]>
Co-authored-by: 蒲源 <[email protected]>
Co-authored-by: LuciusMos <[email protected]>
Co-authored-by: zjowowen <[email protected]>
Co-authored-by: zhaoliangxuan <[email protected]>
Co-authored-by: Swain <[email protected]>
Co-authored-by: Davide Liu <[email protected]>
Co-authored-by: zms <[email protected]>
Co-authored-by: lixl-st <[email protected]>
Co-authored-by: Jia Ruonan <[email protected]>
Co-authored-by: Weiyuhong-1998 <[email protected]>
Co-authored-by: Will-Nie <[email protected]>
Co-authored-by: zerlinwang <[email protected]>
Co-authored-by: norman <[email protected]>
Co-authored-by: nieyunpeng <[email protected]>
16 people authored May 30, 2022
1 parent 6a7becd commit df02585
Showing 775 changed files with 9,970 additions and 13,178 deletions.
9 changes: 5 additions & 4 deletions .gitignore
@@ -1,10 +1,11 @@
*.eps
*.jpg
*.svg
*.puml.eps
*.puml.jpg
*.puml.svg
.DS_Store
build/
source/_build
_build/
.vscode/
venv/
.idea/
.idea/
src/pytorch-sphinx-theme/
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,8 +1,9 @@
Pillow==8.2.0
sphinx>=2.2.1,<=4.2
sphinx_rtd_theme~=0.4.3
sphinx_rtd_theme
enum_tools
sphinx-toolbox
plantumlcli>=0.0.2
sphinx-autobuild
git+http://github.com/opendilab/DI-engine@main
-e git+https://github.com/opendilab/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
37 changes: 37 additions & 0 deletions source/00_intro/index.rst
@@ -0,0 +1,37 @@
Introduction
===============================

What is DI-engine?
-------------------------------

DI-engine is a decision intelligence platform built by a group of enthusiastic researchers and engineers. \
It provides professional and convenient support for your reinforcement learning algorithm research and development, \
mainly including:

1. Comprehensive algorithm support, such as DQN, PPO, and SAC, as well as many algorithms for research subfields - \
QMIX for multi-agent reinforcement learning, GAIL for inverse reinforcement learning, RND for exploration problems, etc.

2. A user-friendly interface: we abstract the most common objects in reinforcement learning tasks, such as environments and policies, \
and encapsulate complex reinforcement learning processes into middleware, so you can build your own learning pipeline as you wish.

3. Flexible scalability: using the messaging components and event-programming interfaces integrated in the framework, \
you can scale your research work to industrial-grade large-scale training clusters, \
as demonstrated by the StarCraft II agent `DI-star <https://github.com/opendilab/DI-star>`_.

.. image:: ../images/system_layer.png

Key Concepts
-------------------------------

If you are not familiar with reinforcement learning, you can go to our `reinforcement learning tutorial <../10_concepts/index_zh.html>`_ \
for a glimpse into the wonderful world of reinforcement learning.

If you have already been exposed to reinforcement learning, you will be familiar with its basic interacting objects: \
**environments** and **agents (or the policies that constitute them)**.

Instead of creating more concepts, DI-engine abstracts the complex interaction logic between the two into declarative middleware, \
such as **collect**, **train**, **evaluate**, and **save_ckpt**. You can adapt each part of the process in the most natural way.
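
To make this concrete, here is a minimal sketch of how such middleware compose into a training loop \
(the names mirror the quickstart example later in this commit; ``cfg``, ``policy``, the environment managers \
and ``buffer_`` are assumed to have been created beforehand, so treat this as an illustration rather than a complete program):

.. code-block:: python

    from ding.framework import task
    from ding.framework.context import OnlineRLContext
    from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, CkptSaver

    # Each middleware is a declarative step; task.use chains them into one loop.
    with task.start(async_mode=False, ctx=OnlineRLContext()):
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))  # evaluate
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))       # collect
        task.use(data_pusher(cfg, buffer_))                                    # move collected data into the buffer
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))            # train
        task.use(CkptSaver(cfg, policy, train_freq=100))                       # save_ckpt
        task.run()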

Using DI-engine is very easy. In the `quickstart <../01_quickstart/index_zh.html>`_, \
we will show you how to quickly build a classic reinforcement learning pipeline with a simple example.
28 changes: 28 additions & 0 deletions source/00_intro/index_zh.rst
@@ -0,0 +1,28 @@
Introduction to DI-engine
===============================

What is DI-engine?
-------------------------------

DI-engine is a decision intelligence platform built by a group of enthusiastic researchers and engineers. It provides the most professional and convenient support for your reinforcement learning algorithm research and development, mainly including:

1. Comprehensive algorithm support, such as DQN, PPO, and SAC, as well as many algorithms for research subfields - QMIX for multi-agent reinforcement learning, GAIL for inverse reinforcement learning, RND for exploration problems, etc.

2. A user-friendly interface: we abstract the most common objects in reinforcement learning tasks, such as environments and policies, and encapsulate complex reinforcement learning processes into rich middleware, so you can build your own learning pipeline as you wish.

3. Flexible scalability: using the messaging components and event-programming interfaces integrated in the framework, you can scale your research work to industrial-grade large-scale training clusters, such as the StarCraft II agent `DI-star <https://github.com/opendilab/DI-star>`_.

.. image:: ../images/system_layer.png

Key Concepts
-------------------------------

If you are not yet familiar with reinforcement learning, you can go to our `reinforcement learning tutorial <../10_concepts/index_zh.html>`_ for a glimpse into the wonderful world of reinforcement learning.

If you have already been exposed to reinforcement learning, you will be familiar with its basic interacting objects: **environments** and **agents (or the policies that constitute them)**.

Instead of creating more concepts, DI-engine abstracts the complex interaction logic between the two into declarative middleware, such as **collect**, **train**, **evaluate**, and **save_ckpt**, so you can adjust each part of the process in the most natural way.

Using DI-engine is very easy. In the `quickstart <../01_quickstart/index_zh.html>`_ section, we will show you how to quickly build a classic reinforcement learning pipeline with a simple example.
109 changes: 109 additions & 0 deletions source/01_quickstart/first_rl_program.rst
@@ -0,0 +1,109 @@
First Reinforcement Learning Program
======================================

.. toctree::
    :maxdepth: 2

CartPole is an ideal environment for getting started with reinforcement learning, \
and the DQN algorithm allows CartPole to converge (maintain equilibrium) in a very short time. \
We will introduce the usage of DI-engine based on CartPole + DQN.

.. image:: images/cartpole_cmp.gif
    :width: 1000
    :align: center

Using the Configuration File
------------------------------

DI-engine uses a global configuration file to control all variables of the environment and policy. \
Each of them has a corresponding default configuration, which can be found in \
`cartpole_dqn_config <https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/config/cartpole_dqn_config.py>`_. \
In this tutorial we use the default configuration directly:

.. code-block:: python

    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
    from ding.config import compile_config

    cfg = compile_config(main_config, create_cfg=create_config, auto=True)

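If you want to tweak a default value, one option is to modify ``main_config`` before compiling it. \
A minimal sketch, assuming you only want to change the number of parallel collector environments \
(``env.collector_env_num`` is assumed to exist in the default config, as suggested by its use below):

.. code-block:: python

    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
    from ding.config import compile_config

    # Override a default field before compilation; compile_config merges defaults and user settings.
    main_config.env.collector_env_num = 8
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
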
Initialize the Environments
------------------------------

In reinforcement learning, the way environment data is collected may differ between training and evaluation. \
For example, training usually runs one training iteration after every n collected steps, \
while evaluation requires completing whole episodes to obtain a score. \
We therefore recommend initializing the collection and evaluation environments separately, as follows:

.. code-block:: python

    import gym

    from ding.envs import DingEnvWrapper, BaseEnvManagerV2

    collector_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.collector_env_num)],
        cfg=cfg.env.manager
    )
    evaluator_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.evaluator_env_num)],
        cfg=cfg.env.manager
    )

.. note::

    DingEnvWrapper is DI-engine's unified wrapper for different environment libraries. \
    BaseEnvManagerV2 is the unified external interface for managing multiple environments, \
    so you can use BaseEnvManagerV2 to collect data from multiple environments in parallel.

Select Policy
--------------

DI-engine covers most reinforcement learning policies; using them only requires selecting the right policy and model.
Since DQN is an off-policy algorithm, we also need to instantiate a buffer module.

.. code-block:: python

    from ding.model import DQN
    from ding.policy import DQNPolicy
    from ding.data import DequeBuffer

    model = DQN(**cfg.policy.model)
    buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
    policy = DQNPolicy(cfg.policy, model=model)

Build the Pipeline
---------------------

With the various middleware provided by DI-engine, we can easily build the entire pipeline:

.. code-block:: python

    from ding.framework import task
    from ding.framework.context import OnlineRLContext
    from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler, CkptSaver

    with task.start(async_mode=False, ctx=OnlineRLContext()):
        # Evaluation is placed first so that the score of the random model is recorded as a baseline
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
        task.use(eps_greedy_handler(cfg))  # Decay the explore-exploit probability
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))  # Collect environment data
        task.use(data_pusher(cfg, buffer_))  # Push data to the buffer
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))  # Train the model
        task.use(CkptSaver(cfg, policy, train_freq=100))  # Save the model
        # If evaluation shows the model has exceeded the convergence score, the run ends early here
        task.run()

Run the Code
--------------

The full example can be found in `DQN example <https://github.com/opendilab/DI-engine/blob/main/ding/example/dqn.py>`_ and can be run via ``python dqn.py``.
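
If you prefer to assemble the snippets above into a single script yourself rather than use the repository example, \
a minimal end-to-end sketch could look roughly as follows (the ``main`` wrapper is only an illustrative convention; \
every import and call is taken from the steps shown earlier on this page):

.. code-block:: python

    import gym

    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
    from ding.config import compile_config
    from ding.envs import DingEnvWrapper, BaseEnvManagerV2
    from ding.model import DQN
    from ding.policy import DQNPolicy
    from ding.data import DequeBuffer
    from ding.framework import task
    from ding.framework.context import OnlineRLContext
    from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler, CkptSaver


    def main():
        # Compile the default CartPole + DQN configuration.
        cfg = compile_config(main_config, create_cfg=create_config, auto=True)

        # Separate environment managers for collection and evaluation.
        collector_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.collector_env_num)],
            cfg=cfg.env.manager
        )
        evaluator_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.evaluator_env_num)],
            cfg=cfg.env.manager
        )

        # DQN is off-policy, so a replay buffer is required.
        model = DQN(**cfg.policy.model)
        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
        policy = DQNPolicy(cfg.policy, model=model)

        # Chain the middleware into one training loop.
        with task.start(async_mode=False, ctx=OnlineRLContext()):
            task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
            task.use(eps_greedy_handler(cfg))
            task.use(StepCollector(cfg, policy.collect_mode, collector_env))
            task.use(data_pusher(cfg, buffer_))
            task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
            task.use(CkptSaver(cfg, policy, train_freq=100))
            task.run()


    if __name__ == "__main__":
        main()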

.. image:: images/train_dqn.gif
    :width: 1000
    :align: center

You have now completed your first reinforcement learning task with DI-engine. You can try out more algorithms \
in the `Examples directory <https://github.com/opendilab/DI-engine/blob/main/ding/example>`_, or continue reading \
the documentation to get a deeper understanding of DI-engine's `Algorithms <../02_algo/index.html>`_, `System Design <../03_system/index.html>`_ \
and `Best Practices <../04_best_practice/index.html>`_.
100 changes: 100 additions & 0 deletions source/01_quickstart/first_rl_program_zh.rst
@@ -0,0 +1,100 @@
First Reinforcement Learning Program
======================================

.. toctree::
    :maxdepth: 2

CartPole is an ideal environment for getting started with reinforcement learning, and the DQN algorithm allows CartPole to converge (maintain equilibrium) in a very short time.
We will introduce the usage of DI-engine based on CartPole + DQN.

.. image:: images/cartpole_cmp.gif
    :width: 1000
    :align: center

Using the Configuration File
------------------------------

DI-engine uses a global configuration file to control all variables of the environment and policy. Each environment and policy has a corresponding default configuration; the full configuration can be found in
`cartpole_dqn_config <https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/config/cartpole_dqn_config.py>`_.
In this tutorial we use the default configuration directly:

.. code-block:: python

    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config
    from ding.config import compile_config

    cfg = compile_config(main_config, create_cfg=create_config, auto=True)

Initialize the Collection and Evaluation Environments
---------------------------------------------------------

In reinforcement learning, the way environment data is collected may differ between training and evaluation: for example, training usually runs one training iteration after every n collected steps,
while evaluation requires completing whole episodes to obtain a score. We recommend initializing the collection and evaluation environments separately:

.. code-block:: python

    import gym

    from ding.envs import DingEnvWrapper, BaseEnvManagerV2

    collector_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.collector_env_num)],
        cfg=cfg.env.manager
    )
    evaluator_env = BaseEnvManagerV2(
        env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0")) for _ in range(cfg.env.evaluator_env_num)],
        cfg=cfg.env.manager
    )

.. note::

    DingEnvWrapper is DI-engine's unified wrapper for different environment libraries. BaseEnvManagerV2 is the unified external interface for managing multiple environments,
    and it allows data to be collected from multiple environments in parallel.

Select Policy
--------------

DI-engine covers most reinforcement learning policies; using them only requires selecting the right policy and model.
Since DQN is an off-policy algorithm, we also need to instantiate a buffer module.

.. code-block:: python

    from ding.model import DQN
    from ding.policy import DQNPolicy
    from ding.data import DequeBuffer

    model = DQN(**cfg.policy.model)
    buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
    policy = DQNPolicy(cfg.policy, model=model)

Build the Training Pipeline
-----------------------------

With the various middleware provided by DI-engine, we can easily build the entire training pipeline:

.. code-block:: python

    from ding.framework import task
    from ding.framework.context import OnlineRLContext
    from ding.framework.middleware import OffPolicyLearner, StepCollector, interaction_evaluator, data_pusher, eps_greedy_handler, CkptSaver

    with task.start(async_mode=False, ctx=OnlineRLContext()):
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))  # Evaluation is placed first so that the score of the random model is recorded as a baseline
        task.use(eps_greedy_handler(cfg))  # Decay the explore-exploit probability
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))  # Collect environment data
        task.use(data_pusher(cfg, buffer_))  # Push data to the buffer
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))  # Train the model
        task.use(CkptSaver(cfg, policy, train_freq=100))  # Save the model
        task.run()  # If evaluation shows the model has exceeded the convergence score, the run ends early here

Run the Code
--------------

The full example can be found in `DQN example <https://github.com/opendilab/DI-engine/blob/main/ding/example/dqn.py>`_ and can be run via ``python dqn.py``.

.. image:: images/train_dqn.gif
    :width: 1000
    :align: center

You have now completed your first reinforcement learning task with DI-engine. You can try out more algorithms in the `Examples directory <https://github.com/opendilab/DI-engine/blob/main/ding/example>`_,
or continue reading the documentation to get a deeper understanding of DI-engine's `Algorithms <../02_algo/index_zh.html>`_, `System Design <../03_system/index_zh.html>`_ and `Best Practices <../04_best_practice/index_zh.html>`_.
File renamed without changes
Binary file added source/01_quickstart/images/train_dqn.gif
8 changes: 8 additions & 0 deletions source/01_quickstart/index.rst
@@ -0,0 +1,8 @@
Quickstart
============================

.. toctree::
    :maxdepth: 2

    installation
    first_rl_program
8 changes: 8 additions & 0 deletions source/01_quickstart/index_zh.rst
@@ -0,0 +1,8 @@
Quick Start
============================

.. toctree::
    :maxdepth: 2

    installation_zh
    first_rl_program_zh