Merge branch 'NeuralNetworkOutput' into neglogp+entropy

hill-a · Apr 9, 2020 · 352224f
2 parents 1e3b7a9 + a93db61
commit 352224f
Showing 74 changed files with 2,823 additions and 1,262 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -24,6 +24,6 @@
 - [ ] My change requires a change to the documentation.
 - [ ] I have updated the tests accordingly (*required for a bug fix or a new feature*).
 - [ ] I have updated the documentation accordingly.
-- [ ] I have ensured `pytest` and `pytype` both pass.
+- [ ] I have ensured `pytest` and `pytype` both pass (by running  `make pytest` and `make type`).
 
 <!--- This Template is an edited version of the one from https://github.com/evilsocket/pwnagotchi/ -->
diff --git a/.travis.yml b/.travis.yml
@@ -4,7 +4,7 @@ python:
 
 env:
   global:
-    - DOCKER_IMAGE=stablebaselines/stable-baselines-cpu:v2.9.0
+    - DOCKER_IMAGE=stablebaselines/stable-baselines-cpu:v2.10.0
 
 notifications:
   email: false
@@ -42,7 +42,7 @@ jobs:
 
     - name: "Sphinx Documentation"
       script:
-        - 'docker run -it --rm --mount src=$(pwd),target=/root/code/stable-baselines,type=bind ${DOCKER_IMAGE} bash -c "cd /root/code/stable-baselines/ && pip install .[docs] && pushd docs/ && make clean && make html"'
+        - 'docker run -it --rm --mount src=$(pwd),target=/root/code/stable-baselines,type=bind ${DOCKER_IMAGE} bash -c "cd /root/code/stable-baselines/ && pushd docs/ && make clean && make html"'
 
     - name: "Type Checking"
       script:

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -57,17 +57,17 @@ from stable_baselines import PPO2
 
 In general, we recommend using pycharm to format everything in an efficient way.
 
-Please documentation each function/method using the following template:
+Please document each function/method and [type](https://google.github.io/pytype/user_guide.html) them using the following template:
 
 ```python
 
-def my_function(arg1, arg2):
+def my_function(arg1: type1, arg2: type2) -> returntype:
     """
     Short description of the function.
 
-    :param arg1: (arg1 type) describe what is arg1
-    :param arg2: (arg2 type) describe what is arg2
-    :return: (return type) describe what is returned
+    :param arg1: (type1) describe what is arg1
+    :param arg2: (type2) describe what is arg2
+    :return: (returntype) describe what is returned
     """
     ...
     return my_variable
@@ -77,7 +77,7 @@ def my_function(arg1, arg2):
 
 Before proposing a PR, please open an issue, where the feature will be discussed. This prevent from duplicated PR to be proposed and also ease the code review process.
 
-Each PR need to be reviewed and accepted by at least one of the maintainers (@hill-a , @araffin or @erniejunior ).
+Each PR need to be reviewed and accepted by at least one of the maintainers (@hill-a, @araffin, @erniejunior, @AdamGleave or @Miffyli).
 A PR must pass the Continuous Integration tests (travis + codacy) to be merged with the master branch.
 
 Note: in rare cases, we can create exception for codacy failure.
@@ -88,15 +88,34 @@ All new features must add tests in the `tests/` folder ensuring that everything
 We use [pytest](https://pytest.org/).
 Also, when a bug fix is proposed, tests should be added to avoid regression.
 
-To run tests with `pytest` and type checking with `pytype`:
+To run tests with `pytest`:
 
 ```
-./scripts/run_tests.sh
+make pytest
 ```
 
+Type checking with `pytype`:
+
+```
+make type
+```
+
+Build the documentation:
+
+```
+make doc
+```
+
+Check documentation spelling (you need to install `sphinxcontrib.spelling` package for that):
+
+```
+make spelling
+```
+
+
 ## Changelog and Documentation
 
-Please do not forget to update the changelog and add documentation if needed.
+Please do not forget to update the changelog (`docs/misc/changelog.rst`) and add documentation if needed.
 A README is present in the `docs/` folder for instructions on how to build the documentation.
 
 

diff --git a/Dockerfile b/Dockerfile
@@ -27,12 +27,13 @@ ENV VENV /root/venv
 
 COPY ./setup.py ${CODE_DIR}/stable-baselines/setup.py
 RUN \
+    pip install pip --upgrade && \
     pip install virtualenv && \
     virtualenv $VENV --python=python3 && \
     . $VENV/bin/activate && \
     pip install --upgrade pip && \
     cd ${CODE_DIR}/stable-baselines && \
-    pip install -e .[mpi,tests] && \
+    pip install -e .[mpi,tests,docs] && \
     rm -rf $HOME/.cache/pip
 
 ENV PATH=$VENV/bin:$PATH

diff --git a/Makefile b/Makefile
@@ -0,0 +1,41 @@
+# Run pytest and coverage report
+pytest:
+	./scripts/run_tests.sh
+
+# Type check
+type:
+	pytype
+
+# Build the doc
+doc:
+	cd docs && make html
+
+# Check the spelling in the doc
+spelling:
+	cd docs && make spelling
+
+# Clean the doc build folder
+clean:
+	cd docs && make clean
+
+# Build docker images
+# If you do export RELEASE=True, it will also push them
+docker: docker-cpu docker-gpu
+
+docker-cpu:
+	./scripts/build_docker.sh
+
+docker-gpu:
+	USE_GPU=True ./scripts/build_docker.sh
+
+# PyPi package release
+release:
+	python setup.py sdist
+	python setup.py bdist_wheel
+	twine upload dist/*
+
+# Test PyPi package release
+test-release:
+	python setup.py sdist
+	python setup.py bdist_wheel
+	twine upload --repository-url https://test.pypi.org/legacy/ dist/*
diff --git a/README.md b/README.md
@@ -144,11 +144,14 @@ Please read the [documentation](https://stable-baselines.readthedocs.io/) for mo
 
 All the following examples can be executed online using Google colab notebooks:
 
-- [Getting Started](https://colab.research.google.com/drive/1_1H5bjWKYBVKbbs-Kj83dsfuZieDNcFU)
-- [Training, Saving, Loading](https://colab.research.google.com/drive/1KoAQ1C_BNtGV3sVvZCnNZaER9rstmy0s)
-- [Multiprocessing](https://colab.research.google.com/drive/1ZzNFMUUi923foaVsYb4YjPy4mjKtnOxb)
-- [Monitor Training and Plotting](https://colab.research.google.com/drive/1L_IMo6v0a0ALK8nefZm6PqPSy0vZIWBT)
-- [Atari Games](https://colab.research.google.com/drive/1iYK11yDzOOqnrXi1Sfjm1iekZr4cxLaN)
+- [Full Tutorial](https://github.com/araffin/rl-tutorial-jnrr19)
+- [All Notebooks](https://github.com/Stable-Baselines-Team/rl-colab-notebooks)
+- [Getting Started](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/stable_baselines_getting_started.ipynb)
+- [Training, Saving, Loading](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/saving_loading_dqn.ipynb)
+- [Multiprocessing](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/multiprocessing_rl.ipynb)
+- [Monitor Training and Plotting](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/monitor_training.ipynb)
+- [Atari Games](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/atari_games.ipynb)
+- [RL Baselines Zoo](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/rl-baselines-zoo.ipynb)
 
 
 ## Implemented Algorithms
@@ -190,7 +193,7 @@ Some of the baselines examples use [MuJoCo](http://www.mujoco.org) (multi-joint
 All unit tests in baselines can be run using pytest runner:
 ```
 pip install pytest pytest-cov
-pytest --cov-config .coveragerc --cov-report html --cov-report term --cov=.
+make pytest
 ```
 
 ## Projects Using Stable-Baselines

diff --git a/docs/common/distributions.rst b/docs/common/distributions.rst
@@ -10,7 +10,7 @@ Probability distributions used for the different action spaces:
 - ``MultiCategoricalProbabilityDistribution`` -> MultiDiscrete
 - ``BernoulliProbabilityDistribution`` -> MultiBinary
 
-The policy networks output parameters for the distributions (named `flat` in the methods).
+The policy networks output parameters for the distributions (named ``flat`` in the methods).
 Actions are then sampled from those distributions.
 
 For instance, in the case of discrete actions. The policy network outputs probability

diff --git a/docs/guide/algos.rst b/docs/guide/algos.rst
@@ -51,7 +51,7 @@ Actions ``gym.spaces``:
 
 .. note::
 
-  Some logging values (like `ep_rewmean`, `eplenmean`) are only available when using a Monitor wrapper
+  Some logging values (like ``ep_rewmean``, ``eplenmean``) are only available when using a Monitor wrapper
   See `Issue #339 <https://github.com/hill-a/stable-baselines/issues/339>`_ for more info.
 
 
@@ -62,7 +62,7 @@ Completely reproducible results are not guaranteed across Tensorflow releases or
 Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.
 
 In order to make make computations deterministic on CPU, on your specific problem on one specific platform,
-you need to pass a `seed` argument at the creation of a model and set `n_cpu_tf_sess=1` (number of cpu for Tensorflow session).
+you need to pass a ``seed`` argument at the creation of a model and set `n_cpu_tf_sess=1` (number of cpu for Tensorflow session).
 If you pass an environment to the model using `set_env()`, then you also need to seed the environment first.
 
 .. note::