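Python 3.7 has a problem installing TensorFlow (tensorflow/tensorflow#20444), so the commands below install an older Homebrew python formula first (Python 3.6, judging by the site-packages path used later).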
# See https://apple.stackexchange.com/questions/329187
$ brew install \
https://raw.githubusercontent.com/Homebrew/homebrew-core/\
f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb
$ cd "$WORKDIR"
$ python3 -m venv pybullet-env
$ source pybullet-env/bin/activate
$ pip install tensorflow
$ pip install gym
$ git clone https://github.com/openai/baselines.git
$ cd baselines
$ pip install -e .
$ cd ..
$ pip install pybullet
$ pip install ruamel-yaml
$ cd pybullet-env/lib/python3.6/site-packages/pybullet_envs/examples
$ python kukaGymEnvTest.py
$ python kukaCamGymEnvTest.py # much slower
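
The `cd` step above hard-codes `python3.6` in the path, so it breaks if the virtual environment was created with a different interpreter. As a minimal sketch, the directory can instead be derived from the active interpreter. The helper name `examples_dir` is hypothetical, and the sketch assumes the examples live under `pybullet_envs/examples` inside site-packages, as in the command above.

```python
import sysconfig
from pathlib import Path


def examples_dir(site_packages=None):
    """Return the expected location of the pybullet_envs examples.

    If site_packages is not given, use the pure-Python site-packages
    directory of the interpreter running this script (the venv's, when
    the venv is activated).
    """
    if site_packages is None:
        site_packages = sysconfig.get_paths()["purelib"]
    return Path(site_packages) / "pybullet_envs" / "examples"


if __name__ == "__main__":
    path = examples_dir()
    # The directory only exists once pybullet has been installed.
    print(path, "(exists)" if path.is_dir() else "(not found)")
```

Running this with the virtual environment activated prints the directory to `cd` into, whatever Python minor version the environment uses.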
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015, June). Trust Region Policy Optimization. In ICML, 2015
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Approximately optimal approximate reinforcement learning
- Reinforcement Learning: An Introduction
  - Chapter 13 Policy Gradient Methods
    - 13.2 The Policy Gradient Theorem
    - 13.3 REINFORCE: Monte Carlo Policy Gradient
    - 13.4 REINFORCE with Baseline
    - 13.5 Actor–Critic Methods
- Understanding RL: The Bellman Equations
- Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014, June). Deterministic policy gradient algorithms. In ICML, 2014.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
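- Introduction to OpenAI Gym (OpenAI Gym 入門)
- [Python] Easily trying reinforcement learning (DQN) with Keras-RL ([Python] Keras-RLで簡単に強化学習(DQN)を試す)
- Building an FX trading environment with OpenAI Gym (OpenAI GymでFXのトレーディング環境を構築する)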
- Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., & Vanhoucke, V. (2018). Sim-to-Real: Learning Agile Locomotion For Quadruped Robots. arXiv preprint arXiv:1804.10332.
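- Running with baselines failed because of a bug. The error was `TypeError: learn() missing 1 required positional argument: 'network'`.
- Running with TensorFlow Agents PPO worked, but only training was verified. A large number of warnings are printed, which I would like to suppress. A directory named `pendulum` is created. The configuration is set in `pybullet_envs/agents/configs.py`.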
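
One possible way to reduce the warning noise mentioned above is sketched here. It is not taken from these notes or from the training scripts. `TF_CPP_MIN_LOG_LEVEL` is TensorFlow's environment variable for filtering its native (C++) log output and must be set before `tensorflow` is imported. The helper name `quiet_tensorflow_logs` is hypothetical, and which warning categories actually appear during training is an assumption.

```python
import os
import warnings


def quiet_tensorflow_logs(level="2"):
    """Reduce log noise before importing tensorflow.

    level: "0" shows everything, "1" hides INFO, "2" also hides
    WARNING, "3" also hides ERROR (native TensorFlow logs only).
    """
    # Must be set before `import tensorflow` to take effect.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    # Assumption: the Python-side noise is mostly deprecation-style
    # warnings; adjust the categories to what is actually printed.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    warnings.filterwarnings("ignore", category=FutureWarning)


if __name__ == "__main__":
    quiet_tensorflow_logs()
    print(os.environ["TF_CPP_MIN_LOG_LEVEL"])
```

Messages emitted through TensorFlow's own Python logger are not affected by either setting. In TensorFlow 1.x those can be lowered separately with `tf.logging.set_verbosity(tf.logging.ERROR)`.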