20181125-pybullet

Install Python 3.6 with Homebrew

Python 3.7 has a problem when installing TensorFlow (tensorflow/tensorflow#20444), so Python 3.6 is used instead.

# See https://apple.stackexchange.com/questions/329187
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb

venv

https://qiita.com/fiftystorm36/items/b2fd47cf32c7694adc2e

$ cd $WORKDIR
$ python3 -m venv pybullet-env
$ source pybullet-env/bin/activate
$ pip install tensorflow
$ pip install gym
$ git clone https://github.com/openai/baselines.git
$ cd baselines
$ pip install -e .
$ cd ..
$ pip install pybullet
$ pip install ruamel-yaml
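
As a quick sanity check after the installs above, a sketch like the following reports the interpreter version and which of the expected packages can be found. The helper name `missing_packages` and the list of module names are assumptions for illustration; `ruamel.yaml` is the import name of the `ruamel-yaml` package.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A dotted name raises if its parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    # Assumed import names for the packages installed above.
    expected = ["tensorflow", "gym", "baselines", "pybullet", "ruamel.yaml"]
    print("Missing:", missing_packages(expected) or "none")
```

Run it with the virtual environment activated; `find_spec` only locates the packages, so TensorFlow is not actually imported.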

Check pybullet

$ cd pybullet-env/lib/python3.6/site-packages/pybullet_envs/examples
$ python kukaGymEnvTest.py
$ python kukaCamGymEnvTest.py # much slower
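
The path above hard-codes `python3.6`. A minimal sketch for locating the examples directory from whichever interpreter is active is shown below; the helper name `package_subdir` is hypothetical, and it assumes `pybullet_envs` is installed as a regular package directory.

```python
import importlib.util
from pathlib import Path


def package_subdir(package, subdir):
    """Return <package directory>/<subdir>, or None if the package is not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    # The first search location is the package's own directory.
    return Path(list(spec.submodule_search_locations)[0]) / subdir


if __name__ == "__main__":
    path = package_subdir("pybullet_envs", "examples")
    print(path if path is not None else "pybullet_envs is not installed")
```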

References

High-Dimensional Continuous Control Using Generalized Advantage Estimation
Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015, June). Trust Region Policy Optimization. In ICML, 2015
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Approximately optimal approximate reinforcement learning
Reinforcement Learning: An Introduction
- Chapter 13 Policy Gradient Methods
- 13.2 The Policy Gradient Theorem
- 13.3 REINFORCE: Monte Carlo Policy Gradient
- 13.4 REINFORCE with Baseline
- 13.5 Actor–Critic Methods
Understanding RL: The Bellman Equations
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014, June). Deterministic policy gradient algorithms. In ICML, 2014.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
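Introduction to OpenAI Gym (OpenAI Gym 入門)
[Python] Trying out reinforcement learning (DQN) easily with Keras-RL ([Python] Keras-RLで簡単に強化学習(DQN)を試す)
Building an FX trading environment with OpenAI Gym (OpenAI GymでFXのトレーディング環境を構築する)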
Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., & Vanhoucke, V. (2018). Sim-to-Real: Learning Agile Locomotion For Quadruped Robots. arXiv preprint arXiv:1804.10332.

Progress

Running with baselines failed because of a bug. The error was: TypeError: learn() missing 1 required positional argument: 'network'
Running TensorFlow Agents PPO worked, but only for training. It prints a large number of warnings, which I would like to suppress. A directory named pendulum is created. The configuration is set in pybullet_envs/agents/configs.py.
