RELEX - Reinforcement Learning Experiments
The RELEX project was created with three ideas in mind: to teach myself reinforcement learning by implementing algorithms from scratch; to teach others who struggle with some of the detailed aspects of reinforcement learning algorithms; and to create a space where I can experiment with new ideas and environments for research purposes.
Therefore, this is not a production-ready or deployment-ready library. But if you are looking for some (hopefully) easy-to-understand from-scratch implementations of RL algorithms, or for inspiration for a study, research project, or paper - this is probably the right place :)
I wanted to keep the implementations in this library as simple as possible, even if this means the algorithms run slowly (because of the lack of parallelism). I wanted something that is easy to debug, break, and play with, rather than a lightning-fast but hard-to-follow tool. For example, PPO (and actor-critic methods in general) is, in theory, embarrassingly parallel - multiple independent agents gathering trajectories using copies of the environment - but the RELEX version is single-threaded, so you can easily follow what's going on.
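As a rough illustration of what single-threaded trajectory gathering looks like, here is a minimal sketch assuming a Gymnasium environment and a placeholder random policy; the function and variable names are illustrative only and are not RELEX's actual API:

```python
import gymnasium as gym

def collect_trajectory(env, policy, max_steps=1000):
    """Roll out a single episode and return its (state, action, reward) tuples."""
    trajectory = []
    state, _ = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if terminated or truncated:
            break
    return trajectory

# Four copies of the environment, stepped one after another (single-threaded),
# so every single interaction can be followed in a debugger.
envs = [gym.make("CartPole-v1") for _ in range(4)]

def random_policy(state):
    """Placeholder policy: ignores the state and samples a random action."""
    return envs[0].action_space.sample()

rollouts = [collect_trajectory(env, random_policy) for env in envs]
print([len(r) for r in rollouts])  # episode length of each rollout
```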
Algorithm references:
- Value-based algorithms:
- Policy gradient:
- PPO
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Schulman, J., Moritz, P., Levine, S., Jordan, M., & Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438.
- AC
- Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., ... & Kavukcuoglu, K. (2016, June). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928-1937). PMLR.
- Konda, V., & Tsitsiklis, J. (1999). Actor-critic algorithms. Advances in neural information processing systems, 12.
- VPG
- Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 12.
- Schulman, J. (2016). Optimizing expectations: From deep reinforcement learning to stochastic computation graphs (Doctoral dissertation, University of California, Berkeley).
- Hybrid algorithms:
- DDPG: Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
- TD3: Fujimoto, S., van Hoof, H., & Meger, D. (2018, July). Addressing function approximation error in actor-critic methods. In International conference on machine learning (pp. 1587-1596). PMLR.
- SAC:
- Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018, July). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861-1870). PMLR.
- Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., ... & Levine, S. (2018). Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905.
- General resources:
- Achiam, J. (2018). Spinning Up in Deep RL. OpenAI.
- Morales, M. (2020). Grokking deep reinforcement learning. Manning Publications.
- Winder, P. (2021). Reinforcement learning: Industrial applications of intelligent agents. O'Reilly Media.
Implemented and planned algorithms:
Policy gradient algorithms:
- PPO
- AC
- VPG
Hybrid algorithms:
- SAC
- DDPG
- TD3
Value-based algorithms:
- DQN (vanilla)
- DDQN
- Dueling DQN
TODO - these are the top priority; after DQN/DDQN/Dueling DQN, other value-based algorithms will follow.
This project is structured according to the Cookiecutter Data Science template. Below you can find a description of the main directories from an RL/ML perspective:
- experiments/ - contains self-contained scripts tracked with MLflow, each implementing a full experimentation pipeline.
- Experiments are divided into:
- policy gradient - generic experiments to check/debug/learn how the algorithms work;
- stock_trading - experiments connected with stock markets;
- various - uncategorized experiments utilizing, e.g., external libraries like Stable-Baselines3.
- In each experiment the following operations are performed:
- Check the algorithm's performance before training.
- Evaluate "benchmark agents" (e.g., always predict the mean, random choice).
- Train the main agent.
- Evaluate the agent after training.
- Make a statistical comparison - a nonparametric Kruskal-Wallis test followed by post-hoc tests between pairs of agents (benchmarks vs. the main agent); an illustrative sketch of this step is shown at the end of this section.
- notebooks/ - personally, I'm against using notebooks to store finalized code. However, sometimes - for educational purposes or when preparing a scientific paper - the "notebook" form (quotation marks intentional: a notebook in the strict sense) is a natural choice. Most notebooks in this folder are part of research papers (in writing, under review, or already published) or are "scratchpads" that will later be rewritten as self-contained scripts.
- src/ - divided into:
- algorithms - contains implementations of various RL algorithms, divided into categories (policy gradient and others).
- models - contains basic, shared neural network architectures used by various algorithms, e.g. the policy/value networks used by all policy gradient algorithms.
- experiments - contains various utilities for experiments that help automate the experimentation process.
- envs - various experimental implementations of environments. For example: "project allocation environment" used in one of the papers, or "stock trading" envs.
- If you are interested in a particular algorithm implementation - go to src/algorithms.
- If you are interested in shared/common neural networks (like policy or value nets) - go to src/models (an illustrative sketch of such networks is shown at the end of this section).
- If you want to look at experiments:
- experiments/ - contains self-contained scripts for each algorithm or problem.
- notebooks/ - also includes experiments, sometimes half-done, broken, or still in progress, in the form of notebooks.
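To make the experiment pipeline described above more concrete, here is a hedged sketch of the final statistical-comparison step under MLflow tracking. The agent names, the synthetic return arrays, and the choice of Mann-Whitney U as the post-hoc test are illustrative assumptions (the pipeline above does not fix a specific post-hoc test); only the SciPy and MLflow calls are standard.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu
import mlflow

# Placeholder per-episode returns for two benchmark agents and the trained
# agent. In a real experiment these arrays come from rolling each agent out
# in the environment before/after training.
rng = np.random.default_rng(42)
returns = {
    "random_benchmark": rng.normal(20.0, 5.0, size=30),
    "mean_benchmark": rng.normal(22.0, 5.0, size=30),
    "trained_agent": rng.normal(180.0, 25.0, size=30),
}

with mlflow.start_run(run_name="statistical_comparison"):
    # Omnibus nonparametric test across all agents (Kruskal-Wallis H-test).
    h_stat, p_value = kruskal(*returns.values())
    mlflow.log_metric("kruskal_h", h_stat)
    mlflow.log_metric("kruskal_p", p_value)
    print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_value:.4f}")

    # Post-hoc pairwise comparisons (benchmarks vs. the main agent).
    for a, b in combinations(returns, 2):
        _, p_pair = mannwhitneyu(returns[a], returns[b])
        print(f"{a} vs {b}: p={p_pair:.4f}")
```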
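Similarly, here is a rough sketch of the kind of shared policy/value architectures that src/models refers to, assuming a PyTorch-style implementation and a discrete action space; the class names, layer sizes, and activations are illustrative guesses, not the library's actual code.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Illustrative discrete-action policy: state -> distribution over actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

class ValueNetwork(nn.Module):
    """Illustrative state-value estimator: state -> V(s)."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

# Usage: sample an action and estimate the value of a 4-dimensional state.
policy, value = PolicyNetwork(obs_dim=4, n_actions=2), ValueNetwork(obs_dim=4)
obs = torch.zeros(4)
action = policy(obs).sample()
print(action.item(), value(obs).item())
```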
Project based on the cookiecutter data science project template. #cookiecutterdatascience
Below you can find publications utilizing the RELEX library. The list will be updated gradually as I develop the library and write new papers (which can take a very long time - sometimes a year before a review is complete).
I would be glad if you cited these papers in your own work whenever you use RELEX.
| Paper | Status |
|---|---|
| Wójcik, F. (2022). Utilization of deep reinforcement learning for discrete resource allocation problem in project management - a simulation experiment. Informatyka Ekonomiczna (Business Informatics). | In review (as of July 2022) |