This work was done as part of the Reinforcement Learning (RL) lecture at the University of Tübingen in the summer semester of 2023. The goal was to develop three RL agents for a 2D two-player hockey game that can beat two PD-controlled basic opponent players.
We implemented Decoupled Q-Networks (DecQN) [1], MuZero [2], and Double Dueling DQN (DDDQN) [3, 4]. In this repository, we provide both the source code and the trained network parameters.
There was also a final tournament between the RL agents developed by all participants of the lecture. In this tournament, our MuZero agent took first place among 89 participants, the DecQN agent ranked 10th, and the DDDQN agent finished in the middle of the field.
Since this repository contains submodules, clone and pull with the additional flag --recurse-submodules.
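For example, a fresh clone would look like this (with <repository-url> as a placeholder for this repository's URL):

git clone --recurse-submodules <repository-url>

If the repository was already cloned without its submodules, they can be fetched afterwards via

git submodule update --init --recursive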
To run the scripts in this repository, Python 3.10 is needed.
Then create a virtual environment and install the required packages via
pip install -r requirements.txt
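If no virtual environment exists yet, a typical setup in a Unix-like shell (assuming python3.10 is available on the PATH) might look like:

python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt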
This repository is structured as follows:
src/
├── evaluation/
│   ├── agents/
│   └── main.py
├── fabrice/
├── jens/
└── christoph/
src/evaluation/agents: Trained network parameters, agent interface.
src/evaluation/main.py: Evaluation script (cf. Usage).
src/fabrice: DecQN source code.
src/jens: MuZero source code.
src/christoph: DDDQN source code.
The Python script src/evaluation/main.py is used to evaluate RL agents against each other or against the two basic opponents Weak and Strong. It implements a command line interface that allows quick configuration of evaluations. Important: it must be run from inside the directory src/evaluation/.
In the following, we present the most important arguments:
- --player-1: Selects the left (protagonist) player ('MuZero', 'DecQN', 'DDDQN', 'Strong', 'Weak').
- --player-2: Selects the right (opponent) player (same choices as --player-1).
- --num-episodes: Sets the number of games to play.
- Seed argument (see --help for the exact flag): Fixes the random number generator seed to produce deterministic results.
- --disable-rendering: Disables graphical rendering.
For example, to evaluate MuZero against DecQN for 10 games without graphical rendering, run the command
python main.py --player-1 MuZero --player-2 DecQN --num-episodes 10 --disable-rendering
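Similarly, to evaluate the DecQN agent against the basic Weak opponent for 100 games without rendering:

python main.py --player-1 DecQN --player-2 Weak --num-episodes 100 --disable-rendering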
For more details, invoke the script with the flag -h/--help.
[1] Tim Seyde et al. "Solving Continuous Control via Q-learning". In: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. URL: https://openreview.net/pdf?id=U5XOGxAgccS.
[2] Julian Schrittwieser et al. "Mastering Atari, Go, chess and shogi by planning with a learned model". In: Nature 588.7839 (Dec. 2020), pp. 604–609. URL: https://doi.org/10.1038/s41586-020-03051-4.
[3] Ziyu Wang et al. "Dueling Network Architectures for Deep Reinforcement Learning". In: Proceedings of The 33rd International Conference on Machine Learning. New York, USA: PMLR, June 2016, pp. 1995–2003. URL: https://proceedings.mlr.press/v48/wangf16.html.
[4] Hado van Hasselt, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning". In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI'16. AAAI Press, 2016, pp. 2094–2100. URL: https://arxiv.org/abs/1509.06461.