Codebase for a master's thesis project in computer science, specializing in artificial intelligence, at the Norwegian University of Science and Technology (NTNU).
This project aims to discover how static and dynamic concepts are acquired in the policy of an agent trained with deep reinforcement learning. The project is based on the AlphaGo Zero algorithm by Google DeepMind, and the agent is trained to play the game of Go. Concept Activation Vectors (CAVs) are used to detect static and dynamic concepts in the agent's policy, and a Monte Carlo Tree Search (MCTS) based algorithm generates datasets for dynamic concepts without supervision. A joint embedding model learns the relationship between state-action pairs and conceptual explanations; together with the concept functions, it is used to improve the agent's reward function. Finally, a concept bottleneck model is trained to learn concepts in the agent's policy.
The codebase contains the following components; illustrative sketches of several of them follow the list:
- the deep reinforcement learning training loop, similar to the one outlined in the AlphaGo Zero paper by Google DeepMind
- concept detection using CAVs to find static and dynamic concepts in the agent's policy
- concept functions for static concepts
- an MCTS-based algorithm that generates datasets for dynamic concepts without supervision
- a joint embedding model that learns the relationship between state-action pairs and conceptual explanations
- reward shaping that uses the joint embedding model and the concept functions to improve the agent's reward function
- training of a concept bottleneck model to learn concepts in the agent's policy
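As an illustration of the CAV-based concept detection, here is a minimal sketch: a linear probe is fit on activations from one layer of the network, and the normal of its decision boundary serves as the concept activation vector. The activation shapes and the use of scikit-learn are assumptions for the example, not the repository's API.

```python
# Minimal CAV sketch: fit a linear probe on network activations and take the
# normal of its decision boundary as the concept activation vector (CAV).
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """concept_acts/random_acts: (n, d) activations from one network layer."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_[0]
    return cav / np.linalg.norm(cav), probe.score(X, y)  # unit CAV + probe accuracy

# Usage with synthetic activations; if probe accuracy stays near chance,
# the layer has not acquired the concept.
rng = np.random.default_rng(0)
cav, acc = compute_cav(rng.normal(1.0, 1.0, (64, 128)), rng.normal(0.0, 1.0, (64, 128)))
```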
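The static concept functions map a board state to a binary concept label. A toy example in that spirit, assuming a NumPy board encoding (0 = empty, 1 = current player, -1 = opponent) that need not match the repository's representation:

```python
import numpy as np

def concept_opponent_in_atari(board: np.ndarray) -> bool:
    """True if any opponent group on the board has exactly one liberty."""
    n = board.shape[0]
    seen = set()
    for i in range(n):
        for j in range(n):
            if board[i, j] == -1 and (i, j) not in seen:
                group, liberties, stack = set(), set(), [(i, j)]
                while stack:  # flood-fill one opponent group, collecting liberties
                    x, y = stack.pop()
                    if (x, y) in group:
                        continue
                    group.add((x, y))
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < n and 0 <= ny < n:
                            if board[nx, ny] == 0:
                                liberties.add((nx, ny))
                            elif board[nx, ny] == -1:
                                stack.append((nx, ny))
                seen |= group
                if len(liberties) == 1:
                    return True
    return False

# A lone opponent stone with three of its four liberties filled is in atari.
board = np.zeros((5, 5), dtype=int)
board[2, 2] = -1
board[1, 2] = board[3, 2] = board[2, 1] = 1
assert concept_opponent_in_atari(board)
```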
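One plausible reading of the MCTS-based dataset generation, sketched under stated assumptions: starting states are rolled forward along MCTS-preferred play, and each state-action pair is labelled by whether the concept appears within a fixed horizon. `mcts_best_move` and `apply_move` are hypothetical helpers, not the repository's functions.

```python
def generate_dynamic_concept_dataset(states, concept_fn, mcts_best_move,
                                     apply_move, horizon: int = 5):
    """Label each (state, action) pair by whether `concept_fn` becomes true
    within `horizon` moves of MCTS-preferred play from that state."""
    dataset = []
    for state in states:
        action = mcts_best_move(state)        # action the search would take
        s, label = apply_move(state, action), False
        for _ in range(horizon):
            if concept_fn(s):                 # dynamic concept realised
                label = True
                break
            s = apply_move(s, mcts_best_move(s))
        dataset.append((state, action, label))
    return dataset
```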
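A minimal two-tower sketch of the joint embedding idea, assuming state-action pairs and explanations are already encoded as fixed-size feature vectors; matching pairs are pulled together with a standard contrastive (InfoNCE-style) loss. Layer sizes are chosen for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Two towers projecting state-action features and explanation features
    into a shared embedding space."""
    def __init__(self, state_dim: int, expl_dim: int, embed_dim: int = 64):
        super().__init__()
        self.state_tower = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.expl_tower = nn.Sequential(
            nn.Linear(expl_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, states, expls):
        s = F.normalize(self.state_tower(states), dim=-1)
        e = F.normalize(self.expl_tower(expls), dim=-1)
        return s, e

def contrastive_loss(s, e, temperature: float = 0.1):
    logits = s @ e.T / temperature     # pairwise cosine similarities
    targets = torch.arange(len(s))     # matching pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with random features standing in for real encodings:
model = JointEmbedding(state_dim=32, expl_dim=16)
s, e = model(torch.randn(8, 32), torch.randn(8, 16))
loss = contrastive_loss(s, e)
```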
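Reward shaping with the concept signals could look like the following sketch, which adds a weighted bonus when desirable concepts hold in the successor state. Only the concept-function term is shown; a similarity score from the joint embedding model could enter as a further term. The weights are hypothetical.

```python
def shaped_reward(env_reward, next_board, concept_fns, weights):
    """Environment reward plus a weighted bonus for each concept that holds
    in the successor state. concept_fns return booleans, as in the static
    concept function sketched above."""
    bonus = sum(w * float(fn(next_board)) for fn, w in zip(concept_fns, weights))
    return env_reward + bonus

# e.g. shaped_reward(0.0, board, [concept_opponent_in_atari], [0.1])
```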
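Finally, a concept bottleneck model forces the policy to be predicted from concept activations alone. A minimal PyTorch sketch, again with illustrative layer sizes:

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Predicts interpretable concepts first, then the policy from concepts only."""
    def __init__(self, in_dim: int, n_concepts: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts))
        self.policy_head = nn.Linear(n_concepts, n_actions)

    def forward(self, x):
        concepts = torch.sigmoid(self.encoder(x))    # bottleneck: concept activations
        return concepts, self.policy_head(concepts)  # policy logits from concepts only

# Training would combine a concept loss (BCE against concept labels) with the
# usual policy loss, e.g. loss = bce(concepts, c_true) + ce(logits, a_true).
```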
- Install dependencies: `python -m pip install -r requirements.txt`
- Set training parameters in the config: `nano config.py`
- Train the agent in a single process: `python train_single_thread.py`
- Train with 4 MPI processes (for HPC runs): `mpirun -np 4 python train_hpc.py`
- Play against a trained agent: `python play.py`
- Run a tournament between trained agents: `python tournament.py`
- Run a single test file (substitute the actual test name): `python test_name.py`
- Submit a training job to a Slurm cluster: `sbatch hpc.sh`
- Monitor training metrics: `tensorboard --logdir tensorboard_logs/`
The results from the experiments are located in the `notebooks` folder. The notebooks are named after the experiments they represent.