by Zhongwei Yu
This repository implements Explainable Causal Reinforcement Learning with attention.
The requirements file for the Python environment is provided in requirement.txt.
To create the required conda environment, use
conda create --name <env> --file requirement.txt
Our experiments involve the StarCraft II Learning Environment. The StarCraft II program and maps can be downloaded from https://github.com/Blizzard/s2client-proto. Make sure you have installed StarCraft II and pysc2 correctly!
The main usage of this code is provided by run.py. It executes experiment commands:
python run.py <command> <arguments>
The supported commands are:
- model-based: train a policy using environment models.
- model-free: train a policy using a model-free algorithm (PPO).
- fitting: fit a model for a policy (deprecated).
- train-explain: train explanatory models for a policy.
- test-explain: present explanation examples.
Type this in your console to see the arguments of the commands:
python run.py <command> -h
For example, to train a policy for the Build-Marine environment using models:
python run.py model-based buildmarine --seed=1 --run-id=run-1
The results and log files are saved in the experiments\ directory.
We support 4 environments:
- lunarlander: the LunarLander environment with a discrete action space
- lunarlander with the argument --continuous: the LunarLander environment with a continuous action space
- buildmarine: the Build-Marine environment
- cartpole: the CartPole environment
Most hyper-parameters are managed by the Config object defined in learning\config.py. The default config is specified in alg\_env_setting.py.
We may also use a specified config file for each experiment by simply passing the argument --config=xxx.json. The config files used for our main experiments are in the configs directory.
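A config file is plain JSON whose fields correspond to the Config object in learning\config.py. The following is a purely illustrative sketch — the key names below are hypothetical, not the actual schema; check learning\config.py and the files in the configs directory for the real fields:

```json
{
    "seed": 1,
    "gamma": 0.99,
    "learning_rate": 0.0003,
    "batch_size": 256
}
```

Pass such a file with --config=my-config.json (the file name here is an example).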
- Execute the model-based or model-free command with run.py to learn a policy on a given environment. This will create an experiment directory in experiments\. Most likely, it will be experiments\<env_id>\<model-based|model-free>\run-xxx.
- If you used the model-based command:
  - Go to the experiment directory; you should find the saved actor.nn, env-model-x.nn, and causal-graph.json.
  - Rename any environment model to explain-env-model.nn and the causal graph to explain-causal-graph.json.
- Otherwise:
  - Go to the experiment directory; you should find the saved actor.nn.
  - Execute the following command using run.py to train a post-hoc model:
    train-explain <your experiment directory> [--n-sample] [--n-step]
    When completed, you shall find explain-env-model.nn and explain-causal-graph.json in the experiment directory.
- Execute test-explain <your experiment directory> using run.py to see examples of causal chains. This command starts an interaction cycle.