Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning (UNICORN)
GPU: NVIDIA A100-SXM4-80GB
CPU: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
NVIDIA-SMI: 515.43.04
CUDA Version: 11.7
Install MuJoCo following the [OpenAI guideline](https://github.com/openai/mujoco-py). To reproduce our setup, create the directory **.mujoco** in your home directory (~/) with the following file tree:
--.mujoco/
    --mjkey.txt
    --mjpro131/
    --mujoco210/
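The layout above can be checked before installing anything else. The following is a minimal sketch, assuming the three entries in the tree are all that needs to be present; the helper name `missing_mujoco_entries` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Entries taken from the file tree above.
EXPECTED_ENTRIES = ("mjkey.txt", "mjpro131", "mujoco210")


def missing_mujoco_entries(root: Path) -> list[str]:
    """Return the expected entries that are absent under `root`."""
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    root = Path.home() / ".mujoco"
    missing = missing_mujoco_entries(root)
    if missing:
        print(f"Missing under {root}: {', '.join(missing)}")
    else:
        print(f"{root} contains all expected entries.")
```

The check only tests that the entries exist; it does not validate the license key or the contents of the MuJoCo directories.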
Then run the following commands to install UNICORN:
conda create --name unicorn python=3.10.4
conda activate unicorn
pip install setuptools==59.5.0
pip install wheel==0.37.1
pip install cython==0.29.32
pip install patchelf
pip install pyOpenGL -i https://pypi.douban.com/simple
pip install -r requirements.txt
pip install -U 'mujoco-py<2.2,>=2.1'
pip install gin-config
pip install scikit-learn
pip install seaborn==0.11.2
pip install tensorboardX==2.6.2
**NOTE:** To reproduce the experimental results, strictly follow the listed versions of torch, Python, and CUDA. The CUDA version is 11.7 and the Python version is 3.10.4, both listed above. The torch version is 2.0.1, as pinned in requirements.txt.
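One way to confirm that an environment matches these versions is a small comparison script. This is a sketch rather than part of the codebase: the `version_mismatches` helper is hypothetical, and it assumes that `torch.version.cuda` (the CUDA version torch was built against, not the driver's) is the right value to compare with 11.7.

```python
import platform

# Versions stated in the note above.
EXPECTED = {"python": "3.10.4", "torch": "2.0.1", "cuda": "11.7"}


def version_mismatches(found: dict, expected: dict = EXPECTED) -> dict:
    """Map each component whose version differs to (found, expected).

    Any local build tag after "+" is ignored, so "2.0.1+cu117"
    still counts as torch 2.0.1.
    """
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if str(found.get(name)).split("+")[0] != want
    }


if __name__ == "__main__":
    found = {"python": platform.python_version()}
    try:
        import torch

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass  # torch is not installed; it is reported as a mismatch below

    for name, (got, want) in version_mismatches(found).items():
        print(f"{name}: found {got}, expected {want}")
```

An empty output means all three versions match the ones listed.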
Because collecting the datasets takes a long time, we provide two datasets as examples for validation. Download the archives named [env_name].tar.bz2 [from the anonymous site here](https://drive.google.com/drive/folders/1pCoot1fWSWqBlE64pAJcJkQTmoT4tqT7?usp=sharing) and extract them into a directory named **batch_data**:
--batch_data/
    --AntDir-v0/
        --data/
            --seed_1_goal_2.62/
                --obs.npy
                --actions.npy
                --next_obs.npy
                --rewards.npy
                --terminals.npy
            --seed_2_goal_2.739/
            ...
            --seed_40_goal_2.562/
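After extraction, a single task directory can be loaded and sanity-checked as sketched below. The function `load_task_dir` is hypothetical and not part of this repository, and the consistency check assumes that each of the five arrays stores one row per transition, which this README does not state.

```python
from pathlib import Path

import numpy as np

# File stems taken from the file tree above.
FIELDS = ("obs", "actions", "next_obs", "rewards", "terminals")


def load_task_dir(task_dir):
    """Load the five .npy arrays of one task directory into a dict."""
    task_dir = Path(task_dir)
    data = {name: np.load(task_dir / f"{name}.npy") for name in FIELDS}

    # Assumption: every array has one row per transition, so the
    # leading dimensions should all agree.
    lengths = {name: len(arr) for name, arr in data.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Inconsistent transition counts: {lengths}")
    return data
```

For example, `load_task_dir("batch_data/AntDir-v0/data/seed_1_goal_2.62")` would return a dict keyed by `obs`, `actions`, `next_obs`, `rewards`, and `terminals`.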
To test behaviour-OOD performance, download the archives named [env_name]_model.tar.bz2 [from the anonymous site here](https://drive.google.com/drive/folders/1pCoot1fWSWqBlE64pAJcJkQTmoT4tqT7?usp=sharing) and extract them into a directory named **batch_data_copy**:
--batch_data_copy/
    --AntDir-v0/
        --data/
            --seed_1_goal_2.62/
                --models/
                    --agentxx.pt
            --seed_2_goal_2.739/
            ...
            --seed_40_goal_2.562/
Run the code with the following commands, setting --env-type to either ant_dir or hopper_param:
conda activate unicorn
python train_offline_FOCAL.py --env-type ant_dir
# or: python train_offline_FOCAL.py --env-type hopper_param
If you see an error saying that module numpy has no attribute 'int' or that module numpy has no attribute 'bool', replace np.int with int and np.bool with bool in the code. NumPy deprecated these aliases in version 1.20 and removed them in version 1.24, and some of our dependencies still target an older NumPy.
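When the failing code lives in a third-party library that is inconvenient to edit, a common workaround is to restore the removed aliases before that library is imported. The sketch below is an assumption about how one might do this, not something shipped with this codebase; `restore_numpy_aliases` is a hypothetical helper.

```python
import numpy as np

# Aliases named in the error messages, mapped to the builtins they used to name.
_REMOVED_ALIASES = {"int": int, "bool": bool}


def restore_numpy_aliases(module=np):
    """Re-add np.int / np.bool only where the installed NumPy lacks them."""
    for name, builtin in _REMOVED_ALIASES.items():
        if not hasattr(module, name):
            setattr(module, name, builtin)


# Call this once, before importing any library that still uses the old aliases.
restore_numpy_aliases()
```

Editing the sources as described above remains the cleaner fix; the shim only masks the incompatibility for the current process.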
If you find this codebase helpful, please cite:
@inproceedings{
li2024towards,
title={Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning},
author={Lanqing Li and Hai Zhang and Xinyu Zhang and Shatong Zhu and Yang YU and Junqiao Zhao and Pheng-Ann Heng},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=QFUsZvw9mx}
}