This is a PyTorch implementation of SVEA using Vision Transformers, from the paper Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation by Nicklas Hansen, Hao Su, and Xiaolong Wang.
See this repository for SVEA implemented using ConvNets, as well as additional baselines and test environments.
If you find our work useful in your research, please consider citing it as follows:
```
@article{hansen2021stabilizing,
  title={Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation},
  author={Nicklas Hansen and Hao Su and Xiaolong Wang},
  year={2021},
  eprint={2107.00644},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:
```
conda env create -f setup/conda.yaml
conda activate svea
sh setup/install_envs.sh
```
The `scripts` directory contains bash scripts for both SVEA and a DrQ baseline, which can be run with `sh scripts/svea.sh` and `sh scripts/drq.sh`, respectively.
Alternatively, you can call the Python scripts directly, e.g. run `python3 src/train.py` to train SVEA on the default task, `walker_walk`, using default hyperparameters. This should give you output of the form:
```
Working directory: logs/walker_walk/svea/default/1
Evaluating: logs/walker_walk/svea/default/1
| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730
| train | E: 1 | S: 250 | D: 328.1 s | R: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | AUXLOSS: 0.0000
```
where `ER` and `ERTEST` correspond to the average return in the training and test environments, respectively. You can select the test environment used in evaluation with the `--eval_mode` argument, which accepts one of `(original, color_easy, color_hard, video_easy, video_hard)`. Note that training takes approximately 9 days for 500k frames.
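For example, a minimal invocation that trains with default hyperparameters but evaluates in the `video_hard` test environment only needs the `--eval_mode` argument described above (all other options are left at their defaults):
```
python3 src/train.py --eval_mode video_hard
```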
We want to thank the numerous researchers whose work this implementation is based on. Our DrQ implementation is based on this repository, the original DMControl suite is available here, and the gym wrapper for it is available here. Data augmentations are based on this repository and on kornia, which is available here. Our ViT encoder is based on this implementation by Ross Wightman.