A clean and robust implementation of Prioritized Experience Replay (PER) with DQN/DDQN.
Other RL algorithms implemented with PyTorch can be found here.
cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x
python main.py
where the default environment is CartPole-v1. To play with a trained model:
cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x
python main.py --write False --render True --Loadmodel True --ModelIdex 50
If you want to train on different environments:
cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x
python main.py --EnvIdex 1
The --EnvIdex flag can be set to 0 or 1, where
'--EnvIdex 0' for 'CartPole-v1'
'--EnvIdex 1' for 'LunarLander-v2'
If you want to train on LunarLander-v2, you need to install box2d-py first.
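As a rough illustration, the index could map to an environment name like in the sketch below. This is only an assumption for illustration; `ENV_NAMES` and `build_env` are hypothetical names, and the actual mapping lives in 'main.py'.

```python
import gymnasium as gym

# Hypothetical mapping from --EnvIdex to an environment name (illustrative only).
ENV_NAMES = ['CartPole-v1', 'LunarLander-v2']   # EnvIdex 0 and 1

def build_env(env_index: int, render: bool = False):
    # LunarLander-v2 additionally requires box2d-py to be installed.
    return gym.make(ENV_NAMES[env_index], render_mode='human' if render else None)
```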
You can use TensorBoard to visualize the training curves. Historical training curves are saved at '\runs'.
For more details of the hyperparameter settings, please check 'main.py'.
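For reference, the command-line flags used above might be declared roughly as follows. This is a hedged sketch only; the authoritative flag definitions, types, and defaults are in 'main.py'.

```python
import argparse

def str2bool(v: str) -> bool:
    # Parse 'True'/'False'-style strings passed on the command line.
    return str(v).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser()
parser.add_argument('--EnvIdex', type=int, default=0, help='0: CartPole-v1, 1: LunarLander-v2')
parser.add_argument('--write', type=str2bool, default=True, help='log training curves to tensorboard')
parser.add_argument('--render', type=str2bool, default=False, help='render the environment')
parser.add_argument('--Loadmodel', type=str2bool, default=False, help='load a saved checkpoint')
parser.add_argument('--ModelIdex', type=int, default=50, help='index of the checkpoint to load')
opt = parser.parse_args()
```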
- V1: PriorDQN_gym0.1x
- V2: PriorDQN_gym0.2x
- V3: LightPriorDQN_gym0.2x
where V3 is the recommended one, since it is the newest, simplest, and fastest.
- V1: PriorDQN_gym0.1x
Implemented with gym==0.19.0, where s_next, r, done, info = env.step(a)
Prioritized sampling is realized by a sum-tree.
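For intuition, a minimal sum-tree for proportional prioritized sampling might look like the sketch below. It is illustrative only; the class and method names are assumptions, not necessarily those used in this repo.

```python
import numpy as np

class SumTree:
    """Minimal sum-tree sketch: leaves hold priorities, internal nodes hold sums."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes + leaves
        self.ptr = 0                              # next write position (circular)

    def add(self, priority: float):
        self.update(self.ptr, priority)
        self.ptr = (self.ptr + 1) % self.capacity

    def update(self, data_idx: int, priority: float):
        idx = data_idx + self.capacity - 1        # leaf position in the tree array
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                           # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, value: float) -> int:
        idx = 0
        while 2 * idx + 1 < len(self.tree):       # descend until a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)          # convert back to a data index
```

Sampling a batch then amounts to drawing `batch_size` values uniformly from `[0, tree[0])` and calling `sample()` on each, so transitions are drawn with probability proportional to their priority.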
Dependencies of PriorDQN_gym0.1x:
- gym==0.19.0
- numpy==1.21.6
- pytorch==1.11.0
- tensorboard==2.13.0
- python==3.9.0
Training curves on CartPole and LunarLander.
- V2: PriorDQN_gym0.2x
Implemented with gymnasium==0.29.1, where s_next, r, terminated, truncated, info = env.step(a)
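For reference, a bare interaction loop under this API looks roughly like the following. This is a sketch with a random policy, not the repo's training loop.

```python
import gymnasium as gym

env = gym.make('CartPole-v1')
s, info = env.reset()
done = False
while not done:
    a = env.action_space.sample()                          # replace with the agent's action
    s_next, r, terminated, truncated, info = env.step(a)
    done = terminated or truncated                          # an episode ends on either flag
    s = s_next
env.close()
```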
Prioritized sampling is realized by a sum-tree, as in V1.
Dependencies of PriorDQN_gym0.2x:
- gymnasium==0.29.1
- box2d-py==2.3.5
- numpy==1.26.1
- pytorch==2.1.0
- tensorboard==2.15.1
- python==3.11.5
Training curves on CartPole and LunarLander.
- V3: LightPriorDQN_gym0.2x
An optimized version of PriorDQN_gym0.2x, where prioritized sampling is realized by torch.multinomial(), which is 3X faster than the sum-tree implementation.
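For intuition, proportional prioritized sampling with torch.multinomial() can be sketched as below. Names such as `priorities`, `alpha`, and `beta` are illustrative assumptions, not the repo's API; the idea is that one vectorized call replaces per-sample tree traversals.

```python
import torch

def sample_batch(priorities: torch.Tensor, batch_size: int,
                 alpha: float = 0.6, beta: float = 0.4):
    # Sampling probability P(i) proportional to p_i^alpha
    probs = priorities.pow(alpha)
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, batch_size, replacement=False)   # prioritized indices
    # Importance-sampling weights to correct the bias introduced by prioritization
    weights = (len(priorities) * probs[idx]).pow(-beta)
    weights = weights / weights.max()
    return idx, weights

# Usage (illustrative): idx, w = sample_batch(td_errors.abs() + 1e-6, batch_size=256)
```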
Dependencies of LightPriorDQN_gym0.2x:
- gymnasium==0.29.1
- box2d-py==2.3.5
- numpy==1.26.1
- pytorch==2.1.0
- tensorboard==2.15.1
- python==3.11.5
Training curves on CartPole and LunarLander.
The training time comparison between LightPriorDQN_gym0.2x (red) and PriorDQN_gym0.2x (blue) is given as follows, where a 3X acceleration can be observed:
PER: Schaul T, Quan J, Antonoglou I, et al. Prioritized Experience Replay[J]. arXiv preprint arXiv:1511.05952, 2015.
DQN: Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with Deep Reinforcement Learning[J]. Computer Science, 2013.
Double DQN: Hasselt H V, Guez A, Silver D. Deep Reinforcement Learning with Double Q-learning[J]. Computer Science, 2015.