
Prioritized Experience Replay DDQN-Pytorch

A clean and robust implementation of Prioritized Experience Replay (PER) with DQN/DDQN.

Other RL algorithms implemented in PyTorch can be found here.



How to use my code

Train from scratch

cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x

python main.py

where the default environment is CartPole-v1.

Play with trained model

cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x

python main.py --write False --render True --Loadmodel True --ModelIdex 50

Change Environment

If you want to train on a different environment:

cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x

python main.py --EnvIdex 1

The --EnvIdex flag can be set to 0 or 1, where

'--EnvIdex 0' for 'CartPole-v1'  
'--EnvIdex 1' for 'LunarLander-v2'   

If you want to train on LunarLander-v2, you need to install box2d-py first.
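For example, via pip (building box2d-py from source may additionally require swig):

pip install box2d-py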

Visualize the training curve

You can use TensorBoard to visualize the training curves. The training history is saved in 'runs'.
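For example:

tensorboard --logdir runs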

Hyperparameter Setting

For more details of the hyperparameter settings, please check 'main.py'.


Versions

This repository contains three versions of PER:

  • V1: PriorDQN_gym0.1x
  • V2: PriorDQN_gym0.2x
  • V3: LightPriorDQN_gym0.2x

V3 is the recommended version: it is the newest, simplest, and fastest.

Details of V1, V2, and V3:

  • V1: PriorDQN_gym0.1x

    Implemented with gym==0.19.0, where s_next, r, done, info = env.step(a)

    Prioritized sampling is realized by a sum-tree (see the sum-tree sketch after this list)

    # Dependencies of PriorDQN_gym0.1x
    gym==0.19.0
    numpy==1.21.6
    pytorch==1.11.0
    tensorboard==2.13.0
    
    python==3.9.0
    (Training curves: CartPole, LunarLander)


  • V2: PriorDQN_gym0.2x

    Implemented with gymnasium==0.29.1, where s_next, r, terminated, truncated, info = env.step(a) (see the step-loop sketch after this list)

    Prioritized sampling is realized by a sum-tree, as in V1

    # Dependencies of PriorDQN_gym0.2x
    gymnasium==0.29.1
    box2d-py==2.3.5
    numpy==1.26.1
    pytorch==2.1.0
    tensorboard==2.15.1
    
    python==3.11.5
    (Training curves: CartPole, LunarLander)


  • V3: LightPriorDQN_gym0.2x

    An optimized version of PriorDQN_gym0.2x,

    where prioritized sampling is realized by torch.multinomial(), which is about 3X faster than the sum-tree (see the multinomial sketch after this list).

    # Dependencies of LightPriorDQN_gym0.2x
    gymnasium==0.29.1
    box2d-py==2.3.5
    numpy==1.26.1
    pytorch==2.1.0
    tensorboard==2.15.1
    
    python==3.11.5
    (Training curves: CartPole, LunarLander)

    A training-time comparison between LightPriorDQN_gym0.2x (red) and PriorDQN_gym0.2x (blue) shows roughly a 3X speedup.
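For reference, here is a minimal interaction loop under the gymnasium 0.29 API targeted by V2 and V3 (a sketch with a random policy, not the repository's training loop). The key difference from gym 0.19 is that the single done flag is split into terminated and truncated:

    import gymnasium as gym

    env = gym.make('CartPole-v1')
    s, info = env.reset()                # gymnasium's reset also returns an info dict
    done = False
    while not done:
        a = env.action_space.sample()    # random action, just to exercise the API
        s_next, r, terminated, truncated, info = env.step(a)
        done = terminated or truncated   # gym 0.19's single done flag is now split in two
        s = s_next
    env.close()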

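Below is a minimal, self-contained sketch of the sum-tree idea used by V1 and V2 (illustrative only; the repository's class differs in its details). Leaves hold per-transition priorities and each internal node holds the sum of its children, so both updating a priority and sampling proportional to priority take O(log N):

    import numpy as np

    class SumTree:
        def __init__(self, capacity):
            self.capacity = capacity                 # number of leaves (stored transitions)
            self.tree = np.zeros(2 * capacity - 1)   # internal nodes followed by leaves

        def update(self, idx, priority):
            # Write a leaf's priority and propagate the change up to the root.
            tree_idx = idx + self.capacity - 1
            change = priority - self.tree[tree_idx]
            self.tree[tree_idx] = priority
            while tree_idx != 0:
                tree_idx = (tree_idx - 1) // 2
                self.tree[tree_idx] += change

        def sample(self):
            # Draw u uniformly from [0, total priority) and descend to its leaf,
            # so leaf i is returned with probability p_i / sum(p).
            u = np.random.uniform(0, self.tree[0])
            tree_idx = 0
            while tree_idx < self.capacity - 1:      # stop once a leaf is reached
                left = 2 * tree_idx + 1
                if u <= self.tree[left]:
                    tree_idx = left
                else:
                    u -= self.tree[left]
                    tree_idx = left + 1
            return tree_idx - (self.capacity - 1)    # convert back to a leaf index

In a PER buffer, update(i, priority) is called with the new TD-error-based priority after each learning step, and sample() is called once per transition in a batch.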

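And a minimal sketch of V3's approach: hand the priority vector straight to torch.multinomial(), which draws indices proportional to the (internally normalized) weights, so no tree is needed. The numbers and the beta value are illustrative, not the repository's defaults:

    import torch

    priorities = torch.tensor([0.1, 2.0, 0.5, 1.4, 0.0])  # one entry per slot; 0 = empty, never sampled
    batch_size = 3

    # Indices drawn with probability proportional to priority, with replacement,
    # as in standard proportional PER.
    idx = torch.multinomial(priorities, batch_size, replacement=True)

    # Importance-sampling weights from Schaul et al.: w_i = (N * P(i))^(-beta),
    # normalized by the max for stability.
    probs = priorities / priorities.sum()
    beta = 0.4
    weights = (priorities.numel() * probs[idx]).pow(-beta)
    weights /= weights.max()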

References

PER: Schaul T, Quan J, Antonoglou I, et al. Prioritized Experience Replay. arXiv preprint arXiv:1511.05952, 2015.

DQN: Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602, 2013.

Double DQN: van Hasselt H, Guez A, Silver D. Deep Reinforcement Learning with Double Q-learning. arXiv preprint arXiv:1509.06461, 2015.
