
Prioritized Experience Replay DDQN-Pytorch

A clean and robust implementation of Prioritized Experience Replay (PER) with DQN/DDQN.

Other RL algorithms implemented in PyTorch can be found here.



How to use my code

Train from scratch

cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x

python main.py

where the default environment is CartPole-v1.

Play with trained model

cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x

python main.py --write False --render True --Loadmodel True --ModelIdex 50

Change Environment

If you want to train on a different environment:

cd LightPriorDQN_gym0.2x # or PriorDQN_gym0.2x, PriorDQN_gym0.1x

python main.py --EnvIdex 1

The --EnvIdex flag can be set to 0 or 1, where

'--EnvIdex 0' for 'CartPole-v1'  
'--EnvIdex 1' for 'LunarLander-v2'   

If you want to train on LunarLander-v2, you need to install box2d-py first.
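For example, via pip (building box2d-py from source may additionally require swig):

pip install box2d-py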

Visualize the training curve

You can use TensorBoard to visualize the training curves. The training history is saved in 'runs'.
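For example:

tensorboard --logdir runs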

Hyperparameter Setting

For more details of the hyperparameter settings, please check 'main.py'.


Versions

This repository contains three versions of PER:

  • V1: PriorDQN_gym0.1x
  • V2: PriorDQN_gym0.2x
  • V3: LightPriorDQN_gym0.2x

V3 is the recommended version: it is the newest, simplest, and fastest.

Details of V1, V2, and V3:

  • V1: PriorDQN_gym0.1x

    Implemented with gym==0.19.0, where s_next, r, done, info = env.step(a)

    Prioritized sampling is realized by a sum-tree (see the sum-tree sketch after this list)

    # Dependencies of PriorDQN_gym0.1x
    gym==0.19.0
    numpy==1.21.6
    pytorch==1.11.0
    tensorboard==2.13.0
    
    python==3.9.0
    (Training curves: CartPole, LunarLander)


  • V2: PriorDQN_gym0.2x

    Implemented with gymnasium==0.29.1, where s_next, r, terminated, truncated, info = env.step(a) (see the step-loop sketch after this list)

    Prioritized sampling is realized by a sum-tree, as in V1

    # Dependencies of PriorDQN_gym0.2x
    gymnasium==0.29.1
    box2d-py==2.3.5
    numpy==1.26.1
    pytorch==2.1.0
    tensorboard==2.15.1
    
    python==3.11.5
    (Training curves: CartPole, LunarLander)


  • V3: LightPriorDQN_gym0.2x

    An optimized version of PriorDQN_gym0.2x,

    where prioritized sampling is realized by torch.multinomial(), which is about 3X faster than the sum-tree (see the multinomial sketch after this list).

    # Dependencies of LightPriorDQN_gym0.2x
    gymnasium==0.29.1
    box2d-py==2.3.5
    numpy==1.26.1
    pytorch==2.1.0
    tensorboard==2.15.1
    
    python==3.11.5
    (Training curves: CartPole, LunarLander)

    A training-time comparison between LightPriorDQN_gym0.2x (red) and PriorDQN_gym0.2x (blue) shows roughly a 3X speedup.
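For reference, here is a minimal interaction loop under the gymnasium 0.29 API targeted by V2 and V3 (a sketch with a random policy, not the repository's training loop). The key difference from gym 0.19 is that the single done flag is split into terminated and truncated:

    import gymnasium as gym

    env = gym.make('CartPole-v1')
    s, info = env.reset()                # gymnasium's reset also returns an info dict
    done = False
    while not done:
        a = env.action_space.sample()    # random action, just to exercise the API
        s_next, r, terminated, truncated, info = env.step(a)
        done = terminated or truncated   # gym 0.19's single done flag is now split in two
        s = s_next
    env.close()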

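Below is a minimal, self-contained sketch of the sum-tree idea used by V1 and V2 (illustrative only; the repository's class differs in its details). Leaves hold per-transition priorities and each internal node holds the sum of its children, so both updating a priority and sampling proportional to priority take O(log N):

    import numpy as np

    class SumTree:
        def __init__(self, capacity):
            self.capacity = capacity                 # number of leaves (stored transitions)
            self.tree = np.zeros(2 * capacity - 1)   # internal nodes followed by leaves

        def update(self, idx, priority):
            # Write a leaf's priority and propagate the change up to the root.
            tree_idx = idx + self.capacity - 1
            change = priority - self.tree[tree_idx]
            self.tree[tree_idx] = priority
            while tree_idx != 0:
                tree_idx = (tree_idx - 1) // 2
                self.tree[tree_idx] += change

        def sample(self):
            # Draw u uniformly from [0, total priority) and descend to its leaf,
            # so leaf i is returned with probability p_i / sum(p).
            u = np.random.uniform(0, self.tree[0])
            tree_idx = 0
            while tree_idx < self.capacity - 1:      # stop once a leaf is reached
                left = 2 * tree_idx + 1
                if u <= self.tree[left]:
                    tree_idx = left
                else:
                    u -= self.tree[left]
                    tree_idx = left + 1
            return tree_idx - (self.capacity - 1)    # convert back to a leaf index

In a PER buffer, update(i, priority) is called with the new TD-error-based priority after each learning step, and sample() is called once per transition in a batch.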

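And a minimal sketch of V3's approach: hand the priority vector straight to torch.multinomial(), which draws indices proportional to the (internally normalized) weights, so no tree is needed. The numbers and the beta value are illustrative, not the repository's defaults:

    import torch

    priorities = torch.tensor([0.1, 2.0, 0.5, 1.4, 0.0])  # one entry per slot; 0 = empty, never sampled
    batch_size = 3

    # Indices drawn with probability proportional to priority, with replacement,
    # as in standard proportional PER.
    idx = torch.multinomial(priorities, batch_size, replacement=True)

    # Importance-sampling weights from Schaul et al.: w_i = (N * P(i))^(-beta),
    # normalized by the max for stability.
    probs = priorities / priorities.sum()
    beta = 0.4
    weights = (priorities.numel() * probs[idx]).pow(-beta)
    weights /= weights.max()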

References

PER: Schaul T, Quan J, Antonoglou I, et al. Prioritized Experience Replay. arXiv preprint arXiv:1511.05952, 2015.

DQN: Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602, 2013.

Double DQN: van Hasselt H, Guez A, Silver D. Deep Reinforcement Learning with Double Q-learning. arXiv preprint arXiv:1509.06461, 2015.
