A simple Keras implementation of a double deep Q-network for playing Breakout
In brief, deep Q-learning [1] parametrizes the action-value function of Q-learning with a deep neural network to circumvent the problems caused by a large state space. The implemented method, double Q-learning [2], selects the next action with the online network and evaluates it with the target network, instead of taking the maximum target Q-value directly, which reduces the overestimation of action values.
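The difference can be sketched in a few lines of NumPy. The helper below is illustrative and not taken from the repository; the discount factor `gamma` and the `predict` calls on Keras models are assumptions about how a batch update might look:

```python
import numpy as np

def double_q_targets(rewards, next_states, terminals, online_model, target_model, gamma=0.99):
    """Illustrative double Q-learning targets; not the repository's exact code."""
    q_online = online_model.predict(next_states)   # used only to select the next action
    q_target = target_model.predict(next_states)   # used to evaluate that action
    best_actions = np.argmax(q_online, axis=1)
    bootstrap = q_target[np.arange(len(rewards)), best_actions]
    # Plain DQN would use np.max(q_target, axis=1) here instead.
    return rewards + gamma * (1.0 - terminals) * bootstrap
```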
Feature transformation in action (YouTube)
- You may type `$ python3 run.py train` in the prompt to train a deep Q-network.
- You may type `$ python3 run.py demo` in the prompt to see a demo (in slow motion with feature visualization).
- `Breakout.py` contains a `Breakout` class, the Breakout game. Construct `Breakout(player=Player())` to watch a deep Q-network play the game; a minimal usage sketch follows this list. You may set `display=False` to fit a model silently (and much faster).
- `Player.py` contains a `Player` class, including a deep neural network and the training algorithm described in [1] and [2].
  - `learning: bool`: run the training process every `FRAMES_PER_TRAINING` frames in the game.
  - `load_model`: load a model as well as pretrained weights in the `serialized` folder.
  - This class is independent of Breakout. It can be adapted to any game with one agent and a small number of discrete actions.
- The `serialized` folder contains a pretrained sample model.
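A minimal usage sketch based on the names above. The exact constructor signatures, whether `display` belongs to `Breakout` or is a static flag, and the call that starts the game loop are assumptions, so treat `run.py` as the authoritative entry point:

```python
from Breakout import Breakout
from Player import Player

# Watch the pretrained agent play (keyword names assumed from the descriptions above).
player = Player(learning=False, load_model=True)
game = Breakout(player=player)

# Train silently and much faster (again, argument names are assumptions):
# player = Player(learning=True, load_model=False)
# game = Breakout(player=player, display=False)
```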
The `Breakout` and `Player` classes have some static parameters. The names of the parameters are self-describing, and you may change them to change the settings of the game and to tune the training algorithm. Note that the provided pretrained model may not work well in different game settings.
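For example, a class-level parameter can be overridden before any object is constructed. `Player.BANDWIDTH` is the only such name documented here, so the snippet below shows only the pattern, not a recommended setting:

```python
from Player import Player

# Override a static (class-level) parameter before building the network.
# The value 4 is purely illustrative; the documented default is 5.
Player.BANDWIDTH = 4
```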
Each frame of the game is converted to a gray-scale image and stored as a 2-D NumPy array of unsigned 8-bit integers.
Frames are sampled every 0.009 seconds. A state of the game is defined as `Player.BANDWIDTH` consecutive resized frames, represented by a tensor of shape `(Player.BANDWIDTH, frame_height, frame_width)`, which defaults to (5, 80, 80).
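A sketch of this frame stacking; the helper names are illustrative, and the gray-scale conversion and resizing are assumed to happen before a frame is pushed:

```python
import numpy as np
from collections import deque

BANDWIDTH = 5                      # Player.BANDWIDTH: frames per state
frames = deque(maxlen=BANDWIDTH)   # rolling window of the most recent frames

def push_frame(gray_frame):
    """gray_frame: a 2-D uint8 array, already gray-scale and resized to 80x80."""
    frames.append(gray_frame.astype(np.uint8))

def current_state():
    """Stack the last BANDWIDTH frames into a (BANDWIDTH, 80, 80) tensor."""
    return np.stack(frames, axis=0)
```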
- Hitting a brick: +1
- Failing to catch the ball: -1
- Convolutional layer, 32 filters of size 8×8, stride (4, 4), rectifier
- Convolutional layer, 64 filters of size 4×4, stride (2, 2), rectifier
- Convolutional layer, 64 filters of size 3×3, rectifier
- Max-pooling layer, 2×2
- Fully connected layer, 128, rectifier
- Fully connected layer, 128, rectifier
- Output layer, 3 actions
- Squared loss `(target - Q[action])²`, where the target is the immediate reward plus the discounted double-Q estimate of the next state's value (and just the reward when the episode ends), following [1] and [2]
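A minimal Keras sketch of this architecture, assuming the Theano `channels_first` layout for the (5, 80, 80) input; the flatten step and the default 'valid' padding are assumptions where the list above is silent:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import RMSprop

def build_q_network(bandwidth=5, frame_height=80, frame_width=80, n_actions=3):
    model = Sequential([
        Conv2D(32, (8, 8), strides=(4, 4), activation='relu',
               data_format='channels_first',
               input_shape=(bandwidth, frame_height, frame_width)),
        Conv2D(64, (4, 4), strides=(2, 2), activation='relu',
               data_format='channels_first'),
        Conv2D(64, (3, 3), activation='relu', data_format='channels_first'),
        MaxPooling2D(pool_size=(2, 2), data_format='channels_first'),
        Flatten(),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(n_actions),  # linear output: one Q-value per action
    ])
    # Squared TD error, with the RMSProp settings quoted in the training notes below.
    model.compile(optimizer=RMSprop(lr=0.0002, rho=0.95), loss='mse')
    return model
```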
The pretrained model was trained with 1,000,000 iterations of RMSProp (learning rate 0.0002, decay factor 0.95) and can be trained further to improve its performance. Each iteration updated the weights with a batch of 32 transitions, randomly sampled from a replay dataset of size 1,000,000 (a large replay dataset is crucial for training deep Q-networks; increase its size if possible). The model was trained with ϵ decaying linearly from 1.0 to 0.1 over the first 500,000 iterations and fixed at 0.1 for the remaining iterations. The target Q-network was updated every 5,000 iterations.
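The ϵ schedule and the periodic target-network update can be sketched as below; the constant names are illustrative, and only the quoted values come from the paragraph above:

```python
import random

BATCH_SIZE = 32
EPS_START, EPS_END = 1.0, 0.1
EPS_DECAY_ITERS = 500000
TARGET_UPDATE_EVERY = 5000

def epsilon(iteration):
    """Linear decay from 1.0 to 0.1 over the first 500,000 iterations, then fixed at 0.1."""
    frac = min(iteration / float(EPS_DECAY_ITERS), 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def sample_batch(replay):
    """Uniformly sample a mini-batch of transitions from the replay dataset."""
    return random.sample(replay, BATCH_SIZE)

def maybe_sync_target(iteration, online_model, target_model):
    """Copy the online weights into the target network every 5,000 iterations."""
    if iteration % TARGET_UPDATE_EVERY == 0:
        target_model.set_weights(online_model.get_weights())
```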
- Python 3.5.2
- Keras 2.0.1
- Matplotlib 2.0.0
- Numpy 1.12.0
- Pygame 1.9.3
- Theano 0.9.0
[1] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
[2] Van Hasselt, H., Guez, A., & Silver, D. (2015). Deep reinforcement learning with double Q-learning. CoRR, abs/1509.06461.