Why? This is a Keras implementation of the blog post "Deep Reinforcement Learning: Pong from Pixels", whose original code used Python's NumPy library for the neural network operations.
During testing, the trained model was executed using pure Python in a Jupyter Notebook:
![](https://private-user-images.githubusercontent.com/21960382/285658612-466749f6-33ca-4a2e-9509-8da9ed0e52a9.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzAzNzgsIm5iZiI6MTczOTEzMDA3OCwicGF0aCI6Ii8yMTk2MDM4Mi8yODU2NTg2MTItNDY2NzQ5ZjYtMzNjYS00YTJlLTk1MDktOGRhOWVkMGU1MmE5LmdpZj9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDE5NDExOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE2ZDVmMzFhZmJhZTY5ODgyNTI2MjZjYzgxNGYyNDUzNWFkZTEzNTNkOTdjMWUxYzFiNTA1YmIyMDk1MDA5NDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.iGRrZHCKd5lNmhlE7e-jEpNFg006pr6InqzD6ZMVnUA)
Next, this model was implemented using Keras and TensorFlow. Here, I show each model along with its reward from a single run.
The model that performed best:
![image](https://private-user-images.githubusercontent.com/21960382/286043108-63938ff6-66ed-4c16-affe-da680488def0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzAzNzgsIm5iZiI6MTczOTEzMDA3OCwicGF0aCI6Ii8yMTk2MDM4Mi8yODYwNDMxMDgtNjM5MzhmZjYtNjZlZC00YzE2LWFmZmUtZGE2ODA0ODhkZWYwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDE5NDExOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg2NTM5YjFiZTZkZGQwNDNiMTFjNWU5ODY3OTY1M2FkMTM5OWQ1MWUwMWUyOTNlNmM4NDg1OTYwNDhhMDBkN2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.aQE-lClj3jSEsZYfCKxPm-DVS6Zf9wU0vduaJNbnyDI)
Model 1:
![image](https://private-user-images.githubusercontent.com/21960382/286034486-1b9f80b9-aecb-424c-b8ff-4d3d5d969749.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzAzNzgsIm5iZiI6MTczOTEzMDA3OCwicGF0aCI6Ii8yMTk2MDM4Mi8yODYwMzQ0ODYtMWI5ZjgwYjktYWVjYi00MjRjLWI4ZmYtNGQzZDVkOTY5NzQ5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDE5NDExOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTVjZWEwYmVjNjRkNGMwZGU1ZWQ4ZWFiNTkyMGFjMGJhZTQzMDRjZjgyYzVlOGVmYzM4ZWQwNTQxMmZlMmNkYzMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Px4KNQE6X2qkpmMjuOF5GznfeBDP0JcQlLSp_dvsq3I)
Model 2:
![image](https://private-user-images.githubusercontent.com/21960382/286038307-8c00cadf-fb0e-41b0-b238-136d9c226f2e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMzAzNzgsIm5iZiI6MTczOTEzMDA3OCwicGF0aCI6Ii8yMTk2MDM4Mi8yODYwMzgzMDctOGMwMGNhZGYtZmIwZS00MWIwLWIyMzgtMTM2ZDljMjI2ZjJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA5VDE5NDExOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk0NmE1MDJkYzM5MTAxMTExNmQyNGNmZTdkM2VhNzhkOWU0M2ZlZDdlMjE1Zjc5MzljNTM4N2EzN2E0NmE2NzAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.8vD0ODteSZbBZcxidyrr3WC3IRfjHylBpe6iplJ9LZM)
This repository contains two implementations of a Pong game agent - one in pure Python and one using Keras/TensorFlow. The goal is to contrast different approaches to this reinforcement learning problem.
The Python implementation in Pong_game_python_implementation.py follows a policy gradient method using RMSProp and a simple 2-layer neural network model. Key aspects:

- State Representation: The Pong screen is preprocessed into an 80x80 frame and flattened into a 1D vector
- Model Architecture: 2 fully-connected hidden layers, using ReLU activations and Xavier initialization
- Training: Policy gradient using discounted rewards and RMSProp parameter updates

Keras Implementation

The Keras implementation in Pong_game_The_keras_model.py uses a Deep Q-Network (DQN) agent with a Convolutional Neural Network (CNN) model. Key aspects:

- State Representation: Stores the last 4 frames as 84x84 grayscale images
- Model Architecture: 3 Conv2D layers for feature extraction, followed by fully-connected layers
- Training: DQN agent with experience replay, target network, and ε-greedy exploration strategy

The CNN automatically extracts spatial features from the game screen, so the input does not need manual feature engineering. Training is done through Q-learning updates based on stored memories of (state, action, reward, next state) transitions.
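
The three DQN training ingredients listed above (experience replay, a target network, and ε-greedy exploration) can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in Pong_game_The_keras_model.py: the names `ReplayBuffer`, `epsilon_greedy`, and `td_targets`, the `done` flag, and the default `gamma` are all assumptions made for the example, and the Q-values are passed in as plain arrays instead of coming from a Keras model.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-size store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive frames.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (
            np.stack(states),
            np.array(actions),
            np.array(rewards, dtype=np.float32),
            np.stack(next_states),
            np.array(dones, dtype=np.float32),
        )

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """Q-learning targets: r + gamma * max_a' Q_target(s', a'), cut off at episode end.

    next_q_target has shape (batch, n_actions) and is assumed to come from
    the (periodically synced) target network.
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)
```

In a training loop along these lines, each stacked state would have shape (84, 84, 4), the online network's predictions would be fed to `epsilon_greedy`, and the output of `td_targets` would serve as the regression target for the Q-value of the action actually taken.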