GitHub - xvado00/AIQ-POA: Policy Optimisation Agents module fort the AIQ Test

xvado00 / AIQ-POA Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Policy Optimisation Agents module fort the AIQ Test

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
agents		agents
License.txt		License.txt
Readme.txt		Readme.txt

Repository files navigation


POLICY OPTIMISATION AGENTS MODULE FOR THE AIQ TEST README
=========================================================


This is a module for the Python v3 AIQ Test (https://github.com/xvado00/AIQ)
that adds implementation of selected Policy Optimisation Agents:

- VPG.py  Vanilla Policy Gradient agent based on OpenAI SpinningUp:
https://spinningup.openai.com/en/latest/algorithms/vpg.html

- PPO.py  Proximal Policy Optimization agent based on OpenAI SpinningUp:
https://spinningup.openai.com/en/latest/algorithms/ppo.html

The module extends the original implementation that was part of a thesis:

Assessing Policy Optimization agents using Algorithmic IQ test
by Petr Zeman, Prague University of Economics and Business, 2023,

and was used for the experiments in paper:

Towards Evaluating Policy Optimisation Agents
using Algorithmic Intelligence Quotient Test
by Ondřej Vadinský and Petr Zeman, 2024.

This work was funded by the Internal Grant Agency
of Prague University of Economics and Business (F4/41/2023).

The code is released under the GNU GPLv3.  See Licence.txt file.


Known Issues
------------

Both the agents experience NaN errors, most likely due to the advantages
between policy changes being almost 0 in some environments. The number
of errors increases with decreasing values of SPE hyperparameter. For
low values of SPE (10) this noticeably impacts agents performance.
The number of errors encountered by PPO is higher than that of VPG for
the same SPE setting.


Installation
------------

0) Install PyTorch and mpi4py.
1) Get the AIQ test from: https://github.com/xvado00/AIQ
2) Copy the contents of agents directory into the AIQ/agents.
3) Edit AIQ/agents/__init__.py to list the agents "VPG" and "PPO".


Usage
-----

Hyperparametrs for the VPG agent (and their defaults) are as follows:

- SPE: number of environment interaction steps per epoch,
- VFTI: number of gradient descent iterations
		for value function optimisation per epoch (80),
- gamma: discount factor (0.99),
- PLR: policy learning rate (0.0003),
- VFLR: value function learning rate (0.001),
- Lambda: balances variance (0) and bias (1) in
		generalised advantage estimation (0.97),
- hidden1: number of neurons in the 1st hidden layer of Actor and
		Critic networks (64),
- hidden2: number of neurons in the 2nd hidden layer of Actor and
		Critic networks (64),
- hidden3: number of neurons in the 3rd hidden layer of Actor and
		Critic networks, if set to 0, 3rd hidden layer is not used (0).

This resutls in the following agent string for the AIQ test:

VPG,<SPE>,80,0.99,0.0003,0.001,0.97,64,64,0

Suggested values for SPE: [30,150].

Hyperparametrs for the PPO agent (and their defaults) are as follows:

- SPE: number of environment interaction steps per epoch,
- PTI: maximum number of gradient ascent iterations
		for policy optimisation per epoch (80),
- VFTI: number of gradient descent iterations
		for value function optimisation per epoch (80),
- gamma: discount factor (0.99),
- PLR: policy learning rate (0.0003),
- VFLR: value function learning rate (0.001),
- Lambda: balances variance (0) and bias (1) in
		generalised advantage estimation (0.97),
- epsilon: clip ratio for new policy clipping (0.2),
- TKL: (approximate) KL-Divergence between new and old policy that
		triggers early-stopping of policy gradient update (0.01),
- hidden1: number of neurons in the 1st hidden layer of Actor and
		Critic networks (64),
- hidden2: number of neurons in the 2nd hidden layer of Actor and
		Critic networks (64),
- hidden3: number of neurons in the 3rd hidden layer of Actor and
		Critic networks, if set to 0, 3rd hidden layer is not used (0),

This resutls in the following agent string for the AIQ test:

PPO,<SPE>,80,80,0.99,0.0003,0.001,0.97,0.2,0.01,64,64,0

Suggested values for SPE: [30,250].


Refer to the AIQ Test Readme.txt for the test parameters and how to run it.