The numbers in the table are fitness scores, a high-level metric summarizing the strength, speed, stability, and consistency of a trial. An experiment generates many specs to search through; each spec is run in a trial, and each trial runs multiple repeated sessions for reproducibility. For more, see analytics.
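The experiment, trial, and session nesting described above can be sketched roughly as follows. This is only an illustration: every function name here (`generate_specs`, `run_session`, `run_trial`, `run_experiment`) is hypothetical and is not the lab's API, the search space is made up, and the per-session score is a placeholder rather than the actual fitness computation.

```python
import itertools
import random
import statistics


def generate_specs(search_space):
    """Expand a dict of hyperparameter lists into one spec per combination."""
    keys = sorted(search_space)
    combos = itertools.product(*(search_space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def run_session(spec, seed):
    """Placeholder session: returns a made-up score, deterministic per seed.

    Assumption: a real session would train an agent and measure its returns.
    """
    rng = random.Random(seed)
    return spec["lr"] * 100 + rng.random()


def run_trial(spec, num_sessions=4):
    """Run repeated sessions of one spec and average their scores."""
    scores = [run_session(spec, seed) for seed in range(num_sessions)]
    return statistics.mean(scores)


def run_experiment(search_space, num_sessions=4):
    """Run one trial per spec; return (spec, score) pairs, best first."""
    results = [
        (spec, run_trial(spec, num_sessions))
        for spec in generate_specs(search_space)
    ]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical search space: 2 x 2 = 4 specs, hence 4 trials.
ranked = run_experiment({"lr": [0.001, 0.01], "gamma": [0.95, 0.99]})
```

Averaging over seeded sessions is one simple way to make a trial's score repeatable; the real metric combines several components, as noted above.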
All the results below link to their respective PRs with the full experiment reports. To see more:
- the result PRs
- the full experiment data contributed are public on Dropbox
Algorithm / Owner | DQN | DDQN | Dueling DQN | DQN + PER | DDQN + PER | DQN + CER | DDQN + CER | DIST DQN | REINFORCE | A2C | A2C + GAE | A2C + GAE + SIL | A3C | A3C + GAE | PPO | PPO + SIL | DPPO |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CartPole-v0 | 3.52 | 0.85 | 4.79 | 5.65 | 1.21 | 7.10 | 1.20 | 6.26 | 0.93 | 1.60 | 0.88 | 1.48 | | | | | |
LunarLander-v2 | 1.15 | 1.39 | 0.77 | | | | | | | | | | | | | | |
MountainCar-v0 | 1.04 | 1.02 | | | | | | | | | | | | | | | |
3dball | | | | | | | | | | | | | | | | | |
gridworld | | | | | | | | | | | | | | | | | |
BeamRider-v0 | | | | | | | | | | | | | | | | | |
Pendulum-v0 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | | | | | | | | | |
Acrobot-v1 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | | | | | | | | | |
BipedalWalker-v2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | | | | | | | | | |
CarRacing-v0 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | | | | | | | | | |
- DQN: Deep Q-Learning
- DDQN: Double Deep Q-Learning
- PER: Prioritized Experience Replay
- CER: Combined Experience Replay
- DIST: Distributed
- A2C: Advantage Actor-Critic
- A3C: Asynchronous Advantage Actor-Critic
- GAE: Generalized Advantage Estimation
- PPO: Proximal Policy Optimization
- SIL: Self-Imitation Learning
- CartPole-v0
- LunarLander-v2
- MountainCar-v0
- 3dball
- gridworld
- BeamRider-v0
- more coming soon
- Pendulum-v0
- Acrobot-v1
- BipedalWalker-v2
- CarRacing-v0
- more coming soon