# Benchmarks

The numbers in the table are fitness scores: a high-level metric summarizing the strength, speed, stability, and consistency of a trial. An experiment generates many specs to search through; each spec is run in a trial, and each trial runs multiple repeated sessions for reproducibility. For more, see analytics.

All the results below link to their respective PRs containing the full experiment reports.
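The experiment → spec → trial → session hierarchy described above can be sketched as follows. This is an illustrative assumption, not the library's actual API: the function names, the spec format, and the fitness aggregation formula are all hypothetical stand-ins, and the real components are computed from training data rather than sampled.

```python
import random
import statistics

def run_session(spec, seed):
    """Stand-in for one training session. A real session would train an
    agent according to `spec`; here we just draw mock component scores."""
    rng = random.Random(hash((spec["name"], seed)))
    return {
        "strength": rng.uniform(0.5, 1.5),     # final return vs. a baseline
        "speed": rng.uniform(0.5, 1.5),        # how quickly it learned
        "stability": rng.uniform(0.5, 1.5),    # absence of late-training collapse
        "consistency": rng.uniform(0.5, 1.5),  # agreement across sessions
    }

def fitness(components):
    # Hypothetical aggregation: collapse the four components into one
    # scalar score. The real formula lives in the analytics module and
    # differs from a plain mean.
    return statistics.mean(components.values())

def run_trial(spec, num_sessions=4):
    # A trial runs the same spec over several repeated sessions for
    # reproducibility, then averages their fitness.
    scores = [fitness(run_session(spec, seed=s)) for s in range(num_sessions)]
    return statistics.mean(scores)

def run_experiment(specs):
    # An experiment searches over many specs, one trial each, and
    # ranks the specs by trial fitness, best first.
    return sorted(specs, key=run_trial, reverse=True)
```

The key structural point is that a single table cell above summarizes an entire trial, i.e. an average over repeated sessions, not a single training run.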

Algorithm / Owner DQN DDQN Dueling DQN DQN + PER DDQN + PER DQN + CER DDQN + CER DIST DQN REINFORCE A2C A2C + GAE A2C + GAE + SIL A3C A3C + GAE PPO PPO + SIL DPPO
CartPole-v0 3.52 0.85 4.79 5.65 1.21 7.10 1.20 6.26 0.93 1.60 0.88 1.48
LunarLander-v2 1.15 1.39 0.77
MountainCar-v0 1.04 1.02
3dball
gridworld
BeamRider-v0
Pendulum-v0 n/a n/a n/a n/a n/a n/a n/a n/a
Acrobot-v1 n/a n/a n/a n/a n/a n/a n/a n/a
BipedalWalker-v2 n/a n/a n/a n/a n/a n/a n/a n/a
CarRacing-v0 n/a n/a n/a n/a n/a n/a n/a n/a

## Terminology

- DQN: Deep Q-Learning
- DDQN: Double Deep Q-Learning
- PER: Prioritized Experience Replay
- CER: Combined Experience Replay
- DIST: Distributed
- A2C: Advantage Actor-Critic
- A3C: Asynchronous Advantage Actor-Critic
- GAE: Generalized Advantage Estimation
- PPO: Proximal Policy Optimization
- SIL: Self-Imitation Learning
- DPPO: Distributed Proximal Policy Optimization

## Discrete environments

## Continuous environments