Benchmarks

The numbers in the table are fitness scores, a high-level metric that summarizes the strength, speed, stability, and consistency of a trial. An experiment generates many specs to search through; each spec is run in a trial, and each trial runs multiple repeated sessions for reproducibility. For more, see analytics.
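The experiment, trial, and session hierarchy described above can be pictured with a small sketch. The class names, the `spec` dictionary keys, and the use of a plain mean over sessions are all assumptions made for illustration; this is not the lab's actual data model, and the mean is not the real fitness formula (see analytics for that).

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Session:
    """One repeated run of a spec (hypothetical container)."""
    index: int
    score: float  # placeholder per-session score, NOT the real fitness


@dataclass
class Trial:
    """One spec, run as several repeated sessions."""
    spec: Dict[str, float]
    sessions: List[Session] = field(default_factory=list)

    def summary_score(self) -> float:
        # Assumption: a plain mean over sessions, used only for illustration.
        return mean(s.score for s in self.sessions)


@dataclass
class Experiment:
    """A search over many specs, one trial per spec."""
    trials: List[Trial] = field(default_factory=list)

    def best_trial(self) -> Trial:
        # Pick the trial whose summary score is highest.
        return max(self.trials, key=lambda t: t.summary_score())


# Two hypothetical specs, each repeated for two sessions.
experiment = Experiment(trials=[
    Trial(spec={"lr": 0.01}, sessions=[Session(0, 1.0), Session(1, 3.0)]),
    Trial(spec={"lr": 0.001}, sessions=[Session(0, 4.0), Session(1, 5.0)]),
])
best = experiment.best_trial()
```

Repeating each spec over several sessions is what lets a trial-level summary reflect consistency rather than a single lucky run.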

All the results below link to their respective PRs, which contain the full experiment reports. To see more:

browse the result PRs
download the full contributed experiment data, which is public on Dropbox

Algorithm / Owner	DQN	DDQN	Dueling DQN	DQN + PER	DDQN + PER	DQN + CER	DDQN + CER	DIST DQN	REINFORCE	A2C	A2C + GAE	A2C + GAE + SIL	A3C	A3C + GAE	PPO	PPO + SIL
CartPole-v0	3.52	0.85				4.79	5.65		1.21	7.10	1.20	6.26	0.93	1.60	0.88	1.48
LunarLander-v2	1.15	1.39							0.77
MountainCar-v0	1.04	1.02
3dball
gridworld
BeamRider-v0
Pendulum-v0	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
Acrobot-v1	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
BipedalWalker-v2	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
CarRacing-v0	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
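As a reading aid, the sketch below ranks the algorithms for one row of the table. The scores are copied from the CartPole-v0 row, with `None` marking cells that have no result yet; the column alignment of the blank cells is an assumption read off the table layout, and `rank_algorithms` is a hypothetical helper, not part of the lab.

```python
from typing import List, Optional, Tuple

ALGORITHMS = [
    "DQN", "DDQN", "Dueling DQN", "DQN + PER", "DDQN + PER",
    "DQN + CER", "DDQN + CER", "DIST DQN", "REINFORCE", "A2C",
    "A2C + GAE", "A2C + GAE + SIL", "A3C", "A3C + GAE", "PPO", "PPO + SIL",
]

# CartPole-v0 fitness scores from the table; None = no result reported.
CARTPOLE = [
    3.52, 0.85, None, None, None,
    4.79, 5.65, None, 1.21, 7.10,
    1.20, 6.26, 0.93, 1.60, 0.88, 1.48,
]


def rank_algorithms(
    names: List[str], scores: List[Optional[float]]
) -> List[Tuple[str, float]]:
    """Return (algorithm, score) pairs sorted from highest to lowest score."""
    pairs = [(n, s) for n, s in zip(names, scores) if s is not None]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


ranking = rank_algorithms(ALGORITHMS, CARTPOLE)
for name, score in ranking[:3]:
    print(f"{name}: {score:.2f}")
```

Under that alignment, the three highest CartPole-v0 entries are A2C, A2C + GAE + SIL, and DDQN + CER.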

Terminology

DQN: Deep Q-Learning
DDQN: Double Deep Q-Learning
PER: Prioritized Experience Replay
CER: Combined Experience Replay
DIST: Distributed
A2C: Advantage Actor-Critic
A3C: Asynchronous Advantage Actor-Critic
GAE: Generalized Advantage Estimation
PPO: Proximal Policy Optimization
SIL: Self-Imitation Learning

Discrete environments

CartPole-v0
LunarLander-v2
MountainCar-v0
3dball
gridworld
BeamRider-v0
more coming soon

Continuous environments

Pendulum-v0
Acrobot-v1
BipedalWalker-v2
CarRacing-v0
more coming soon
