
🦎 New strategies, API flexibility, small fixes

Released by @RobertTLange on 08 Dec 11:51 · commit d1c38ef
Added
  • Adds a total_env_steps counter to both GymFitness and BraxFitness for easier sample-efficiency comparisons with RL algorithms.
  • Support for new strategies/genetic algorithms (a minimal usage sketch follows this list)
    • SAMR-GA (Clune et al., 2008)
    • GESMR-GA (Kumar et al., 2022)
    • SNES (Wierstra et al., 2014)
    • DES (Lange et al., 2022)
    • Guided ES (Maheswaranathan et al., 2018)
    • ASEBO (Choromanski et al., 2019)
    • CR-FM-NES (Nomura & Ono, 2022)
    • MR15-GA (Rechenberg, 1978)
  • Adds full set of BBOB low-dimensional functions (BBOBFitness)
  • Adds 2D visualizer animating sampled points (BBOBVisualizer)
  • Adds Evosax2JAXWrapper to wrap all evosax strategies
  • Adds Adan optimizer (Xie et al., 2022)
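For orientation, here is a minimal sketch of running one of the newly added strategies with the standard evosax ask/tell loop. The toy sphere objective, population size, and dimensionality are made up for illustration, and constructor arguments may differ per strategy.

```python
import jax
import jax.numpy as jnp
from evosax import SNES

rng = jax.random.PRNGKey(0)
strategy = SNES(popsize=32, num_dims=2)  # illustrative settings
es_params = strategy.default_params
state = strategy.initialize(rng, es_params)

for _ in range(50):
    rng, rng_ask = jax.random.split(rng)
    # Sample a population of candidate solutions.
    x, state = strategy.ask(rng_ask, state, es_params)
    # Toy sphere objective standing in for e.g. a BBOBFitness evaluation.
    fitness = jnp.sum(x ** 2, axis=-1)
    # Update the search distribution with the evaluated fitness.
    state = strategy.tell(x, fitness, state, es_params)
```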
Changed
  • ParameterReshaper can now be applied directly from within the strategy. Simply provide a pholder_params pytree at strategy instantiation (instead of num_dims).
  • FitnessShaper can likewise be applied directly from within the strategy. This makes it easier to track the best-performing member across generations and addresses issue #32. Simply pass the fitness shaping settings (maximize, centered_rank, ...) as arguments to the strategy; see the sketch after this list.
  • Removes Brax fitness (use EvoJAX version instead)
  • Adds lrate and sigma schedules to strategy instantiation
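A minimal sketch of the new instantiation path, assuming the keyword names mentioned above (pholder_params, maximize, centered_rank); the placeholder pytree and population size are hypothetical.

```python
import jax
import jax.numpy as jnp
from evosax import SNES

# Hypothetical parameter pytree (stands in for e.g. a flax model's params).
pholder_params = {"w": jnp.zeros((4, 2)), "b": jnp.zeros((2,))}

# ParameterReshaper and FitnessShaper are handled inside the strategy:
# pass the placeholder pytree (instead of num_dims) plus the fitness
# shaping settings directly at instantiation.
strategy = SNES(
    popsize=32,
    pholder_params=pholder_params,
    maximize=True,
    centered_rank=True,
)

rng = jax.random.PRNGKey(0)
es_params = strategy.default_params
state = strategy.initialize(rng, es_params)
x, state = strategy.ask(rng, state, es_params)  # candidates, reshaped internally
```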
Fixed
  • Fixed reward masking in GymFitness. Using jnp.sum(dones) >= 1 in the cumulative return computation also zeroed out the reward at the final (terminal) timestep, which caused problems with sparse-reward gym environments (e.g. Mountain Car). An illustrative sketch follows this list.
  • Fixed PGPE sample indexing.
  • Fixed weight decay. It was incorrectly multiplied by -1 when maximizing.
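For context on the GymFitness masking fix, here is an illustrative return computation (not the library's actual code): a step should only be masked once the episode has already terminated at an earlier step, so that the terminal step's reward, which carries the whole signal in sparse-reward tasks, is kept.

```python
import jax.numpy as jnp

def masked_return(rewards, dones):
    """Sum rewards up to and including the first terminal step.

    Masking each step with jnp.sum(dones) >= 1 would also zero out the
    terminal step itself, dropping sparse terminal rewards (the old bug).
    """
    dones = dones.astype(jnp.float32)
    # 1.0 at steps that come strictly after the first done signal.
    already_done = jnp.concatenate([jnp.zeros(1), jnp.cumsum(dones)[:-1]]) > 0
    mask = 1.0 - already_done.astype(jnp.float32)
    return jnp.sum(rewards * mask)

# Sparse-reward example: only the terminal step pays out.
rewards = jnp.array([0.0, 0.0, 0.0, 1.0])
dones = jnp.array([0.0, 0.0, 0.0, 1.0])
print(masked_return(rewards, dones))  # 1.0 (the old mask would return 0.0)
```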