Adastop user guide #444

Merged
merged 8 commits into from
Apr 9, 2024
83 changes: 83 additions & 0 deletions docs/basics/userguide/adastop.md
@@ -0,0 +1,83 @@
(adastop_userguide)=


# AdaStop



## Hypothesis testing to compare RL agents

AdaStop is a sequential testing procedure for efficient and reliable comparison of stochastic algorithms, first introduced in <https://arxiv.org/abs/2306.10882>.

This section explains how to use the AdaStop algorithm in rlberry. AdaStop implements a sequential statistical test using group sequential permutation tests and is especially suited to multiple testing with very small sample sizes. The main AdaStop library is at <https://github.com/TimotheeMathieu/adastop>, but for comparing RL agents it is easier to use the rlberry bindings.
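To make the underlying test concrete, here is a minimal, self-contained sketch of a plain two-sample permutation test, the primitive that AdaStop applies in a group-sequential fashion. Everything below (score values, batch size, number of permutations) is illustrative and is not part of the rlberry or adastop APIs.

```python
import numpy as np

# Hypothetical evaluation scores for two agents, 5 training runs each.
rng = np.random.default_rng(0)
scores_a = rng.normal(300.0, 30.0, size=5)
scores_b = rng.normal(330.0, 30.0, size=5)

# Test statistic: absolute difference between the mean scores.
observed = abs(scores_a.mean() - scores_b.mean())

# Under the null hypothesis the two agents are exchangeable, so we shuffle
# the pooled scores and recompute the statistic many times.
pooled = np.concatenate([scores_a, scores_b])
n_perm = 10_000
exceed = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    exceed += abs(perm[:5].mean() - perm[5:].mean()) >= observed

# Permutation p-value with the usual +1 correction.
p_value = (1 + exceed) / (1 + n_perm)
print(f"permutation p-value: {p_value:.4f}")
```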

We use AdaStop in particular to adaptively choose the number of trainings necessary to reach a statistically significant decision when comparing algorithms. The rationale is that when the outcome of an experiment in computer science is stochastic, the same experiment must be run several times in order to obtain a viable comparison of the algorithms and to rank them with a theoretically controlled family-wise error rate. AdaStop chooses the number of repetitions adaptively, stopping data collection as soon as possible. Note that what we call an algorithm here is really a particular implementation of an algorithm.
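The adaptive-stopping idea can be sketched in a few lines: train a small batch of agents, run an interim test, and stop as soon as a decision can be made. The boundary used below is deliberately naive and only for illustration; the actual AdaStop procedure calibrates its group-sequential boundaries so that the family-wise error rate stays controlled.

```python
import numpy as np

rng = np.random.default_rng(1)

def new_batch_of_scores(mean, n=5):
    """Stand-in for training an agent n more times and evaluating it."""
    return rng.normal(mean, 30.0, size=n)

scores_a = np.empty(0)
scores_b = np.empty(0)
for interim in range(1, 6):  # at most 5 batches, i.e. 25 trainings per agent
    scores_a = np.concatenate([scores_a, new_batch_of_scores(300.0)])
    scores_b = np.concatenate([scores_b, new_batch_of_scores(330.0)])
    gap = abs(scores_a.mean() - scores_b.mean())
    spread = np.concatenate([scores_a, scores_b]).std()
    if gap > 2.0 * spread:  # naive decision boundary, NOT AdaStop's
        print(f"stopped after batch {interim}: the agents look different")
        break
else:
    print("budget exhausted without a significant difference")
```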



## Comparison of A2C and PPO from stable-baselines3

Below, we compare the A2C and PPO agents from stable-baselines3 on the CartPole environment. We limit the maximum number of trainings for each agent to $5\times 5 = 25$, using at most $5$ batches of $5$ trainings each. We require the resulting test to have a level of $99\%$ (i.e., the probability of wrongly declaring the agents different is at most $1\%$).

```python
from rlberry.envs import gym_make
from stable_baselines3 import A2C, PPO
from rlberry.agents.stable_baselines import StableBaselinesAgent
from rlberry.manager import AdastopComparator

env_ctor, env_kwargs = gym_make, dict(id="CartPole-v1")

# One configuration dictionary per agent to compare.
managers = [
    {
        "agent_class": StableBaselinesAgent,
        "train_env": (env_ctor, env_kwargs),
        "fit_budget": 5e4,
        "agent_name": "A2C",
        "init_kwargs": {"algo_cls": A2C, "policy": "MlpPolicy", "verbose": 1},
    },
    {
        "agent_class": StableBaselinesAgent,
        "train_env": (env_ctor, env_kwargs),
        "agent_name": "PPO",
        "fit_budget": 5e4,
        "init_kwargs": {"algo_cls": PPO, "policy": "MlpPolicy", "verbose": 1},
    },
]

# n: batch size, K: maximum number of batches, alpha: significance level.
comparator = AdastopComparator(n=5, K=5, alpha=0.01)
comparator.compare(managers)
print(comparator.managers_paths)
```
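The comparison trains the agents batch by batch and saves the trained experiment managers to disk. As a hedged sketch (the exact structure of `managers_paths` may differ across rlberry versions; here we assume it maps agent names to saved-manager paths), one of them could be reloaded for further inspection:

```python
from rlberry.manager import ExperimentManager

# Assumption: managers_paths maps each agent name to the path of its
# saved ExperimentManager; adjust to what the printed dict shows.
manager_a2c = ExperimentManager.load(comparator.managers_paths["A2C"])
```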

## Result visualisation

The results of the comparison can be obtained in text format using `print_results`:

```python
comparator.print_results()
```

In this run, the test reached a decision after gathering 10 scores (two batches of 5) for each agent:
```
Number of scores used for each agent:
A2C:10
PPO:10

Mean of scores of each agent:
A2C:271.17600000000004
PPO:500.0

Decision for each comparison:
A2C vs PPO:smaller
```


Here, the decision `smaller` means that the test concluded that the mean score of A2C is smaller than the mean score of PPO.

The results can also be visualised with a plot using `plot_results`:

```python
comparator.plot_results()
```

![](adastop_boxplots.png)

The boxplots represent the distribution of the scores gathered for each agent. The table at the top of the figure shows the decision taken by the test for each comparison: larger, smaller or equal.
Binary file added docs/basics/userguide/adastop_boxplots.png
8 changes: 3 additions & 5 deletions docs/index.md
@@ -49,14 +49,12 @@ It could be useful in many way :
See the [Save and Load Experiment](save_load_page) page.

### Statistical comparison of RL agents

The principal goal of rlberry is to give tools for proper experimentation in RL. In research, one of the usual tasks is to compare two or more RL agents, and for this one typically uses several seeds to train the agents several times and compares the resulting mean rewards. We show here how to make sure that enough data was acquired to assert that two RL agents are indeed different. We propose two ways to do that: first, classical hypothesis testing, and second, a sequential testing scheme with AdaStop that aims at saving computation by stopping early when possible.
#### Compare agents
Compare several trained agents using the mean over a specify number of evaluations for each agent.
TODO : to complete
We give tools to compare several trained agents using the mean over a specified number of evaluations for each agent. The explanation can be found in the [user guide](comparison_page).

#### AdaStop
TODO : Text

AdaStop is a sequential testing procedure for efficient and reliable comparison of stochastic algorithms. It has been successfully used to compare RL agents efficiently; an example of such use can be found in the [user guide](adastop_userguide).

[linked paper](https://hal-lara.archives-ouvertes.fr/hal-04132861/)

3 changes: 2 additions & 1 deletion docs/user_guide.md
@@ -45,4 +45,5 @@ You can find more details about installation [here](installation)!
- Custom Environments (In construction)
- [Using external libraries](external) (like [Stable Baselines](stable_baselines) and [Gymnasium](Gymnasium_ancor))
- Transfer Learning (In construction)
- AdaStop(In construction)
- [Hypothesis testing for comparison of RL agents](comparison_page)
- [Adaptive hypothesis testing for comparison of RL agents with AdaStop](adastop_userguide)
22 changes: 22 additions & 0 deletions rlberry/manager/comparison.py
@@ -105,6 +105,28 @@ def compare(self, manager_list, n_evaluations=50, verbose=True):
        logger.info("Results are ")
        print(self.get_results())

    def print_results(self):
        """
        Print the results of the test.
        """
        print("Number of scores used for each agent:")
        for key in self.n_iters:
            print(key + ":" + str(self.n_iters[key]))

        print("")
        print("Mean of scores of each agent:")
        for key in self.eval_values:
            print(key + ":" + str(np.mean(self.eval_values[key])))

        print("")
        print("Decision for each comparison:")
        for c in self.comparisons:
            print(
                "{0} vs {1}".format(self.agent_names[c[0]], self.agent_names[c[1]])
                + ":"
                + str(self.decisions[str(c)])
            )

    def _fit_evaluate(self, managers, eval_values, seeders):
        """
        fit rlberry agents.
1 change: 1 addition & 0 deletions rlberry/manager/tests/test_comparisons.py
@@ -139,5 +139,6 @@ def test_adastop():

    comparator = AdastopComparator(seed=42)
    comparator.compare(managers)
    comparator.print_results()
    assert comparator.is_finished
    assert not ("equal" in comparator.decisions.values())