This page explains shortly what I studied on sparse stochastic multi-armed bandits.
Assume a MAB problem with $K$
arms, each parametrized by its mean $\mu_k\in\mathbb{R}$
.
If you know in advance that only a small subset (of size $s$
) of the arms have a positive arm, it sounds reasonable to hope to be more efficient in playing the bandit game compared to an approach which is non aware of the sparsity.
The SparseUCB
is an extension of the well-known UCB
, and requires to known exactly the value of $s$
.
It works by identifying as fast as possible (actually, in a sub-logarithmic number of samples) the arms with non-positive means.
Then it only plays in the "good" arms with positive means, with a regular UCB policy.
I studied extensions of this idea, first of all the SparseklUCB
policy as it was suggested in the original research paper, but mainly a generic "wrapper" black-box approach.
For more details, see SparseWrapper
.
- Reference: ["Sparse Stochastic Bandits", by J. Kwon, V. Perchet & C. Vernade, COLT 2017]. Note that this algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits, and it includes Bernoulli arms.
TODO finish! I am writing a small research article on that topic, it is a better introduction as a self-contained document to explain this idea and the algorithms. Reference: [Structure and Sparsity of Stochastic Multi-Arm Bandits, Lilian Besson and Emilie Kaufmann, 2018].
A simple python file, configuration_sparse.py
, is used to import the arm classes, the policy classes and define the problems and the experiments.
For example, we can compare the standard UCB
and BayesUCB
algorithms, non aware of the sparsity, against the sparsity-aware SparseUCB
algorithm, as well as 4 versions of SparseWrapper
applied to BayesUCB
.
configuration = {
"horizon": 10000, # Finite horizon of the simulation
"repetitions": 100, # number of repetitions
"n_jobs": -1, # Maximum number of cores for parallelization: use ALL your CPU
"verbosity": 5, # Verbosity for the joblib calls
# Environment configuration, you can set up more than one.
"environment": [
{ # sparsity = nb of >= 0 mean, = 3 here
"arm_type": Bernoulli,
"params": 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3
}
],
# Policies that should be simulated, and their parameters.
"policies": [
{"archtype": UCB, "params": {} },
{"archtype": SparseUCB, "params": { "sparsity": 3 } },
{"archtype": BayesUCB, "params": { } },
]
}
Then add a Sparse-Wrapper bandit algorithm (SparseWrapper
class), you can use this piece of code:
configuration["policies"] += [
{
"archtype": SparseWrapper,
"params": {
"policy": BayesUCB,
"use_ucb_for_set_J": use_ucb_for_set_J,
"use_ucb_for_set_K": use_ucb_for_set_K,
}
}
for use_ucb_for_set_J in [ True, False ]
for use_ucb_for_set_K in [ True, False ]
]
You should use the provided Makefile
file to do this simply:
make install # install the requirements ONLY ONCE
make sparse # run and log the main.py script
Here are some plots illustrating the performances of the different policies implemented in this project, against various sparse problems (with Bernoulli
or UnboundedGaussian
arms only):
3 variants of Sparse-Wrapper for UCB, on a "simple" sparse Bernoulli problem
FIXME run some simulations and explain them!
These illustrations come from my (work in progress) article, [Structure and Sparsity of Stochastic Multi-Arm Bandits, Lilian Besson and Emilie Kaufmann, 2018].
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson.