Boundary Over-Exploration Hinders Performance for Minimizing a Very Noisy Function #2184
Replies: 2 comments
-
Very interesting problem and great writeup. What is the signal-to-noise ratio in your example? What model are you using? Are you passing in the observation noise level or are you letting the model infer that? Would you be able to share your simulation code so we can better understand what's going on?
-
The simulation has a couple of steps to it, so I think it's best to describe it in some more detail (I'll also provide the code). The foundation of the simulation is what's called a reinforcement learning drift diffusion model (RLDDM). The basic idea is that the model has to choose between two options over the course of a task and learns which option is more valuable from reward history. Options are chosen based on the value difference, with choices being more consistent and rapid for larger value differences.

I fit this model to a previously collected behavioral dataset and have posterior distributions for the model parameters. From this prior fit, I know that two parameters (boundary separation and drift rate) are affected by my intervention and lead to a reduction in reaction time when modulated. This is the effect I'm aiming to leverage in my simulation. The simulated subjects are then used to generate task behavior in response to the suggestions from the Bayesian optimization.

Due to the structure of this model, the noise does vary somewhat from trial to trial (high value-difference decisions are faster and less variable than low value-difference ones). I was planning to account for this effect in later iterations of this investigation, but I haven't gotten there yet because of the current issue. In the first example, the signal-to-noise ratio (change in reaction time divided by the standard deviation of reaction time) ranges from roughly 0.3-0.5, and for the second it ranges from roughly 0.2-0.3.

For the Gaussian process, I was using a SingleTaskGP and let the model infer the observation noise (I may eventually feed it in when I try approximating the trial-to-trial correlation). In case it's relevant, I did log-transform the reaction times to make their distribution closer to normal (rather than the standard gamma-like shape they normally have).

For the code I've attached (run the master2 script), the working optimization was from simulation 0 and the failed one was from simulation 33. The ground-truth plots can be produced using the plot_ground_truth function, and the sample and model overlays can be made using the commented code in the simulation loop. Let me know if something is missing or unclear; I appreciate any insight!
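For concreteness, here is a minimal sketch of the GP setup described above, with placeholder data standing in for the RLDDM output (the negation is my addition, since BoTorch acquisition functions assume maximization):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

# Placeholder data: train_X holds the intervention parameters (scaled to
# the unit cube) and rt holds per-attempt mean reaction times in seconds.
train_X = torch.rand(20, 2, dtype=torch.double)
rt = 0.5 + 0.3 * torch.rand(20, 1, dtype=torch.double)

# Log-transform the reaction times so the observations are closer to
# normal, then negate so that minimizing RT becomes a maximization problem.
train_Y = -torch.log(rt)

# No train_Yvar is passed, so the model infers a homoskedastic
# observation-noise level from the data.
model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)
```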
-
I've been using Bayesian optimization with BoTorch to identify input parameters that minimize reaction times in a decision-making experiment. Reaction times are very noisy, so numerous trials are required to measure the value of the function accurately. I believe this high degree of noise (relative to the effect of my parameters) is interacting poorly with virtually all the acquisition functions I've tried, leading to major over-exploration of the boundaries of the parameter space and poor performance.
To develop a protocol for my actual experiments, I have been running Bayesian optimization on simulated reaction time data with stereotypical response surfaces for the parameters. Two example surfaces are shown below: one more standard, the other with a coincidental minimum on the boundary.
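As a purely illustrative stand-in for such a surface (not the RLDDM itself), here is a hypothetical noisy reaction-time function with a single interior minimum, tuned so the effect size over the unit square is roughly half the noise standard deviation, matching the signal-to-noise range reported above:

```python
import torch

def simulated_rt(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical mean-RT surface on [0, 1]^2 with multiplicative noise.

    The ~0.03 s effect size against a noise SD of ~0.055 s gives a
    signal-to-noise ratio of roughly 0.5.
    """
    # Bowl-shaped mean RT (seconds) with its minimum at (0.6, 0.4); the
    # divisor normalizes the largest squared distance to 1.
    bowl = ((x[..., 0] - 0.6) ** 2 + (x[..., 1] - 0.4) ** 2) / 0.72
    mean_rt = 0.55 + 0.03 * bowl
    # Multiplicative (log-normal) noise mimics the right-skewed RT shape.
    return mean_rt * torch.exp(0.1 * torch.randn(x.shape[:-1], dtype=x.dtype))
```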
To mimic how the experiment would run in practice, I've divided the simulation into 8 sequential blocks corresponding to 8 attempts at the task, with each attempt consisting of 150 trials. In the first attempt, I perform a variant of space filling to uniformly cover the parameter space with samples. In the remaining 7 attempts, I select points by optimizing whichever acquisition function I'm testing. Hyperparameters for the underlying Gaussian process are only optimized between attempts, to account for the time constraints of the actual experiment (a 3-7 s window to select the next parameters). I've also fixed the length scale of the kernel, based on pilot data and the observation that the model tended to overfit when there were outliers.
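A rough sketch of this block structure, reusing the simulated_rt placeholder above (the Sobol initialization, Matern kernel, and 0.3 lengthscale are illustrative choices, not necessarily what my code does):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.utils.sampling import draw_sobol_samples
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.mlls import ExactMarginalLogLikelihood

bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double)

# Attempt 1: space-filling design (Sobol here as a stand-in for my variant),
# 150 trials, with observations on the negated log-RT scale as before.
train_X = draw_sobol_samples(bounds=bounds, n=150, q=1).squeeze(1)
train_Y = -torch.log(simulated_rt(train_X)).unsqueeze(-1)

def make_model(train_X: torch.Tensor, train_Y: torch.Tensor) -> SingleTaskGP:
    # Fix the kernel lengthscale (a placeholder value standing in for the
    # pilot-data estimate) so it is excluded from hyperparameter fitting.
    kernel = ScaleKernel(MaternKernel(nu=2.5, ard_num_dims=train_X.shape[-1]))
    kernel.base_kernel.lengthscale = 0.3
    kernel.base_kernel.raw_lengthscale.requires_grad_(False)
    return SingleTaskGP(train_X, train_Y, covar_module=kernel)

# Between attempts: refit the remaining hyperparameters (outputscale and
# inferred observation noise) outside the 3-7 s selection window.
model = make_model(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
```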
Below I've included a couple of example fits using the GIBBON acquisition function (one where it worked very well and another where it got stuck at the boundaries):
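For reference, a minimal sketch of a GIBBON setup in BoTorch (qLowerBoundMaxValueEntropy), continuing from the model and bounds in the sketch above; the candidate-set size and optimizer settings are illustrative:

```python
import torch
from botorch.acquisition.max_value_entropy_search import qLowerBoundMaxValueEntropy
from botorch.optim import optimize_acqf

# GIBBON approximates the distribution of the optimum's value from a
# discrete candidate set drawn over the search space.
candidate_set = bounds[0] + (bounds[1] - bounds[0]) * torch.rand(
    1000, bounds.shape[-1], dtype=torch.double
)
acqf = qLowerBoundMaxValueEntropy(model=model, candidate_set=candidate_set)

# Select the next parameters for the upcoming attempt.
next_x, _ = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=10,
    raw_samples=512,
)
```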
It's pretty clear from the second figure that the acquisition function is exploring very poorly in some conditions.
Things I've tried already:
Any suggestions would be greatly appreciated and I can provide more information if necessary!