Adding a tutorial on BO constrained by probability of classification model #2700
Conversation
Hi @FrankWanger! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Thanks a lot for putting this up, this is great.
My main comment (see inline) is on how to leverage the probability of feasibility produced by the classification model directly, rather than converting it twice, but that would require some changes to botorch itself.
Other than that mostly cosmetic comments.
I see that the method finds what appears to be the optimum very quickly - is this consistent across runs? If so it may make sense to reduce the number of iterations somewhat to cut down the runtime of the tutorial.
"$$ \n", | ||
"where $t = \\arctan\\left(\\frac{x_1}{x_2}\\right)$\n", | ||
"\n", | ||
"Here, we follow a natural representation where $y_{\\text{con}}=1$ indicates a feasible condition. We will train a classification model to predict the feasibility of the point. Note that in BoTorch's implementation, **negative values** indicate feasibility, thus we need to do conversion later when feeding feasibility into the pipeline.\n", |
"Here, we follow a natural representation where $y_{\\text{con}}=1$ indicates a feasible condition. We will train a classification model to predict the feasibility of the point. Note that in BoTorch's implementation, **negative values** indicate feasibility, thus we need to do conversion later when feeding feasibility into the pipeline.\n", | |
"Here, we follow a natural representation where $y_{\\text{con}}=1$ indicates a feasible condition. We will train a classification model to predict the feasibility of the point. Note that in BoTorch's implementation, **negative values** indicate feasibility, thus we need to do conversion later when feeding feasibility into the pipeline.\n", | |
"Note that we essentially 'throw away' information contained in the value of $y_{\\text{con}}$ by applying a binary mask - this is for illustration purposes as part of this tutorial, in a real-world application we would model the numerical value of $y_{\\text{con}}$ direction and apply the constraint $y_{\\text{con}}>01$ as part of the optimization.\n", |
It's a bit confusing here that $y_{\text{con}}$ is reduced to a binary label even though it was defined as a numerical quantity above.
Indeed, I have realised the notation issue. I wanted to add that in many experimental situations the numerical value of the constraint is not directly observable, so all we have as data are binary outcomes of success or failure. And yes, here we applied this binary mask to our synthetic problem to throw away information, so that we can simulate what we would obtain in the lab.
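For readers of this thread, a minimal sketch of the binarization step being discussed (illustrative only, not the notebook's exact code): `townsend_constraint` is a hypothetical helper returning the numerical constraint slack $g(x)$, with $g(x) > 0$ meaning feasible.

```python
import torch

def feasibility_label(X: torch.Tensor) -> torch.Tensor:
    # Binary mask: 1.0 -> feasible, 0.0 -> infeasible.
    # The numerical margin g(x) is deliberately discarded to mimic a lab setting
    # where only a success/failure outcome is observed.
    return (townsend_constraint(X) > 0).to(X.dtype)
```

In the real-world setting mentioned in the suggestion above, one would instead model $g(x)$ directly and impose $g(x) > 0$ as an outcome constraint.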
"def pass_con_unsigmoid(Z, model_con, X=None):\n", | ||
" '''\n", | ||
" pass the constraint to the acquisition function\n", | ||
"\n", | ||
" Note: Botorch does sigmoid transformation for the constraint by default, \n", | ||
" therefore we need to unsigmoid our probability (0-1) to (-inf,inf)\n", | ||
" also we need to invert the probability, where -inf means the constraint is satisfied. Finally,we add 1e-8 to avoid log(0).\n", | ||
" '''\n", | ||
" y_con = Z[...,1] #get the constraint\n", | ||
"\n", | ||
" prob = model_con.likelihood(y_con).probs #obtain the probability of y_con(when constraint satisfied)\n", | ||
" prob_unsigmoid_neg = torch.log(1-prob+1e-8)-torch.log(prob+1e-8) #unsigmoid the probability and invert it to adapt to BoTorch's constraint API\n", | ||
" \n", | ||
" return prob_unsigmoid_neg\n" | ||
] |
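For context, a minimal sketch of how such a callable could be wired into the acquisition function via its `constraints` argument (an illustration under assumptions, not the tutorial's exact code: `model_list`, `model_con`, and `train_obj` are placeholders for the fitted two-output model, the classification model, and the observed objective values):

```python
from functools import partial

from botorch.acquisition import qLogExpectedImprovement
from botorch.acquisition.objective import GenericMCObjective

# The first model output is the objective; the second carries the constraint samples.
objective = GenericMCObjective(lambda Z, X=None: Z[..., 0])

acqf = qLogExpectedImprovement(
    model=model_list,               # e.g. a ModelListGP of [objective GP, classification GP]
    best_f=train_obj.max(),         # best observed (feasible) objective value
    objective=objective,
    constraints=[partial(pass_con_unsigmoid, model_con=model_con)],
)
```

The constraint callable is evaluated on the posterior samples `Z`, so the per-sample output of `pass_con_unsigmoid` is what gets pushed through BoTorch's internal feasibility smoothing.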
If the classification model already produces the probabilities of feasibility, it would be great if we could directly use that in the acquisition function, rather than converting it back first. @SebastianAment do you see any major challenges to just accepting an additional "probability_of_feasibility" argument to SampleReducingMCAcquisitionFunction (and possibly in other places) and then just using that in the probability weighting?
Even if there are no issues, getting such a change into botorch would require some eng work, so I wouldn't want to block this PR on that. That said, the probability of feasibility conversion is not a standard sigmoid internally, see https://github.com/pytorch/botorch/blob/main/botorch/utils/objective.py#L178 - ideally, for the time being (until we can accept the probability directly), we could apply the actual inverse of what is being applied in botorch.
do you see any major challenges to just accept an additional "probability_of_feasibility" argument to SampleReducingMCAcquisitionFunction (and possibly in other places) and then just use that in the probability weighting?
That should be pretty straightforward, mainly taking care of appropriate reshaping, since we are usually applying the feasibility weighting on a per-sample basis, and probability_of_feasibility won't share the MC dimension.
Regarding the inversion of the sigmoid, we are currently using a sigmoid with inverse quadratic asymptotic behavior, which could likely be inverted analytically as well, but that will not be necessary once we support this in the acquisition function directly.
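For what it's worth, a conceptual sketch of the broadcasting point above (not existing BoTorch code; shapes are made up for illustration):

```python
import torch

mc_samples, batch, q = 128, 32, 1

# Sample-level acquisition utilities carry an MC sample dimension...
util_per_sample = torch.rand(mc_samples, batch, q)
# ...while a probability-of-feasibility tensor would not.
prob_feasible = torch.rand(batch, q)

# Broadcast over the MC dimension before the usual reductions.
weighted = util_per_sample * prob_feasible.unsqueeze(0)
acq_value = weighted.max(dim=-1).values.mean(dim=0)  # reduce over q, then over MC samples
```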
Co-authored-by: Max Balandat <[email protected]>
Thank you so much! I've addressed most of the formatting issues; there is only one that I am not sure how to remove - the KeOps warnings. I've switched to macOS and it did not help. In terms of the results, yes, I can see that it is quite consistent, so I have halved the iterations to 25 and slightly increased the frequency of the plots.
Great. I may just manually strip the output from the notebook source to keep it clean. I'll get this merged in since it's in great shape already, but still curious to hear @SebastianAment's thoughts on supporting this better in the acquisition functions themselves (which would be a separate PR anyway).
@Balandat has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##             main    #2700   +/-   ##
=======================================
  Coverage   99.98%   99.98%
=======================================
  Files         202      202
  Lines       18588    18588
=======================================
  Hits        18586    18586
  Misses          2        2

☔ View full report in Codecov by Sentry.
Motivation
There is currently no tutorial on using the probability obtained from a classification model as a constraint in acquisition functions, and such an application is of strong interest for BO-guided laboratory experimentation. A prior discussion (#725) raised this use case.
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
In the present tutorial we show how to deal with feasibility constraints that are observed alongside the optimization process (referred to as 'outcome constraints' in the BoTorch documentation, or sometimes as 'black-box constraints'). More specifically, the feasibility is modelled by a classification model, and the learned probability is fed to the acquisition function through the `constraint` argument in `SampleReducingMCAcquisitionFunction`. Namely, this is achieved through re-weighting the acquisition function by $\alpha_{\text{acqf-con}}=\mathbb{P}(\text{Constraint satisfied})*\alpha_{\text{acqf}}$. To achieve this, the probability pulled from the classification model underwent an un-sigmoid transformation and was inverted to fit into the API (as negative values are treated as feasible).

A 2D synthetic problem, the Townsend function, was used. For the classification model, we implemented an approximate GP with a Bernoulli likelihood. `qLogExpectedImprovement` was selected as the acquisition function.

Below are the plots of the problem landscape, acquisition function value, constraint probability, and the EI value (before weighting) at different iterations:
At iter=1:
At iter=10:
At iter=50:
The log regret after 50 iterations is plotted against a random (Sobol) baseline.
All images can be reproduced by the notebook.
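For reference, a rough sketch of what an approximate GP classifier with a Bernoulli likelihood can look like (assuming GPyTorch's variational API; `train_X` and `train_feas` stand for the evaluated points and their binary feasibility labels, and this is not necessarily the notebook's exact model):

```python
import torch
import gpytorch


class FeasibilityClassifier(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points: torch.Tensor):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=2.5))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))


# Fit the classifier on binary feasibility labels via the variational ELBO.
model_con = FeasibilityClassifier(inducing_points=train_X)
likelihood = gpytorch.likelihoods.BernoulliLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model_con, num_data=train_feas.numel())

optimizer = torch.optim.Adam(list(model_con.parameters()) + list(likelihood.parameters()), lr=0.1)
for _ in range(200):
    optimizer.zero_grad()
    loss = -mll(model_con(train_X), train_feas)
    loss.backward()
    optimizer.step()
```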
Related PRs
Not related to any change of functionality.