
Classification Model as Output Constraint #725

Open
jduerholt opened this issue Mar 1, 2021 · 27 comments
Labels
enhancement New feature or request

Comments

@jduerholt
Contributor

Dear botorch developers,

I have a question regarding output constraints. So far they are used and implemented in the following way:

  • There is a property which should be larger than a user-provided threshold.
  • A GP regression model is built to model the property. The output of the regression model is used as input to a sigmoid function that is chosen so that it is 1 above the threshold.
  • The actual optimization target is then multiplied by the output of the sigmoid function, and this product forms the actual objective function.
  • The combination of regression model and sigmoid can be interpreted as a classification model --> related to logistic regression.
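The mechanism in these bullets can be sketched in a few lines of pure Python (the `eta` sharpness parameter, threshold, and example values are illustrative, not botorch's actual API):

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def soft_feasibility(g_value: float, threshold: float, eta: float = 0.1) -> float:
    """Approaches 1 when the modeled property g(x) exceeds the threshold
    and 0 below it; eta controls the sharpness of the transition."""
    return sigmoid((g_value - threshold) / eta)

def constrained_objective(f_value: float, g_value: float, threshold: float) -> float:
    # The raw objective is multiplied by the soft feasibility indicator.
    return f_value * soft_feasibility(g_value, threshold)
```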

Now comes my question:

How does one set up an optimization in which a "real" classification model is needed? Imagine a chemical experiment in which only certain input combinations lead to "successful" experiments. Successful in this context means that it is possible to measure the properties that should be optimized. For the failed experiments, no outputs are available, and within the BO loop one wants to propose only experiments that lead to successful experiments.

I found one paper in this context. They are using an in-the-loop classifier to judge whether an experiment will be successful or not. I think there should be better solutions using the output-constraint technique described above. One could, for example, directly train a classification model using gpytorch and multiply its output with the actual objective.

What are your thoughts on this and have you ever tried something related within botorch?

Best,

Johannes

@Balandat
Contributor

Balandat commented Mar 1, 2021

Hi @jduerholt, great question.

Using a probit classification model with a Bernoulli likelihood for actual classification should work in this context. You're right that since such a model maps the latent function directly to a probability of success, one should be able to use the same approach as with the existing constrained formulation, but using this success probability directly (here it is: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/likelihoods/bernoulli_likelihood.py#L26).

We could probably just use the vanilla GP model from https://github.com/cornellius-gp/gpytorch/blob/master/examples/04_Variational_and_Approximate_GPs/Non_Gaussian_Likelihoods.ipynb to prototype something (note that this uses variational inference). The main reason we haven't tried this is that we so far haven't run into any concrete use cases in our work, but it would be interesting to explore.
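The probit link underlying the Bernoulli likelihood mentioned above maps the latent GP value $f(x)$ to a success probability via the standard normal CDF. A minimal pure-Python sketch of that mapping (not the gpytorch implementation itself):

```python
import math

def probit(f_latent: float) -> float:
    """P(success) = Phi(f(x)): the standard normal CDF of the latent value,
    i.e. the mean mapping of a Bernoulli likelihood with a probit link."""
    return 0.5 * (1.0 + math.erf(f_latent / math.sqrt(2.0)))
```

A latent value of 0 corresponds to a 50% success probability; large positive or negative latent values saturate toward 1 or 0.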

I recall reading a paper that did this within the last year or two, but I can't find it right now.

An interesting question is whether one would want to use a multi-task model for modeling the latent constraint function and the objective(s), the rationale being that performance and feasibility may be quite correlated. But that's a refinement and not necessary as a first step.

@jduerholt
Contributor Author

Hi @Balandat,

thanks for the quick response. A prototype based on a single-task model would be sufficient for a start; if this works properly, one could extend it to a multi-task model, which could boost performance further.

Would you do the prototyping, or is it too far from the typical Facebook use cases?

Best,

Johannes

@Balandat
Contributor

Balandat commented Mar 4, 2021

I can throw something together in a notebook on some toy problem, kind of curious myself how this works. Might take a bit to get this cleaned up and make it conveniently usable though.

@Balandat
Contributor

Balandat commented Mar 4, 2021

Made some initial strides. Making a BoTorch model based on the simple classification GP is easy; hooking things up into the model API will require a bit of thinking, since it wasn't initially designed for classification models. We can probably (i) implement a Posterior that generically represents probability values, and then (ii) make some abstraction that makes it easy to run BO for some outcome model (with standard posterior) subject to one or more constraint models (with the new probability posterior). This should at least cover the case of modeling the outcome and the feasibility independently.

@jduerholt
Contributor Author

Hi @Balandat,

this sounds like a plan. Modeling outcome and feasibility independently looks like the key to me.

Another possibility for certain cases could be to take the X values in the objective and run them through a simple classifier (like logistic regression) at that point. But this somewhat violates the way botorch is designed.

Thanks for your efforts. If we get it running properly, this could from my perspective also be something publishable, since I have not found much on this topic.

Best,

Johannes

@Balandat
Contributor

Balandat commented Mar 4, 2021

Wondering if @bletham or @mshvartsman have any thoughts here, as they were working on similar/related things.

@bletham
Contributor

bletham commented Mar 5, 2021

Yeah, ultimately in constrained EI the constraint shows up in the acquisition function as Probability(x is feasible). In the usual case, where constraints are continuous-valued and observed, this probability can be computed from a GP, but it could just as easily be a probability from a logistic regression. The challenge is just differentiability and being able to backprop through that Prob(feasible).

My only comment would be that we've done something along these lines with a GP classifier, in which a GP is used to model the latent probability and then it gets a Bernoulli likelihood on top. And that doesn't work well in a lot of practical settings, because the GP will assume smoothness in the latent fail/success probabilities, and often in real problems there are hard edges where a small change in parameters can take it from failing every time to succeeding every time. So something like an SVM or random forest classifier tends to work better for those types of problems. But then you can't backprop inside BoTorch.
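The constrained-EI weighting described above can be sketched analytically (a maximization setting; `p_feasible` can come from any probabilistic classifier, which is why only its differentiability matters):

```python
import math

def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu: float, sigma: float, best_f: float) -> float:
    """Analytic EI for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best_f, 0.0)
    z = (mu - best_f) / sigma
    return (mu - best_f) * normal_cdf(z) + sigma * normal_pdf(z)

def constrained_ei(mu: float, sigma: float, best_f: float, p_feasible: float) -> float:
    # Constrained EI: EI(x) weighted by Prob(x is feasible),
    # however that probability is modeled (GP, logistic regression, ...).
    return expected_improvement(mu, sigma, best_f) * p_feasible
```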

@Balandat
Contributor

Balandat commented Mar 5, 2021

@jduerholt, curious if you have any intuition as to whether in your case there is reason to expect / not expect smoothness in the probability of feasibility.

@jduerholt
Contributor Author

Thanks for the comments @bletham. @Balandat, I will take some time next week to investigate which type of classifier fits the problem best and whether one can expect smoothness. When I have the answer, I will come back to you and we can then discuss the best solution.

Have a nice weekend!

Best,

Johannes

@mshvartsman
Contributor

I probably don't have as much experience as @bletham with constrained BO of this flavor, so take all of this with a grain of salt, but:

  1. You don't have to use an RBF kernel, right? Something like Matern might be more reasonable if you're not expecting as much smoothness.
  2. I think that while formally you may have smoothness in f(x), with some creativity with the outputscale prior you may be able to get pretty sharp cutpoints numerically on the probability phi(f(x)). In the limit where the scale on f(x) is very large, phi(f(x)) will always be 0 or 1. You could also try other transformations before the probit to really jam the probability towards 0 and 1 faster if you think that's necessary.

I'd probably try getting the differentiable GP classifier to work first even if you have doubts about smoothness, before going down the harder path of non-differentiable constraint modeling.
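Point 2 above can be checked numerically (assuming a probit link $\Phi$; the latent value 0.5 and the scales are arbitrary illustrative choices):

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A smooth latent f still produces a near-hard cutpoint once the
# outputscale is large: phi(scale * f) saturates toward 0 or 1.
moderate = phi(1.0 * 0.5)    # soft probability
extreme = phi(100.0 * 0.5)   # effectively 1
```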

@bletham
Contributor

bletham commented Mar 8, 2021

The issue we ran into when trying to do something very similar to this was that a sharp transition from p=0 to p=1 (always fails to always succeeds) forced the GP to fit a very short lengthscale, which then meant it couldn't generalize. But I agree that there are GP formulations (perhaps with a non-stationary kernel) that probably could have worked.

@mshvartsman
Contributor

Good point! The setting where we've seen reasonable behavior wasn't stationary, strictly speaking. Non-stationary kernel or input warping seems like it might be useful if one were to try this.

@Balandat
Contributor

Balandat commented Dec 8, 2021

Adding @dme65 and @jtwilson who have been making strides towards this recently.

@JoseValenR

Hi all, I was also looking for handling experiment failures and came across this issue.
Is there any plan for implementing it?

@Balandat
Contributor

Yeah, we're working on implementing this with a new probabilistic classification model. I don't have a timeline for open-sourcing this yet, but hopefully we'll have something preliminary ready in a couple of months or so.

If you need something quicker it may make sense to try to hook something like this up: https://github.com/cornellius-gp/gpytorch/blob/master/examples/04_Variational_and_Approximate_GPs/PolyaGamma_Binary_Classification.ipynb

@JoseValenR

Looking forward to it... in the meantime I'll look into the example provided.

@jduerholt
Contributor Author

Same here, looking forward to it. This will be very useful for some use cases ;)

@esantorella esantorella added the enhancement New feature or request label Jan 30, 2023
@jduerholt
Contributor Author

Coming back to this issue after a long time.

Let's assume that we have an arbitrary differentiable classification model $g$, which returns, as a function of $x$, a vector of class probabilities (which sum to 1). Furthermore, we have a bit vector of the same length which encodes the desirable classes for the constrained optimization.

An example: $g(x) = [0.8, 0.1, 0.1]$, and the desirability vector is given by $d = [1, 0, 0]$, which means that the only desirable outcome is class 1. Then the overall desirable probability is given by the dot product of the two vectors, $p = g(x) d^T$. We want to use this probability directly as an outcome constraint in an optimization. To do so, one has to somehow invert the sigmoid operation; this can be achieved by pushing $p$ through $\log(1/p - 1)$. When the sigmoid is applied to this term, it results in an identity operation for $0 < p < 1$. For the edge cases of $p$ approaching 0 or 1, one needs to introduce some numeric offset, which is neglected here.

From my perspective, this is a lot of overhead for an identity operation, as we directly want to use the probability $p$ for the output constraint. So my question is: do you have an idea how to do this more elegantly, or how best to implement this identity constraint handling directly in botorch?
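A pure-Python sketch of the construction above (`eps` plays the role of the numeric offset mentioned for the edge cases; the sign convention assumes the downstream weight is `sigmoid(-c)`, so the round trip recovers $p$):

```python
import math

def desired_probability(class_probs, desirability):
    """p = g(x) . d: probability mass on the desirable classes."""
    return sum(g * d for g, d in zip(class_probs, desirability))

def inverse_sigmoid_constraint(p: float, eps: float = 1e-6) -> float:
    """Map a probability p to a constraint value c = log(1/p - 1),
    clipping p away from 0 and 1 for numerical stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(1.0 / p - 1.0)

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

# Round trip: sigmoid(-c) == p, i.e. an identity operation overall.
p = desired_probability([0.8, 0.1, 0.1], [1, 0, 0])
recovered = sigmoid(-inverse_sigmoid_constraint(p))
```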

cc @gmancino

@jduerholt
Contributor Author

@Balandat: any thoughts on this?

@Balandat
Contributor

Sorry for the delay, busy times...

We want to use this probability directly as an outcome constraint in an optimization. To do so, one has to somehow invert the sigmoid operation

Hmm I am not sure I understand this part. Why do you need to invert the sigmoid operation?

@jduerholt
Contributor Author

Hmm, perhaps this was unclear. Let me try again ;)

The desired probability $p$ should be used directly as the output constraint: when it is 0, the constraint is not fulfilled, and when it is 1, it is fulfilled. However, botorch always puts a sigmoid operation on top of an output constraint. That is totally fine for regression, as the sigmoid transforms the regression problem into a kind of classification problem (as in logistic regression) which separates the regions where the constraint is fulfilled from those where it is not.

In our case, I already have this probability and do not need any sigmoid operation. For this reason, one can transform the desired probability $p$ as shown above before applying the sigmoid, which then results in an identity operation, but with extra mathematical operations and at the risk of numerical instabilities.

Hope, this makes it clearer ...

@Balandat
Contributor

I think so? I guess my first idea here would be to just extract the latent (un-sigmoided) model prediction and pass that in as the outcome together with the transformed bound. One way of doing that could be to just wrap the model. I'm not sure exactly what classification model you're using; if you have a pointer, that would be helpful.

@FrankWanger
Contributor

Hey all, I am quite interested in this topic as I am working on wet-lab experimentation. Following your idea, I've managed to draft something using GPyTorch's approximate GP, extracting the latent prediction and converting it back to a probability via the sigmoid transformation. After that I just feed this probability to the constraint argument of the acqf.

I wanted to contribute to the BoTorch community, but this is a first for me. I was wondering if I can send a pull request to add a tutorial on this topic, following the style of the existing ones? (Or should I email the moderators first?) Thank you :)

@saitcakmak
Contributor

Hi @FrankWanger. Thanks for your interest. Feel free to submit a PR adding this under notebooks_community.

@jduerholt
Contributor Author

@FrankWanger: maybe this paper from ICML24 could also be of interest for you: https://arxiv.org/abs/2402.07692.

@FrankWanger
Contributor

@FrankWanger: maybe this paper from ICML24 could be also of interest for you; https://arxiv.org/abs/2402.07692.

Fab! Will add ref to it. Thanks ;)

@jduerholt
Contributor Author

We also implemented the following solution to be able to directly use the probabilities of the individual classes (after a softmax) in a sigmoid-based output constraint: we implement a function which, when the sigmoid transform is applied to it, yields the probabilities again. As a result, the probabilities are effectively multiplied by an identity operation within the botorch output constraint. Have a look here: https://github.com/experimental-design/bofire/blob/4d6d6c6fb3b24b040dda5f4884b3f1e84b5cf76c/bofire/utils/torch_tools.py#L302 and here https://github.com/experimental-design/bofire/blob/4d6d6c6fb3b24b040dda5f4884b3f1e84b5cf76c/tests/bofire/utils/test_torch_tools.py#L1084

facebook-github-bot pushed a commit that referenced this issue Jan 28, 2025
…model (#2700)

Summary:
## Motivation

There is currently no tutorial on using the probability obtained from a classification model as a constraint in acquisition functions, and such an application holds strong interest for BO-guided laboratory experimentation. A prior discussion (#725) covers this.

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes

Pull Request resolved: #2700

Test Plan:
In the present tutorial we show how to deal with feasibility constraints that are observed alongside the optimization process (referred to as 'outcome constraints' in the BoTorch documentation, or sometimes as 'black-box constraints'). More specifically, the feasibility is modeled by a classification model, and the learned probability is fed to the acquisition function through the `constraint` argument of `SampleReducingMCAcquisitionFunction`. Namely, this is achieved by re-weighting the acquisition function as $\alpha_{\text{acqf-con}}=\mathbb{P}(\text{Constraint satisfied}) \cdot \alpha_{\text{acqf}}$. To fit the API (which treats negative values as feasible), the predicted probability from the classification model is passed through an inverse sigmoid and negated.

A 2D synthetic problem, the [Townsend function](https://www.chebfun.org/examples/opt/ConstrainedOptimization.html), was used. For the classification model, we implemented an [approximate GP](https://docs.gpytorch.ai/en/v1.13/examples/04_Variational_and_Approximate_GPs/Non_Gaussian_Likelihoods.html) with a Bernoulli likelihood. `qLogExpectedImprovement` was selected as the acquisition function.

Below are the plots of the problem landscape, acquisition function value, constraint probability, and the EI value (before weighting) at different iterations:

At iter=1:
![image](https://github.com/user-attachments/assets/de4118dd-9142-438c-acf8-6afcd5459543)

At iter=10:
![image](https://github.com/user-attachments/assets/b7abb75d-945f-47e9-8364-4064c8236e4b)

At iter=50:
![image](https://github.com/user-attachments/assets/5a147ded-cedf-47aa-9e17-4444c22a0165)

The log regret after 50 iterations is plotted against random (Sobol) sampling.
![image](https://github.com/user-attachments/assets/73e6435a-d19d-48f8-852c-69c9f5e1daa3)

All images can be reproduced by the notebook.

## Related PRs

not related to any change of functionality

Reviewed By: saitcakmak

Differential Revision: D68752722

Pulled By: Balandat

fbshipit-source-id: 233d5d09ee4b3fe200fe416c90999090bde8e5c0