Classification Model as Output Constraint #725
Hi @jduerholt, great question. Using a Probit classification model with a Bernoulli likelihood for actual classification should work in this context. You're right that since such a model maps the latent function directly to a probability of success, one should be able to use the same approach as with the existing constrained formulation, but using this success probability directly (here it is: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/likelihoods/bernoulli_likelihood.py#L26). We could probably just use the vanilla GP regression model from https://github.com/cornellius-gp/gpytorch/blob/master/examples/04_Variational_and_Approximate_GPs/Non_Gaussian_Likelihoods.ipynb to prototype something (note that this uses variational inference). The main reason we haven't tried this out is that we haven't so far run into any concrete use cases in our work, but it would be interesting to try. I recall reading a paper that did this within the last year or two, but I can't find it right now. An interesting question is whether one would want to use a multi-task model for modeling the latent constraint function and the objective(s), the rationale being that performance and feasibility may be quite correlated. But that's a refinement and not necessary as a first step.
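As a rough sketch of that mapping (a plain-PyTorch illustration of the standard probit/Bernoulli marginalization, not BoTorch API code; the function name is my own): given the latent GP posterior mean μ and variance σ² at a point, the marginal success probability is Φ(μ / √(1 + σ²)).

```python
import torch
from torch.distributions import Normal

def success_probability(latent_mean: torch.Tensor, latent_var: torch.Tensor) -> torch.Tensor:
    """Marginal P(success) under a probit/Bernoulli likelihood, integrating the
    Gaussian latent posterior analytically: Phi(mu / sqrt(1 + var))."""
    return Normal(0.0, 1.0).cdf(latent_mean / torch.sqrt(1.0 + latent_var))

# A latent posterior mean of zero yields P(success) = 0.5 regardless of variance;
# a positive mean pushes the probability above 0.5.
p = success_probability(torch.tensor([0.0, 2.0]), torch.tensor([1.0, 1.0]))
```

Because this expression is differentiable in the latent mean and variance, it can be used directly as the feasibility weight in a constrained acquisition function.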
Hi @Balandat, thanks for the quick response. A prototype based on a single-task model would be sufficient for a start; if this works properly, one could extend it to a multi-task model, which could boost performance further. Would you do the prototyping, or is it too far from the typical Facebook use cases? Best, Johannes
I can throw something together in a notebook on some toy problem, kind of curious myself how this works. Might take a bit to get this cleaned up and make it conveniently usable though.
Made some initial strides: making a BoTorch model based on the simple classification GP is easy; hooking things up into the model API will require a bit of thinking, since it wasn't initially designed for classification models. We can probably (i) implement a
Hi @Balandat, this sounds like a plan. Modelling outcome and feasibility independently looks like the key for me. Another possibility for certain cases could be to take the X values in the objective and run them through a simple classifier (like logistic regression) at this point. But this somehow violates the way BoTorch is designed. Thanks for your efforts. If we get it running properly, this could, from my perspective, also be something publishable, since I have not found much on this topic. Best, Johannes
Wondering if @bletham or @mshvartsman have any thoughts here, as they were working on similar / related things.
Yeah, ultimately in constrained EI the constraint shows up in the acquisition function as Probability(x is feasible). In the usual case where constraints are continuous-valued and observed, this probability can be computed from a GP, but it could just as easily be a probability from a logistic regression. The challenge is just differentiability and being able to backprop through that Prob(feasible). My only comment would be that we've done something along these lines with a GP classifier, in which a GP is used to model the latent probability and then gets a Bernoulli likelihood on top. That doesn't work well in a lot of practical settings, because the GP will assume smoothness in the latent fail/success probabilities, and often in real problems there are hard edges where a small change in parameters can take it from failing every time to succeeding every time. So something like an SVM or random forest classifier tends to work better for those types of problems. But then you can't get the backprop inside BoTorch.
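To illustrate the differentiability point, here is a minimal plain-PyTorch sketch (the classifier weights are made up for illustration) showing that a logistic-regression P(feasible) lets gradients flow back to the candidate point, which tree- or SVM-based classifiers do not:

```python
import torch

# Hypothetical fitted logistic-regression feasibility classifier:
# P(feasible | x) = sigmoid(w @ x + b). Weights are illustrative only.
w = torch.tensor([1.5, -0.7])
b = torch.tensor(0.2)

x = torch.tensor([0.3, 0.8], requires_grad=True)  # candidate point being optimized
p_feasible = torch.sigmoid(w @ x + b)
p_feasible.backward()  # gradient w.r.t. x exists, so the acqf can be optimized by gradient ascent
```

With a random forest or SVM classifier, `p_feasible` would be a piecewise-constant (or non-smooth) function of `x`, and this backward pass would not yield useful gradients.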
@jduerholt, curious if you have any intuition as to whether in your case there is reason to expect / not expect smoothness in the probability of feasibility. |
Thanks for the comments @bletham. @Balandat, I will take some time next week to investigate which type of classifier fits the problem best and whether one can expect smoothness. When I have the answer, I will come back to you and we can discuss what the best solution is. Have a nice weekend! Best, Johannes
I probably don't have as much experience as @bletham with constrained BO of this flavor, so consider all of this with a grain of salt, but:
I'd probably try getting the differentiable GP classifier to work first even if you have doubts about smoothness, before going down the harder path of non-differentiable constraint modeling. |
The issue we ran into when trying to do something very similar to this was that a sharp transition from p=0 to p=1 (always fails to always succeeds) forced the GP to fit a very short lengthscale, which then meant it couldn't generalize. But I agree that there are GP formulations (perhaps with a non-stationary kernel) that probably could have worked. |
Good point! The setting where we've seen reasonable behavior wasn't stationary, strictly speaking. Non-stationary kernel or input warping seems like it might be useful if one were to try this. |
Hi all, I was also looking into handling experiment failures and came across this issue.
Yeah we're working on implementing this with a new probabilistic classification model. I don't have a timeline for open source for this yet, but hopefully we have something preliminary ready in a couple of months or so. If you need something quicker it may make sense to try to hook something like this up: https://github.com/cornellius-gp/gpytorch/blob/master/examples/04_Variational_and_Approximate_GPs/PolyaGamma_Binary_Classification.ipynb |
Looking forward to it... in the meantime I'll look into the example provided.
Same here, looking forward to it. This will be very useful for some use cases ;) |
Coming back to this issue after a long time. Let's assume that we have an arbitrary differentiable classification model that directly outputs a probability of feasibility. From my perspective, going through the sigmoid-based constraint machinery is a lot of overhead for an identity operation, as we directly want to use the probability itself. cc @gmancino
@Balandat: any thoughts on this? |
Sorry for the delay, busy times...
Hmm I am not sure I understand this part. Why do you need to invert the sigmoid operation? |
Hmm, perhaps this was unclear. Let me try again ;) The desired probability is normally obtained by applying a sigmoid to the constraint value inside the acquisition function. In our case, I already have this probability and do not need any sigmoid operation. For this reason, one can transform the desired probability with the inverse sigmoid before passing it in, so that the sigmoid applied downstream recovers it unchanged. Hope this makes it clearer ...
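A minimal sketch of that inverse-sigmoid trick, assuming the constraint machinery turns a constraint value c into a feasibility weight via sigmoid(-c / eta), with negative values meaning feasible (as BoTorch's soft constraint evaluation does); the helper name is my own:

```python
import torch

def probability_as_constraint(p: torch.Tensor, eta: float = 1e-3) -> torch.Tensor:
    """Map a known feasibility probability p to a constraint value c such that
    sigmoid(-c / eta) == p, i.e. the downstream sigmoid becomes an identity."""
    return -eta * torch.logit(p)

p = torch.tensor([0.9, 0.1, 0.5])
c = probability_as_constraint(p)
recovered = torch.sigmoid(-c / 1e-3)  # what the soft constraint weighting computes
```

So the round trip logit-then-sigmoid hands the original probability straight through to the acquisition-function re-weighting.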
I think so? I guess my first idea here would be to just extract the latent (un-sigmoidified) model prediction and pass that in as the outcome together with the transformed bound. One way of doing that could be to just wrap the model. I guess I'm not sure what classification model exactly you're using; if you have a pointer, that would be helpful.
Hey all, I am quite interested in this topic as I am working on wet-lab experimentation. Following your idea I've managed to draft something using GPyTorch's approximate GP, extracting the predicted probability and mapping it back through an inverse sigmoid to fit the API. After that I just feed this to the constraints argument of the acqf. I wanted to contribute to the BoTorch community, but it's a first-time thing for me; I was wondering if I can send a pull request to add a tutorial on this topic, following the style of existing ones? (Or should I email the moderators first?) Thank you :)
Hi @FrankWanger. Thanks for your interest. Feel free to submit a PR adding this under notebooks_community. |
@FrankWanger: maybe this paper from ICML 2024 could also be of interest for you: https://arxiv.org/abs/2402.07692.
Fab! Will add ref to it. Thanks ;) |
We also implemented the following solution to directly use the probabilities for the individual classes (after a softmax) in a sigmoid-based output constraint: we implemented a function that, when passed through the sigmoid transform, yields the probabilities again. As a result, within the BoTorch output constraint you are effectively multiplying the probabilities by an identity operation. Have a look here: https://github.com/experimental-design/bofire/blob/4d6d6c6fb3b24b040dda5f4884b3f1e84b5cf76c/bofire/utils/torch_tools.py#L302 and here https://github.com/experimental-design/bofire/blob/4d6d6c6fb3b24b040dda5f4884b3f1e84b5cf76c/tests/bofire/utils/test_torch_tools.py#L1084
…model (#2700) Summary: ## Motivation There is currently no tutorial on using the probability pulled from classification results as a constraint in acquisition functions, and such an application holds strong interest for BO-guided laboratory experimentation. A prior discussion took place in #725. ### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)? Yes Pull Request resolved: #2700 Test Plan: In the present tutorial we show how to deal with feasibility constraints that are observed alongside the optimization process (referred to as 'outcome constraints' in the BoTorch documentation, or sometimes as 'black-box constraints'). More specifically, the feasibility is modelled by a classification model, and the learned probability is fed to the acquisition function through the `constraints` argument of `SampleReducingMCAcquisitionFunction`. Namely, this is achieved by re-weighting the acquisition function as $\alpha_{\text{acqf-con}}=\mathbb{P}(\text{Constraint satisfied})\cdot\alpha_{\text{acqf}}$. To achieve this, the probability pulled from the classification model is passed through an inverse sigmoid and negated to fit the API (negative values are treated as feasible). A 2D synthetic problem, the [Townsend function](https://www.chebfun.org/examples/opt/ConstrainedOptimization.html), was used. For the classification model, we implemented an [approximate GP](https://docs.gpytorch.ai/en/v1.13/examples/04_Variational_and_Approximate_GPs/Non_Gaussian_Likelihoods.html) with a Bernoulli likelihood. `qLogExpectedImprovement` was selected as the acquisition function.
Below are the plots of the problem landscape, acquisition function value, constraint probability, and the EI value (before weighting) at different iterations: At iter=1: ![image](https://github.com/user-attachments/assets/de4118dd-9142-438c-acf8-6afcd5459543) At iter=10: ![image](https://github.com/user-attachments/assets/b7abb75d-945f-47e9-8364-4064c8236e4b) At iter=50: ![image](https://github.com/user-attachments/assets/5a147ded-cedf-47aa-9e17-4444c22a0165) The log regret after 50 iterations is plotted against a quasi-random (Sobol) baseline. ![image](https://github.com/user-attachments/assets/73e6435a-d19d-48f8-852c-69c9f5e1daa3) All images can be reproduced with the notebook. ## Related PRs Not related to any change of functionality. Reviewed By: saitcakmak Differential Revision: D68752722 Pulled By: Balandat fbshipit-source-id: 233d5d09ee4b3fe200fe416c90999090bde8e5c0
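For intuition, the feasibility re-weighting used in the tutorial can be sketched in plain PyTorch (a toy stand-in for the Monte-Carlo EI computation, not the actual `SampleReducingMCAcquisitionFunction` implementation; the function name is hypothetical):

```python
import torch

def constrained_ei_mc(obj_samples: torch.Tensor, p_feasible: torch.Tensor,
                      best_f: float) -> torch.Tensor:
    """Monte-Carlo EI re-weighted by a feasibility probability:
    alpha_con(x) = E[relu(f(x) - best_f)] * P(constraint satisfied at x)."""
    improvement = (obj_samples - best_f).clamp_min(0.0)  # per-sample improvement
    return improvement.mean(dim=0) * p_feasible

# 3 posterior samples for 1 candidate: improvements are [0, 1, 3], mean 4/3,
# down-weighted by a 50% feasibility probability.
samples = torch.tensor([[1.0], [3.0], [5.0]])
alpha = constrained_ei_mc(samples, p_feasible=torch.tensor([0.5]), best_f=2.0)
```

A candidate that is almost certainly infeasible thus gets its expected improvement scaled toward zero, regardless of how promising the objective samples look.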
Dear botorch developers,
I have a question regarding output constraints. So far they are used and implemented in the following way:
Now comes my question:
How does one set up an optimization in which a "real" classification model is needed? Imagine a chemical experiment in which only certain input combinations lead to "successful" experiments. Successful in this context means that it is possible to measure the properties that should be optimized. For the failed experiments, no outputs are available, and within the BO loop one wants to propose only experiments that lead to successful outcomes.
I found one paper in this context. They use an in-the-loop classifier to judge whether an experiment will be successful or not. I think there should be better solutions using the output-constraint technique described above. One could, for example, directly train a classification model using GPyTorch and multiply its output with the actual objective.
What are your thoughts on this and have you ever tried something related within botorch?
Best,
Johannes