When is KroneckerMultiTaskGP theoretically more useful than SingleTaskGP with multiple outputs? #1003

nathanohara · 2021-11-29T21:59:40Z

Recently, I've been performing multi-objective Bayesian optimization with a multi-output SingleTaskGP as the surrogate. I noticed this tutorial and have been benchmarking KroneckerMultiTaskGP against SingleTaskGP. In my experiments (with different scalarization functions, qParEGO and qEHVI), KroneckerMultiTaskGP has not provided any significant improvements over SingleTaskGP, and often performs poorly in comparison.

Reading the paper linked in the tutorial, I can see how multi-task GPs can provide important additional information for optimization. Below is a screenshot of one of their figures:

[Figure: screenshot of a figure from the linked paper]

In panel (c), the multi-task GP is able to give better predictions on the blue task where (1) it is unobserved and (2) the green and red tasks are observed. I see how modeling the task correlations allows for this effect. But KroneckerMultiTaskGP only handles the case where all tasks are observed at every input. So there will never be a setup with KroneckerMultiTaskGP where there is information about some tasks and not others.

Given this context, in what case is KroneckerMultiTaskGP providing an advantage? I'm having trouble thinking of an example where the correlation matrix would lead to more certain predictions of any individual task when all are observed at all the same inputs. Thanks for any insights -- just looking to better understand the model and its use cases. Since KroneckerMultiTaskGP is more computationally intensive, I really only want to utilize it in cases where it's likely to perform best.

Answered by wjmaddox

wjmaddox · 2021-11-30T00:46:26Z

Hi, in general, we'd actually expect the KroneckerMultiTaskGP to outperform the batched single task GP as it's a super-class of the batched single task GP (the inter-task covariance matrix is just the identity). So, we'd expect greater sample efficiency in terms of outcome modelling than for single task GPs and so hopefully better BO loops.

In terms of advantages, there are a lot of situations where every outcome is observed at once -- most multi-objective problems and most black-box constrained problems fall into this regime. We actually just published a paper on a bunch of use cases for the KroneckerMultiTaskGP, see here :)

Edit: If you have some example code where the KroneckerMultiTaskGP is failing, please let me know and I can take a look at it to see if there are any obvious fitting issues.

3 replies

Balandat Nov 30, 2021
Collaborator

> in general, we'd actually expect the KroneckerMultiTaskGP to outperform the batched single task GP as it's a super-class of the batched single task GP (the inter-task covariance matrix is just the identity)

So one qualification here: if you use different hyperparameters for each of the models in the batched SingleTaskGP, then KroneckerMultiTaskGP is not really a super-class, since it assumes a single set of hyperparameters for the data covariance that is shared across outputs, whereas the batched SingleTaskGP is free to choose different hyperparameters for each output. So if the lengthscales of the individual outputs are quite different, KroneckerMultiTaskGP is at a disadvantage.

Going further, if outputs are observed without noise, you can actually show that, when all outputs are observed at all points, an ICM-style model (which KroneckerMultiTaskGP is) is equivalent to independent models, since transfer between tasks cancels out (assuming shared hyperparameters across models). This is also called "autokrigeability".
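
A short sketch of why the task covariance cancels in the noiseless case. The notation here is assumed for illustration and is not from the thread: $K_X$ is the $n \times n$ data kernel matrix, $K_T$ an invertible $T \times T$ task covariance, $k_*$ the kernel vector between the training inputs and a test point $x_*$, and $Y = [y_1, \dots, y_T]$ the $n \times T$ matrix of observations, with $\operatorname{vec}$ stacking its columns.

```latex
% Uses the Kronecker identities (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}
% and (A \otimes B)(C \otimes D) = AC \otimes BD.
\begin{aligned}
\mu(x_*) &= (K_T \otimes k_*^\top)\,(K_T \otimes K_X)^{-1}\,\operatorname{vec}(Y) \\
         &= (K_T K_T^{-1} \otimes k_*^\top K_X^{-1})\,\operatorname{vec}(Y) \\
         &= (I_T \otimes k_*^\top K_X^{-1})\,\operatorname{vec}(Y),
\qquad\text{so}\qquad
\mu_t(x_*) = k_*^\top K_X^{-1}\, y_t .
\end{aligned}
```

Each task's posterior mean depends only on that task's own observations. Once a noise term $\sigma^2 I$ is added, $(K_T \otimes K_X + \sigma^2 I)^{-1}$ no longer factors this way, so $K_T$ does not cancel.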

So the setting where KroneckerMultiTaskGP matters is if you have (i) correlated outcomes, and (ii) noisy observations. Depending on the level of both (i) and (ii) you may or may not see gains.
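
To make this concrete, here is a minimal NumPy sketch (not BoTorch code) that compares the posterior mean of an ICM model against independent GPs sharing the same kernel. Everything in it is an assumption chosen for illustration: the RBF kernel and its lengthscale, the fixed task covariance `B`, the toy data, and the helper names `rbf`, `icm_mean`, and `independent_mean`. Hyperparameters are held fixed rather than fitted.

```python
import numpy as np

def rbf(a, b, lengthscale=0.15):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_mean(Kx, Ks, B, Y, noise):
    """Posterior mean of an ICM model with covariance kron(B, Kx) + noise * I."""
    n, T = Y.shape
    y = Y.T.reshape(-1)  # task-major stacking, matching kron(B, Kx)
    K = np.kron(B, Kx) + noise * np.eye(n * T)
    mean = np.kron(B, Ks) @ np.linalg.solve(K, y)
    return mean.reshape(T, -1).T  # shape (n_test, T)

def independent_mean(Kx, Ks, B, Y, noise):
    """Per-task GP means with a shared kernel and prior variance B[t, t]."""
    n, T = Y.shape
    cols = []
    for t in range(T):
        K_t = B[t, t] * Kx + noise * np.eye(n)
        cols.append(B[t, t] * Ks @ np.linalg.solve(K_t, Y[:, t]))
    return np.stack(cols, axis=1)

# Toy data (assumed): two correlated outputs observed at the same 8 inputs.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 8)
Xs = np.linspace(0.0, 1.0, 50)
f = np.sin(2 * np.pi * X)
Y = np.stack([f, 0.9 * f], axis=1) + 0.3 * rng.standard_normal((8, 2))
B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed inter-task covariance
Kx, Ks = rbf(X, X), rbf(Xs, X)

# Largest difference between the two posterior means over the test grid.
gap_noiseless = np.abs(
    icm_mean(Kx, Ks, B, Y, 0.0) - independent_mean(Kx, Ks, B, Y, 0.0)
).max()
gap_noisy = np.abs(
    icm_mean(Kx, Ks, B, Y, 0.25) - independent_mean(Kx, Ks, B, Y, 0.25)
).max()
print(f"noiseless gap: {gap_noiseless:.2e}, noisy gap: {gap_noisy:.2e}")
```

With the noise variance set to zero the two means should agree up to floating-point error, while with a nonzero noise variance and a correlated `B` they differ, which is the regime where the multi-task model can share information across outputs.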

There are some other considerations that tilt the scales in terms of scalability towards the KroneckerMultiTaskGP if you have a lot of outputs and you're using a smart sampling strategy to reduce computational complexity (this is the paper linked above). But that really only matters if you have a large number (dozens, hundreds or thousands) of outputs that you're modeling jointly.

nathanohara Nov 30, 2021
Author

Thanks to both for the detailed follow-up! I think I am understanding now -- exploiting the correlations between objectives can lead to more confident predictions when an objective is noisy, taking advantage of the additional information.

The benchmarks I'm running are noiseless, so I think the SingleTaskGP has the advantage due to the separate kernel parameters learned for each objective as Balandat points out.

In noisy scenarios I will definitely consider the KroneckerMultiTaskGP, but I have some concerns about the assumptions of the underlying generative model. Will KroneckerMultiTaskGP degrade performance significantly if there isn't a constant covariance between each objective across the entire input domain? (Noting that there is of course a single covariance term learned between each pair of tasks.) There are some applications where I want to apply MOBO and believe certain objectives may be correlated, but I won't be able to check this assumption of constant covariance without evaluating the black box function several times -- is it safer to run with the SingleTaskGP until I can confirm this, or is this assumption unlikely to severely degrade the performance, considering the potential gains?

Thanks for sharing your expertise, I've really appreciated the pointers as I'm learning about BO.

Balandat Nov 30, 2021
Collaborator

> Will KroneckerMultiTaskGP degrade performance significantly if there isn't a constant covariance between each objective across the entire input domain?

Yeah that could be the case, but it really depends on the problem. One thing you can do is either collect some random data or start off with the batched SingleTaskGP as you said and then compare the fit of SingleTaskGP and KroneckerMultiTaskGP (e.g. by doing cross validation).
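
The comparison suggested above could look roughly like the following k-fold sketch. It is written in plain NumPy with hand-rolled GP predictions and fixed, assumed hyperparameters (kernel, lengthscale, noise level, and task covariance `B`), so it only illustrates the procedure; with BoTorch you would instead refit SingleTaskGP and KroneckerMultiTaskGP on each training fold. The helpers `rbf`, `icm_predict`, and `kfold_mse` are hypothetical names. Passing a diagonal task covariance gives the independent-outputs baseline.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_predict(X_tr, Y_tr, X_te, B, noise):
    """ICM posterior mean; a diagonal B reduces to independent GPs."""
    n, T = Y_tr.shape
    K = np.kron(B, rbf(X_tr, X_tr)) + noise * np.eye(n * T)
    Ks = np.kron(B, rbf(X_te, X_tr))
    mean = Ks @ np.linalg.solve(K, Y_tr.T.reshape(-1))
    return mean.reshape(T, -1).T  # shape (n_test, T)

def kfold_mse(X, Y, B, noise, n_folds=4, seed=0):
    """Average held-out mean squared error over n_folds random folds."""
    perm = np.random.default_rng(seed).permutation(len(X))
    errs = []
    for test_idx in np.array_split(perm, n_folds):
        train_idx = np.setdiff1d(perm, test_idx)
        pred = icm_predict(X[train_idx], Y[train_idx], X[test_idx], B, noise)
        errs.append(np.mean((pred - Y[test_idx]) ** 2))
    return float(np.mean(errs))

# Toy data (assumed): initial random evaluations of two correlated objectives.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 24)
f = np.sin(2 * np.pi * X)
Y = np.stack([f, 0.9 * f], axis=1) + 0.3 * rng.standard_normal((24, 2))
B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed inter-task covariance

# Same folds for both models (same seed), so the errors are directly comparable.
mse_independent = kfold_mse(X, Y, np.diag(np.diag(B)), noise=0.09)
mse_icm = kfold_mse(X, Y, B, noise=0.09)
print(f"independent: {mse_independent:.4f}, ICM: {mse_icm:.4f}")
```

Whichever model has the lower held-out error on your own data is the safer choice to continue the optimization with; the ranking on this toy problem says nothing about a real black-box function.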
