
Performance & runtime improvements to info-theoretic acquisition functions (2/N) - AcqOpt initializer #2751

Open · wants to merge 6 commits into main from es_initializer

Conversation

@hvarfner (Contributor) commented Feb 20, 2025

A series of improvements aimed at the performance of PES & JES, as well as their multi-objective counterparts.

This PR adds an initializer for the acquisition function optimization, which drastically reduces the number of required forward passes (from roughly 150-250 down to about 25) by providing initial suggestions close to the sampled optima obtained during acquisition function construction.
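For reference, a minimal usage sketch (illustrative, not part of this PR) of how the new initializer might be wired into optimize_acqf via its ic_generator hook; the toy model, bounds, and option values below are assumptions for demonstration:

# Minimal sketch (illustrative, not from this PR): wire the new initializer
# into optimize_acqf via its `ic_generator` hook.
import torch

from botorch.acquisition.joint_entropy_search import qJointEntropySearch
from botorch.acquisition.utils import get_optimal_samples
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from botorch.optim.initializers import gen_optimal_input_initial_conditions

train_X = torch.rand(20, 3, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)
bounds = torch.stack([torch.zeros(3), torch.ones(3)]).to(torch.double)

# The sampled optima produced during acquisition construction are what the
# initializer later perturbs to build its initial conditions.
optimal_inputs, optimal_outputs = get_optimal_samples(
    model=model, bounds=bounds, num_optima=8
)
acqf = qJointEntropySearch(
    model=model, optimal_inputs=optimal_inputs, optimal_outputs=optimal_outputs
)

candidate, acq_value = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=8,
    raw_samples=256,
    ic_generator=gen_optimal_input_initial_conditions,
    # frac_random controls how many raw samples are drawn uniformly; the rest
    # are perturbations of the sampled optima (see the initializer below).
    options={"frac_random": 0.25},
)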

@esantorella
Moreover, better acquisition function values are found (the BO loop from PR 1, with both acquisition optimizations run in parallel):
[plot: found_acquisition_value]

It is also a lot faster:
[plot: opt_time]

This does not always improve performance, however (PR 1 is more local because sample_around_best dominates candidate generation, which is generally good):
[plot: regret]

Lastly, a comparison to LogNEI with the introduced modifications:
[plot: jes_logei]

They are now much closer in terms of runtime:
[plot: total_runtime]

And here is the split between posterior sampling time and acquisition optimization time:
[plot: runtime]

So apart from Michalewicz, it does pretty well now!

Related PRs

Previous one

@facebook-github-bot added the CLA Signed label on Feb 20, 2025

codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 94.52055% with 4 lines in your changes missing coverage. Please review.

Project coverage is 99.97%. Comparing base (9a7c517) to head (ed81a46).

Files with missing lines          Patch %    Missing
botorch/optim/initializers.py     95.71%     3
botorch/optim/optimize.py         66.66%     1
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2751      +/-   ##
==========================================
- Coverage   99.99%   99.97%   -0.03%     
==========================================
  Files         203      203              
  Lines       18690    18726      +36     
==========================================
+ Hits        18689    18721      +32     
- Misses          1        5       +4     


@hvarfner hvarfner force-pushed the es_initializer branch 4 times, most recently from 938d9be to f2db5ac Compare February 21, 2025 13:25
@hvarfner hvarfner changed the title Performance & runtime improvements to info-theoretic acquisition functions (1/N) - AcqOpt initializer Performance & runtime improvements to info-theoretic acquisition functions (2/N) - AcqOpt initializer Feb 21, 2025
@facebook-github-bot (Contributor) commented:

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines 478 to 588
    options: dict[str, bool | float | int] | None = None,
    inequality_constraints: list[tuple[Tensor, Tensor, float]] | None = None,
    equality_constraints: list[tuple[Tensor, Tensor, float]] | None = None,
):
    options = options or {}
    device = bounds.device
    if not hasattr(acq_function, "optimal_inputs"):
        raise AttributeError(
            "gen_optimal_input_initial_conditions can only be used with "
            "an AcquisitionFunction that has an optimal_inputs attribute."
        )
    frac_random: float = options.get("frac_random", 0.0)
    if not 0 <= frac_random <= 1:
        raise ValueError(
            f"frac_random must take on values in (0,1). Value: {frac_random}"
        )

    batch_limit = options.get("batch_limit")
    num_optima = acq_function.optimal_inputs.shape[:-1].numel()
    suggestions = acq_function.optimal_inputs.reshape(num_optima, -1)
    X = torch.empty(0, q, bounds.shape[1], dtype=bounds.dtype)
    num_random = round(raw_samples * frac_random)
    if num_random > 0:
        X_rnd = sample_q_batches_from_polytope(
            n=num_random,
            q=q,
            bounds=bounds,
            n_burnin=options.get("n_burnin", 10000),
            n_thinning=options.get("n_thinning", 32),
            equality_constraints=equality_constraints,
            inequality_constraints=inequality_constraints,
        )
        X = torch.cat((X, X_rnd))

    if num_random < raw_samples:
        X_perturbed = sample_points_around_best(
            acq_function=acq_function,
            n_discrete_points=q * (raw_samples - num_random),
            sigma=options.get("sample_around_best_sigma", 1e-2),
            bounds=bounds,
            best_X=suggestions,
        )
        X_perturbed = X_perturbed.view(
            raw_samples - num_random, q, bounds.shape[-1]
        ).cpu()
        X = torch.cat((X, X_perturbed))

    if options.get("sample_around_best", False):
        X_best = sample_points_around_best(
            acq_function=acq_function,
            n_discrete_points=q * raw_samples,
            sigma=options.get("sample_around_best_sigma", 1e-2),
            bounds=bounds,
        )
        X_best = X_best.view(raw_samples, q, bounds.shape[-1]).cpu()
        X = torch.cat((X, X_best))

    with torch.no_grad():
        if batch_limit is None:
            batch_limit = X.shape[0]
        # Evaluate the acquisition function on `X_rnd` using `batch_limit`
        # sized chunks.
        acq_vals = torch.cat(
            [
                acq_function(x_.to(device=device)).cpu()
                for x_ in X.split(split_size=batch_limit, dim=0)
            ],
            dim=0,
        )
        idx = boltzmann_sample(
            function_values=acq_vals,
            num_samples=num_restarts,
            eta=options.get("eta", 2.0),
@esantorella (Member) commented Feb 25, 2025

By passing these individually rather than as a dict, we help static analysis tools (and people) see that the code isn't obviously wrong, and prevent unused options from being passed and silently dropped. That can be especially helpful in guarding against typos or when refactoring.

You could then update the call sites to pass **options instead of options -- personally I'd pass them individually everywhere, but it may be a matter of taste.
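As a hypothetical illustration of that point (reusing the acqf and bounds names from the sketch in the description above; this is not part of the diff), the call-site change amounts to unpacking the same keys as keyword arguments:

# Hypothetical call-site comparison (not from the diff). With the options dict,
# a misspelled key is silently ignored; with explicit keyword arguments it
# raises a TypeError instead.
ics = gen_optimal_input_initial_conditions(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=8,
    raw_samples=256,
    options={"frac_random": 0.25, "eta": 2.0},  # current signature
)

ics = gen_optimal_input_initial_conditions(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=8,
    raw_samples=256,
    **{"frac_random": 0.25, "eta": 2.0},  # suggested signature: pass **options
)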

Suggested change

-    options: dict[str, bool | float | int] | None = None,
+    frac_random: float = 0.0,
+    batch_limit: int | None = None,
+    n_burnin: int = 10000,
+    n_thinning: int = 32,
+    sample_around_best: bool = False,
+    sample_around_best_sigma: float = 1e-2,
+    eta: float = 2.0,
     inequality_constraints: list[tuple[Tensor, Tensor, float]] | None = None,
     equality_constraints: list[tuple[Tensor, Tensor, float]] | None = None,
 ):

(the remainder of the function body in the suggestion repeats the snippet above verbatim)

@hvarfner (Contributor Author) replied:

Sure, I'm okay doing it this way! It just seems that changing ic_generator alone wouldn't suffice if one (for some reason) wanted to switch between them, since their signatures would be inconsistent?

@@ -468,6 +468,91 @@ def gen_batch_initial_conditions(
return batch_initial_conditions


def gen_optimal_input_initial_conditions(
@esantorella (Member):

Could you add a docstring explaining the behavior of this function?

@hvarfner (Contributor Author):

Added!

        X = torch.cat((X, X_rnd))

    if num_random < raw_samples:
        X_perturbed = sample_points_around_best(
@esantorella (Member):

It's a bit nonintuitive that we do this even when sample_around_best is False, no?

@hvarfner (Contributor Author):

Possibly! My thought was, since it is not actually sampling around the incumbent but around the sampled optima, I could keep it and re-use its logic. I tried to mimic the KG logic for it, and that uses frac_random for a similar reason.
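For context, a rough sketch of the KG pattern being referred to (illustrative only, reusing the model and bounds from the sketch in the description; the one-shot KG initializer likewise reads frac_random from options):

# Illustrative KG analogue (not from this PR): the one-shot KG initializer
# also uses a `frac_random` option to control how much of the initialization
# is purely random versus derived from promising points.
from botorch.acquisition import qKnowledgeGradient
from botorch.optim import optimize_acqf
from botorch.optim.initializers import gen_one_shot_kg_initial_conditions

kg = qKnowledgeGradient(model=model, num_fantasies=32)
kg_candidate, kg_value = optimize_acqf(
    acq_function=kg,
    bounds=bounds,
    q=1,
    num_restarts=8,
    raw_samples=256,
    ic_generator=gen_one_shot_kg_initial_conditions,
    options={"frac_random": 0.25},
)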

@esantorella (Member) commented:

Thanks for this! I'm looking forward to seeing the plots.

@facebook-github-bot (Contributor) commented:

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hvarfner (Contributor Author) commented:

I have not quite figured out why the test coverage is not there, since I thought I addressed it today. I will also figure out the conflicts ASAP!

@facebook-github-bot (Contributor) commented:

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
