Fit surrogate model on existing population #158

schmoelder · 2024-07-11T08:06:15Z

This PR implements a Surrogate class for fitting GPs on existing data from optimizations.

Supersedes #45

To do

Note: there is some WIP in #152 which will also affect this PR. Should we already rebase onto that branch so that we can adapt to the corresponding interfaces, at the risk of some friction if there are further changes upstream?

Fit GP on existing population
Provide method to evaluate objective functions, constraint functions etc.
Validate surrogate model accuracy with (simple) test cases
Test if optimizers converge to true solution using surrogate

Open questions

Alternative surrogate modeling approaches

Currently, only GPs are implemented. However, other surrogate models can be envisioned (e.g. ANNs). To allow for a more modular architecture, we could introduce a SurrogateBase class and derive the individual surrogate models from it.
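A minimal sketch of what such a base class could look like. Everything here is an assumption for illustration: the method names `fit` and `estimate_objectives` only follow the naming used in this PR description, and `NearestNeighborSurrogate` is a toy stand-in for a GP or ANN that keeps the example free of dependencies.

```python
from abc import ABC, abstractmethod
from math import dist


class SurrogateBase(ABC):
    """Hypothetical common base class for surrogate models."""

    @abstractmethod
    def fit(self, X, F):
        """Fit the model on variables X (n x d) and objectives F (n x m)."""

    @abstractmethod
    def estimate_objectives(self, x):
        """Return the estimated objective values f*(x) for one individual."""


class NearestNeighborSurrogate(SurrogateBase):
    """Toy stand-in for a GP or ANN: returns F of the closest known point."""

    def fit(self, X, F):
        if len(X) == 0 or len(X) != len(F):
            raise ValueError("X and F must be non-empty and of equal length.")
        self._X = [tuple(x) for x in X]
        self._F = [tuple(f) for f in F]
        return self

    def estimate_objectives(self, x):
        distances = [dist(x, x_known) for x_known in self._X]
        return self._F[distances.index(min(distances))]


# Population from a previous optimization of f(x) = x0^2 + x1^2.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
F = [(0.0,), (1.0,), (4.0,)]

surrogate = NearestNeighborSurrogate().fit(X, F)
estimate = surrogate.estimate_objectives((0.9, 0.1))
```

A GP-based or ANN-based class would implement the same two methods, so code using a surrogate would not need to know which model is behind it.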

Follow-up projects

Once this is merged, we can also start working on other features which would improve or apply the surrogate models. Eventually, these should be moved to their own issues / PRs but for now, this is just a collection of ideas.

The interface

Currently, the SurrogateModel class somewhat mimics an OptimizationProblem as it also provides methods for estimating objectives, nonlinear constraints etc. To demonstrate this, compare the OptimizationProblem

sequenceDiagram
    User->>+OptimizationProblem: evaluate_objectives(x)
    OptimizationProblem->>+User: f(x)

with the SurrogateModel

sequenceDiagram
    User->>+SurrogateModel: estimate_objectives(x)
    SurrogateModel->>+User: f*(x)

However, it is important to note that the SurrogateModel will never provide all functionality of the OptimizationProblem, such as specifying variables, constraints etc. Hence, it cannot directly be used as an OptimizationProblem e.g. to interface with an Optimizer.

To me, this means we should rethink the architecture and consider what exactly the SurrogateModel replaces. In the context of an OptimizationProblem, I would say it actually replaces the evaluation toolchain (the part that maps x to the values f/g/m/...).

Consequently, we should consider moving the evaluation toolchain from the OptimizationProblem to its own module (which in the process would also make the OptimizationProblem less of a "god class" and would even allow reusing the toolchain in other places) and introduce an EvaluationInterface.

The architecture would then look something like the following:

sequenceDiagram
    User->>+OptimizationProblem: evaluate_objectives(x)
    OptimizationProblem->>+EvaluationInterface: evaluate(x)
    EvaluationInterface->>+OptimizationProblem: f(x)
    OptimizationProblem->>+User: f(x)

Where the EvaluationInterface is then implemented by both the EvaluationPipeline (i.e. the current "toolchain") and the SurrogateModel:

classDiagram
    class OptimizationProblem {
        evaluate_objectives(np.ndarray): np.ndarray
    }
    OptimizationProblem "1" *-- "1" EvaluationInterface
    
    class EvaluationInterface {
        <<interface>>
        +evaluate(np.ndarray): np.ndarray
    }
    
    EvaluationInterface <|-- SurrogateModel
    EvaluationInterface <|-- EvaluationPipeline
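In code, the proposed split could look roughly like the sketch below. The class names are taken from the diagrams; the constructors, the `evaluator` attribute and the two objective functions are assumptions made only to keep the example runnable, and plain lists are used in place of arrays.

```python
from abc import ABC, abstractmethod


class EvaluationInterface(ABC):
    """Anything that maps variable values x to objective values f(x)."""

    @abstractmethod
    def evaluate(self, x):
        ...


class EvaluationPipeline(EvaluationInterface):
    """Stands in for the current (expensive) evaluation toolchain."""

    def __init__(self, objective):
        self.objective = objective

    def evaluate(self, x):
        return [self.objective(x)]


class SurrogateModel(EvaluationInterface):
    """Stands in for a fitted surrogate; here just a cheap approximation."""

    def __init__(self, approximation):
        self.approximation = approximation

    def evaluate(self, x):
        return [self.approximation(x)]


class OptimizationProblem:
    """Keeps variables, constraints etc. and delegates the evaluation."""

    def __init__(self, evaluator):
        self.evaluator = evaluator

    def evaluate_objectives(self, x):
        return self.evaluator.evaluate(x)


def true_objective(x):
    return sum(xi ** 2 for xi in x)


def rough_objective(x):
    # Deliberately crude approximation, pretending to be a fitted model.
    return sum(abs(xi) for xi in x)


problem = OptimizationProblem(EvaluationPipeline(true_objective))
f_true = problem.evaluate_objectives([1.0, 2.0])

# Swapping the evaluator leaves the OptimizationProblem interface untouched.
problem.evaluator = SurrogateModel(rough_objective)
f_estimated = problem.evaluate_objectives([1.0, 2.0])
```

With this layout an Optimizer would keep talking to the OptimizationProblem only, regardless of whether the toolchain or the surrogate performs the evaluation.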

Conditioned optimization problems

One of the original ideas for this project came from optimization problems where we want to fix the value of one of the variables and then run the optimization to find the best point given this value.

For this purpose, we should implement a ConditionedOptimizationProblem which wraps the original OptimizationProblem and provides an interface where the fixed variables are removed. While this is trivial to implement for bound constrained problems, it becomes potentially more complicated for problems with linear and nonlinear constraints.
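For the bound-constrained case, the wrapper could be sketched as follows. `BoundConstrainedProblem` is a hypothetical minimal stand-in for the real OptimizationProblem, and the attribute names (`names`, `lower_bounds`, `upper_bounds`) as well as the `expand` method are assumptions; linear and nonlinear constraints are not handled here.

```python
class BoundConstrainedProblem:
    """Hypothetical minimal stand-in for an OptimizationProblem."""

    def __init__(self, names, lower_bounds, upper_bounds, objective):
        self.names = list(names)
        self.lower_bounds = list(lower_bounds)
        self.upper_bounds = list(upper_bounds)
        self.objective = objective

    def evaluate_objectives(self, x):
        return self.objective(x)


class ConditionedOptimizationProblem:
    """Wraps a problem and hides variables that are fixed to a value."""

    def __init__(self, problem, fixed):
        unknown = set(fixed) - set(problem.names)
        if unknown:
            raise ValueError(f"Unknown variables: {sorted(unknown)}")
        for name, value in fixed.items():
            i = problem.names.index(name)
            if not problem.lower_bounds[i] <= value <= problem.upper_bounds[i]:
                raise ValueError(f"Fixed value for {name} violates its bounds.")

        self.problem = problem
        self.fixed = dict(fixed)
        free = [i for i, name in enumerate(problem.names) if name not in fixed]
        # Only the free variables are exposed to the optimizer.
        self.names = [problem.names[i] for i in free]
        self.lower_bounds = [problem.lower_bounds[i] for i in free]
        self.upper_bounds = [problem.upper_bounds[i] for i in free]

    def expand(self, x_free):
        """Insert the fixed values to recover the full variable vector."""
        if len(x_free) != len(self.names):
            raise ValueError("Unexpected number of free variables.")
        free = dict(zip(self.names, x_free))
        return [
            self.fixed[name] if name in self.fixed else free[name]
            for name in self.problem.names
        ]

    def evaluate_objectives(self, x_free):
        return self.problem.evaluate_objectives(self.expand(x_free))


def objective(x):
    x0, x1 = x
    return [(x0 - 1.0) ** 2 + (x1 - x0) ** 2]


problem = BoundConstrainedProblem(["x0", "x1"], [0.0, 0.0], [2.0, 2.0], objective)
conditioned = ConditionedOptimizationProblem(problem, {"x0": 0.5})
f_conditioned = conditioned.evaluate_objectives([0.5])
```

For linear constraints, the fixed values would additionally have to be substituted into the constraint matrix, which is where the extra complexity mentioned above comes in.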

Plots

For process design, we are often not really interested in just the optimal point but in the general topology of the parameter space. E.g. we are interested in the contours of regions with a given purity. Finely sampling the parameter space with the full model would be very expensive, so a surrogate model could be used for this purpose instead. See also partial dependence plots and #33.
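As a rough illustration of the idea, the sketch below samples a cheap estimate on a regular grid and marks the region that meets a purity threshold. The function `estimated_purity` is a made-up placeholder for a fitted surrogate, and the helper names are hypothetical; the resulting mask is the kind of data a contour plot would be drawn from.

```python
def sample_grid(lower, upper, n):
    """Return n equally spaced values between lower and upper (inclusive)."""
    step = (upper - lower) / (n - 1)
    return [lower + i * step for i in range(n)]


def feasible_region(estimate, xs, ys, threshold):
    """Boolean mask (rows: ys, columns: xs) where the estimate meets the threshold."""
    return [[estimate(x, y) >= threshold for x in xs] for y in ys]


def estimated_purity(x, y):
    # Placeholder for a surrogate prediction; cheap enough to call on a fine grid.
    return 1.0 - (x - 0.5) ** 2 - (y - 0.5) ** 2


xs = sample_grid(0.0, 1.0, 5)
ys = sample_grid(0.0, 1.0, 5)
mask = feasible_region(estimated_purity, xs, ys, threshold=0.9)

# Simple text rendering of the region with purity >= 0.9.
for row in mask:
    print("".join("#" if inside else "." for inside in row))
```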

Previously, there were two interfaces in the `OptimizationProblem` for calling evaluation functions (e.g. objectives): one for evaluating individuals, and one for populations. To simplify the code base, these two methods have now been unified. To ensure backward compatibility, a 1D array is returned if a single individual is passed to the function.
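The behaviour described here can be sketched as below. This is not the code from the commit: the function signature is an assumption, and nested lists stand in for arrays so that the example has no dependencies.

```python
from numbers import Real


def evaluate_objectives(x, objective):
    """Evaluate one individual (1D) or a whole population (2D).

    A single individual yields a 1D result, a population yields a 2D result.
    """
    is_individual = len(x) > 0 and isinstance(x[0], Real)
    population = [x] if is_individual else x
    results = [objective(individual) for individual in population]
    return results[0] if is_individual else results


def objective(x):
    return [sum(xi ** 2 for xi in x)]


f_individual = evaluate_objectives([1.0, 2.0], objective)
f_population = evaluate_objectives([[1.0, 2.0], [0.0, 3.0]], objective)
```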

Recently, the option to (not) plot the time axis using minutes was introduced. This commit fixes some methods that were not properly implemented. Moreover, the name of the flag was changed from `use_minutes` to `x_axis_in_minutes` to make clear that only plotting is affected and not other parameter values (e.g. start and end times).
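A small sketch of the intended semantics, assuming times are stored in seconds: the flag only changes the values and label used for the x-axis and leaves the underlying data untouched. The function `time_axis` and the label strings are hypothetical; only the flag name `x_axis_in_minutes` comes from the commit message.

```python
def time_axis(time_in_seconds, x_axis_in_minutes=True):
    """Return x-axis values and label for plotting; the input is not modified."""
    if x_axis_in_minutes:
        return [t / 60 for t in time_in_seconds], "time / min"
    return list(time_in_seconds), "time / s"


time = [0, 60, 120]
x_minutes, label_minutes = time_axis(time)
x_seconds, label_seconds = time_axis(time, x_axis_in_minutes=False)
```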

ronald-jaepel and others added 30 commits June 18, 2024 21:13

Add qNParEGO Ax MOO Interface

508983a

Relax tolerance for MOO convergence test

fdb7dcf

Ensure callback dirs are created for final callback

1a4debb

Update docstrings

9720966

Add method to create fraction

83fff8f

Add start and end times to Fraction

8cc02e1

Specify and validate reference type for difference metrics

159b374

Add classes to __all__

11c59f4

Add only_transforms_solution flag

4d7fc10

Only resample if flag is set

a8ed736

Formatting

d20d9cd

Fix typo

52e33bf

Add FractionationReference

a59372a

Add FractionationSSE

8f0faf6

Fix add_concentration_profile (#140)

6bbb6c8

Co-authored-by: r.jaepel <[email protected]>

Always inherit cadet path

12d862b

Previously, the cadet path set in Cadet(install_path="path") was not inherited into cadet instances created from the run() method.

Add method to calculate volumetric flow rate from velocity.

19e5457

Rename .t0 to .calculate_interstitial_rt in Cstr

3356cde

Fix and extend tests about .calculate_interstitial_rt/velocity

Do not modify LD_LIBRARY_PATH when setting install_path

6fd5a44

Previously, this was required because CADET-Core was not setting the PATH correctly. Now, this can lead to inconsistent behaviour. Note, this requires CADET>4.4.0

Adapt to new DG interface in CADET-Core

618fb1f

Freeze discretization attributes

1c9251f

Update docstrings

4f1671b

Fix incorrect declaration in AntiLangmuir as optional

7ddabeb

Change AntiLangmuir coefficient from SizedUnsignedList to SizedList

cdc11b9

Raise exception when adding connections to Inlets or from Outlets

6150667

Co-authored-by: daklauss <[email protected]>

Fix header level in documentation for unit operation

e706472

Fix bug when adding linear constraints

2e6a7e3

Fix plot_at_position

39402fe

Updates and Fixes plot_at_position by adding start, end and x_axis_in_minutes as parameters in solution.py

schmoelder and others added 17 commits August 14, 2024 13:38

Add MCTRecorder

c17b91c

Co-authored-by: Johannes Schmölder <[email protected]> Co-authored-by: daklauss <[email protected]> Co-authored-by: Lanzrath, Hannah <[email protected]>

Add ports to unitOperation

65591fa

Co-authored-by: Johannes Schmölder <[email protected]> Co-authored-by: daklauss <[email protected]>

Add ports to flowSheet

c5c06c7

Co-authored-by: daklauss <[email protected]> Co-authored-by: Johannes Schmölder <[email protected]>

Add ports to process

5ae2b84

Co-authored-by: daklauss <[email protected]> Co-authored-by: Johannes Schmölder <[email protected]>

Add ports to carouselBuilder

29404d1

Co-authored-by: Johannes Schmölder <[email protected]> Co-authored-by: Lanzrath, Hannah <[email protected]>

Add ports to compartmentBuilder

f7774f1

Co-authored-by: daklauss <[email protected]> Co-authored-by: Johannes Schmölder <[email protected]>

Add ports to simulationResults

9f1e689

Co-authored-by: Johannes Schmölder <[email protected]> Co-authored-by: daklauss <[email protected]> Co-authored-by: Lanzrath, Hannah <[email protected]>

Add ports to cadetAdapter

546830c

Co-authored-by: Johannes Schmölder <[email protected]> Co-authored-by: Lanzrath, Hannah <[email protected]>

Add singleton dimension handling to solution

c0babf2

Co-authored-by: Lanzrath, Hannah <[email protected]> Co-authored-by: Johannes Schmölder <[email protected]>

Add method to get information about CADET version

4e8dad5

Co-authored-by: Johannes Schmölder <[email protected]>

Formatting

e1e624f

Update tests to pytest

5fed4a8

Use relative imports in tests, to make it compatible with pytest.

Add create_LWE

29b5066

Adds create_LWE, a collection of functions to quickly and semi-modularly set up a load-wash-elute process with CADET-Process

Co-authored-by: Johannes Schmölder <[email protected]>

Add LWE tests to test_cadet_adapter

0794a23

Adds tests for the LWE process and simulation results

Co-authored-by: Johannes Schmölder <[email protected]>

Add Quadratic TestProblem to optimization fixtures

386fcf7

Add Hypersphere TestProblem to optimization fixtures

2f39aa5

Implement Surrogate class for fitting a GP on existing data from optimizations

5252709

schmoelder force-pushed the feature/surrogate branch from 2759b01 to 5252709 Compare August 14, 2024 11:39

schmoelder force-pushed the dev branch 11 times, most recently from 35e0c67 to d97cf31 Compare December 4, 2024 16:47

schmoelder force-pushed the dev branch from 7b500ea to 1cf3a54 Compare December 16, 2024 14:06
