This is a c++ implementation of the BFGS algorithm.
- header only
- no dependencies
- automatic derivative possible
- lbfgs version possible
The automatic derivative is an experimental option. Often this does not give advantages over numerical derivatives.
Include header
#include <bfgs/bfgs.h>
Create and configure BFGS class
BFGS bfgs;
bfgs.set_grad_eps(1e-8);
bfgs.set_stop_grad_eps(1e-7);
bfgs.set_stop_step_eps(1e-7);
bfgs.set_max_iter(1000);
Create the problem and solve it
auto f = [](const double* const x, int n) -> double
{
return x[0] * x[0] + x[1] * x[1] + x[0] + 2 * x[1];
};
const int n = 2;
double x[n] = {0.0, 0.0};
double y = bfgs.find_min(f, x, n);
You can use analytic derivatives
auto f = [](const double* const x, double* const g, int n) -> double
{
g[0] = 2 * x[0] + 1;
g[1] = 2 * x[1] + 2;
return x[0] * x[0] + x[1] * x[1] + x[0] + 2 * x[1];
};
const int n = 2;
double x[n] = {0.0, 0.0};
double y = bfgs.find_min(f, x, n);
It is possible to use automatic derivatives. It can be enabled with a macro BFGS_AUTO
before include.
#define BFGS_AUTO
#include <bfgs/bfgs.h>
You can use automatic derivatives with fixed dimension.
auto f = [](const DVal<2>* const x, uint32_t n) -> DVal<2>
{
return x[0] * x[0] + x[1] * x[1] + x[0] + 2 * x[1];
};
const int n = 2;
double x[n] = {0.0, 0.0};
double y = bfgs.find_min_auto<2>(f, x, n);
You can use automatic derivatives with dynamic dimension.
auto f = [](const DVal<0>* const x, uint32_t n) -> DVal<0>
{
return x[0] * x[0] + x[1] * x[1] + x[0] + 2 * x[1];
};
const int n = 2;
double x[n] = {0.0, 0.0};
double y = bfgs.find_min_auto(f, x, n);
grad_eps
Epsilon for calculating gradients. It is also used for some other checks.
Default:1e-8
.bfgs.set_grad_eps(1e-8);
stop_grad_eps
Stop criterion. Gradient Norm.
Default:1e-7
.bfgs.set_stop_grad_eps(1e-7);
stop_step_eps
Stop criterion. The function change size between iterations.
Default:1e-7
.bfgs.set_stop_step_eps(1e-7);
max_iter
Stop criterion. The maximum number of iterations of the algorithm.
Default:1000
.bfgs.set_max_iter(1000);
c1 and c2
Wolfe criterion constants (0.0 < c1 < c2 < 1.0
).
Default:0.01
and0.9
.bfgs.set_c1_c2(0.01, 0.9);
line_max_iter
The maximum number of iterations of the line search.
Default:100
.bfgs.set_line_max_iter(100);
central_diff
Only for the case of a numerical derivative.
Use central difference or forward difference for the gradient.
Default:true
.bfgs.set_central_diff(true);
line_central_diff
Only for the case of a numerical derivative.
Use central difference or forward difference for the line serach.
Maybe we don't want too much precision here.
You can try it to reduce the number of function calls (set tofalse
).
Default:true
.bfgs.set_line_central_diff(false);
min_f
Estimated minimum of the function.
Used for stop criterion if
fk - grad_eps < min_f
Also used for step_tweak option.
Default:inf
(not use).bfgs.set_min_f(-1.0);
step_tweak
Experimental setting of the first step of the linear search.
If set, then the first step of the linear search will be:
a = min(1, step_tweak * (f_min - f) / grad(f))
Default:0.0
(not use step_tweak if <=0.0
)bfgs.set_step_tweak(2.0);
reuse_hessian
Experimental setting.
When solving a large number of similar problems, the value from the previous run can be used as the initial approximation of the inverse Hessian.
Default:false
.bfgs.set_reuse_hessian(true);
memory_save
If we want to save some memory for a large problem.
In the usual case, the memory size is proportional to:
n * (n + 4)
In the case of storing only the upper triangular inverse Hessian matrix, the memory size is proportional to:
n * (n + 9) / 2
Not too important for the case of automatic differentiation.
Default:false
.bfgs.set_memory_save(true);
select_hessian
Selection of the initial approximation of the inverse hessian.
Default:false
.bfgs.set_select_hessian(true);
strong_wolfe
Use the strong Wolfe conditions (true
).
Use the normal Wolfe condition (false
).
Defaul:true
.bfgs.set_strong_wolfe(true);
dval_size
Specifies the amount of memory for automatic derivative with dynamic dimension.
If set to0
then memory is not reserved in advance (may be slower).
Defaul:100
.bfgs.set_dval_size(100);
line_force_num
Force use of numerical derivative in linear search for the case of an analytic derivative.
In the case when the gradient is not needed when calling the function, the gradient pointer will benullptr
.
Defaul:false
.bfgs.set_line_force_num(true);
lbfgs_m
History size for lbfgs version. If0
then the regular version is used.
The memory size is proportional to:
2 * (n * m + n + m)
.
Defaul:0
.bfgs.set_lbfgs_m(0);