Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nonlinear] add new nonlinear interface #3106

Merged
merged 23 commits into from
Aug 31, 2023
Merged

[Nonlinear] add new nonlinear interface #3106

merged 23 commits into from
Aug 31, 2023

Conversation

odow
Copy link
Member

@odow odow commented Oct 10, 2022

Documentation

https://jump.dev/JuMP.jl/previews/PR3106/manual/nonlinear/

About

This PR is new implementation of nonlinear programming for JuMP.

It introduces a new GenericNonlinearExpr data structure, which replicates the structure of basic Julia Expr objects. The motivation for this is that we already create these in the macros, and it seems to work okay, so why not make it official!

Demos

Cribbing @pulsipher's demos in infiniteopt/InfiniteOpt.jl#161, we have:

Operator overloading

Nonlinear expressions can be defined outside of macros like their quad/affine
counterparts:

julia> using JuMP; m = Model(); @variable(m, x);

julia> x^2.3 + sin(x)
(x ^ 2.3) + sin(x)

Macro definitions

By making the necessary extensions to MutableArithmetics, we can define
nonlinear expressions inside of @expression, @objective, and @constraint (i.e.,
no need for the special NL macros):

julia> @expression(m, x + x^3 * sin(x)^x)
x + ((x ^ 3) * (sin(x) ^ x))

LinearAlgebra support

We can also do linear algebra operations:

julia> @variable(m, X[1:3, 1:2]); @variable(m, v[1:3]); @variable(m, w[1:2]);

julia> @expression(m, v' * X * w)
0.0 + ((X[1,1]*v[1] + X[2,1]*v[2] + X[3,1]*v[3]) * w[1]) + ((X[1,2]*v[1] + X[2,2]*v[2] + X[3,2]*v[3]) * w[2])

Function tracing

Because we support operator definition, we can trace functions in like manner to
Symbolics.jl:

julia> my_func(y) = 2^y + exp(y^-2.3);

julia> my_func(x)
(2.0 ^ x) + exp(x ^ -2.3)

User-defined operators

For user-defined operators that are not compatible with tracing, we allow them
to be registered like in JuMP and Symbolics.jl.

julia> my_func2(a) = sin(a) * eta(a);

julia> @operator(m, op_my_func, 1, my_func2)
NonlinearOperator(:op_my_func, my_func2)

julia> @expression(m, op_my_func(x)^2)
op_my_func(x) ^ 2.0

You can also implement this by defining a new overload:

julia> add_nonlinear_operator(m, 1, my_func2; name = :op_my_func2)
NonlinearOperator(:op_my_func2, my_func2)

julia> my_func2(a::VariableRef) = NonlinearExpr(:op_my_func, Any[a])
my_func2 (generic function with 2 methods)

julia> @expression(m, my_func2(x)^2)
op_my_func2(x) ^ 2.0

Open questions

So there are couple of open questions:

  • What to do with univariate operators that aren't in Base (like :dawson)
  • What to do with ifelse. In Julia 1.6, It's a builtin function so we can't extend it. It's a generic function in Julia 1.8+
    • We could overload on 1.8+:
      @static if VERSION >= v"1.8"
          function Base.ifelse(a::NonlinearExpr, x::_Scalar, y::_Scalar)
              return NonlinearExpr(:ifelse, Any[a, x, y])
          end
      end
      but the tricky thing is what to do with logical/inequality operators. We've generally avoided overloading >= et al. for variables etc because it just leads to problems.
      I think the answer is to ignore this for now. If you want to use ifelse, use @NL.
  • What to do with user-defined functions
    • The tricky thing is that we eagerly evaluate unknown functions in the normal macros, which often works for linear/quadratic terms via operator overloading. So we have to distinguish cases of "I want this traced" vs. "I want this registered."
    • I guess register could return a UserDefinedFunction object that people could use instead. Then use the function to trace, and the returned object to register.
    • I've also added a test where we just ask the user to overload a new method that dispatches on AbstractJuMPScalar. That seems like a reasonable work-around.
    • On the nonlinear call, we came to the conclusion that the method overloading was the correct approach. A macro to simplify things might be more user-friendly in future.
  • What to do with these methods:

    JuMP.jl/src/operators.jl

    Lines 166 to 196 in 1a5b487

    function Base.:^(lhs::AbstractVariableRef, rhs::Integer)
    if rhs == 2
    return lhs * lhs
    elseif rhs == 1
    return convert(GenericQuadExpr{Float64,variable_ref_type(lhs)}, lhs)
    elseif rhs == 0
    return one(GenericQuadExpr{Float64,variable_ref_type(lhs)})
    else
    error(
    "Only exponents of 0, 1, or 2 are currently supported. Are you " *
    "trying to build a nonlinear problem? Make sure you use " *
    "@NLconstraint/@NLobjective.",
    )
    end
    end
    function Base.:^(lhs::GenericAffExpr{T}, rhs::Integer) where {T}
    if rhs == 2
    return lhs * lhs
    elseif rhs == 1
    return convert(GenericQuadExpr{T,variable_ref_type(lhs)}, lhs)
    elseif rhs == 0
    return one(GenericQuadExpr{T,variable_ref_type(lhs)})
    else
    error(
    "Only exponents of 0, 1, or 2 are currently supported. Are you " *
    "trying to build a nonlinear problem? Make sure you use " *
    "@NLconstraint/@NLobjective.",
    )
    end
    end
    • There are two options. Currently, we error on x^3, requiring x^3.0. Alternatively, we could return a NonlinearExpr here, but that would make the return type unstable.
    • I've gone back and forwards with this. It has bitten me a couple of times, but the cost of making x^2 type unstable seems high and likely to cause a problem in code that is in the wild. Especially since the method is non-intuitive:
      julia> typeof(x)
      VariableRef
      
      julia> typeof(x^0)
      QuadExpr (alias for GenericQuadExpr{Float64, VariableRef})
      
      julia> typeof(x^1)
      QuadExpr (alias for GenericQuadExpr{Float64, VariableRef})
      
      julia> typeof(x^2)
      QuadExpr (alias for GenericQuadExpr{Float64, VariableRef})
      I think I'm going to leave as-is for now.

Latest benchmarks

See the test/perf/nonlinear_expr.jl file for details. The results aren't a perfect comparison between systems because:

  • Symbolics.jl doesn't actually solve any problems. We just time constructing the expressions, so the times are optimistic.
  • InfiniteOpt.jl isn't built for finite NLP problems, so it throws a warning and forward to JuMP. This results in some additional overhead, so the times are pessimistic.
4-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "@NL" => 7-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "prod" => Trial(2.431 ms)
	  "sum" => Trial(2.402 ms)
	  "clnlbeam" => Trial(99.744 ms)
	  "jump_2788" => Trial(889.958 ms)
	  "rosenbrock" => Trial(8.120 ms)
	  "many_constraints" => Trial(14.010 ms)
	  "mle" => Trial(4.840 ms)
  "Symbolics" => 5-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "prod" => Trial(7.679 ms)
	  "sum" => Trial(1.182 s)
	  "rosenbrock" => Trial(9.337 μs)
	  "many_constraints" => Trial(39.998 ms)
	  "mle" => Trial(9.496 ms)
  "InfiniteOpt" => 7-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "prod" => Trial(11.657 ms)
	  "sum" => Trial(11.758 s)
	  "clnlbeam" => Trial(2.239 s)
	  "jump_2788" => Trial(852.399 ms)
	  "rosenbrock" => Trial(8.496 ms)
	  "many_constraints" => Trial(38.681 ms)
	  "mle" => Trial(220.088 ms)
  "NonlinearExpr" => 7-element BenchmarkTools.BenchmarkGroup:
	  tags: []
	  "prod" => Trial(8.747 ms)
	  "sum" => Trial(8.745 ms)
	  "clnlbeam" => Trial(106.958 ms)
	  "jump_2788" => Trial(693.037 ms)
	  "rosenbrock" => Trial(8.156 ms)
	  "many_constraints" => Trial(12.549 ms)
	  "mle" => Trial(12.202 ms)

My takeaways are that we can ignore the Symbolics.jl and InfiniteOpt approaches. InfiniteOpt might be better than the benchmarks show, but it's still quite a way from where NonlinearExpr already is.

The main hit is that NonlinearExpr is much slower for the sum, prod, and mle benchmarks because they all involve generators, and @NL macros build these up as n-ary operators in-place, whereas NonlinearExpr is an out-of-place non-mutating arithmetic. For now, the difference might be acceptable (use @NLxxx if performance is a problem), but we could implement a mutable arithmetic for NonlinearExpr if it became important.

Another---not directly comparable---set of benchmarks are these ones with Pyomo:

(pyomo-nl) oscar@Oscars-MBP nlp-expressions % python ~/.julia/dev/JuMP/test/perf/nonlinear_expr.py
perf_pyomo_sum => 24.600 ms
perf_pyomo_prod => 29.666 ms
perf_pyomo_many_constraints => 87.142 ms
perf_pyomo_mle => 46.727 ms
perf_pyomo_clnlbeam => 356.307 ms
perf_pyomo_rosenbrock => 31.529 ms
perf_pyomo_jump_2788 => 634.583 ms

Pyomo is generally slower, except for the 2788 example, which is quite surprising! It clearly demonstrates that JuMP is suboptimal for this example, I think because of how we manage stored nonlinear expressions?

Comparative analysis of different systems

The following image is a screenshot of a table in this Google doc.

image

I think this table helps explain a lot of the benchmarks. I haven't timed MadDiff, but @sshin23 gave a talk at a nonlinear call demonstrating the compilation problem, which I think is a veto over making it a default in JuMP (MadDiff is a great alternative for problems with repeated structure and small expressions).

My takeaways are:

  • The MOI data structure is perfect for low-level stuff
  • I think we want a tree structure at the JuMP level. It's more intuitive and easier to work with.
  • If we have a tree at the JuMP level, we have two options:
    1. Type everything ala MadDiff
    2. Have some abstract types
  • Option 1. is out, so if we're having abstract types, then we may as well go simple with Vector{Any}. Anything else is just overkill.

Things I tried

  • args::Any: this was waaay slower than args::Vector{Any}. Conclusion is that a length-1 vector for univariate functions isn't that much of an issue
  • Various permutations of args::Vector{Union{AbstractJuMPScalar,Float64}} but the union is quite large, so union splitting doesn't happen.

Things to try

  • A variation on a node like

    struct Node
        type::NodeTypeEnum
        left::UInt64
        right::UInt64
    end
    

    with expressions type Dict{UInt64,Node}.
    where the .left and .right fields are re-interpreted to be the Int64 in MOI.VariableIndex for variables, a Float64 for coefficients, or a hash corresponding to the the left and right arguments of a binary expression.

  • A mutable (for + and *) version of NonlinearExpr

@odow odow added the Category: Nonlinear Related to nonlinear programming label Oct 10, 2022
test/nlp_expr.jl Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Oct 10, 2022

Codecov Report

Patch coverage: 99.16% and project coverage change: +0.08% 🎉

Comparison is base (9bb61c2) 98.00% compared to head (475aba4) 98.09%.
Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3106      +/-   ##
==========================================
+ Coverage   98.00%   98.09%   +0.08%     
==========================================
  Files          36       37       +1     
  Lines        5068     5501     +433     
==========================================
+ Hits         4967     5396     +429     
- Misses        101      105       +4     
Files Changed Coverage Δ
src/JuMP.jl 94.87% <ø> (ø)
src/macros.jl 97.97% <97.61%> (-0.71%) ⬇️
src/nlp_expr.jl 99.24% <99.24%> (ø)
src/aff_expr.jl 97.32% <100.00%> (+0.04%) ⬆️
src/complement.jl 100.00% <100.00%> (ø)
src/mutable_arithmetics.jl 91.89% <100.00%> (+3.54%) ⬆️
src/operators.jl 96.84% <100.00%> (+0.39%) ⬆️
src/optimizer_interface.jl 97.01% <100.00%> (+0.11%) ⬆️
src/quad_expr.jl 100.00% <100.00%> (ø)
src/variables.jl 99.00% <100.00%> (+<0.01%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/mutable_arithmetics.jl Outdated Show resolved Hide resolved
src/nlp_expr.jl Outdated Show resolved Hide resolved
@odow

This comment was marked as outdated.

test/nlp_expr.jl Outdated Show resolved Hide resolved
@odow
Copy link
Member Author

odow commented Oct 11, 2022

I'm reasonably happy where this is for now. It's a good reference implementation to start building/comparing against.

Next up: an approach like infiniteopt/InfiniteOpt.jl#161 by @pulsipher

@odow
Copy link
Member Author

odow commented Oct 11, 2022

@pulsipher did you ever try benchmarking against this naive implementation? I've been playing with your implementation, and it seems to have more allocations.

The fundamental problem is that your tree implementation has

struct NodeData
    value::Any
end

https://github.com/infiniteopt/InfiniteOpt.jl/blob/5989fed8fbe323ddbf2a6dca12b61532c44e7c09/src/nlp.jl#L109-L111
so all the tree implementation does is move a Vector{Any} into a nested tree.

@pulsipher
Copy link
Contributor

@pulsipher did you ever try benchmarking against this naive implementation? I've been playing with your implementation, and it seems to have more allocations.

The fundamental problem is that your tree implementation has

struct NodeData
    value::Any
end

https://github.com/infiniteopt/InfiniteOpt.jl/blob/5989fed8fbe323ddbf2a6dca12b61532c44e7c09/src/nlp.jl#L109-L111 so all the tree implementation does is move a Vector{Any} into a nested tree.

The short answer is no. I benchmarked a lot of different tree structures, including a tape-like one similar to MOI, but I didn't think to do a naive one sinilar to Julia ASTs. It would make sense that having a concrete valued composite type with Vector{Any} would allocate less. Nice work so far!

@odow
Copy link
Member Author

odow commented Oct 11, 2022

Yeah, there's a trade-off about the representation and how we use it.

The Vector{Any} is bad if you want to do any computation walking the tree, because every access isn't type stable (so lots of allocations).

But if we just want to build a representation, potentially change it a little, and the convert it into the MOI data structure, the Vector{Any} is hard to beat, particularly because it's what Expr uses, so I guess a lot of work went into making storage etc efficient. Especially because the macros end up creating an Expr object anyway.

@odow
Copy link
Member Author

odow commented Oct 11, 2022

One issue with the current design is that it uses recursion, which will fail for deeply nested expressions. These are usually quite hard to write by hand, but generators can cause it:

julia> function fail(N)
           model = Model()
           @variable(model, x)
           @objective(model, Min, sum(sin(x) for i in 1:N))
           return
       end
fail (generic function with 2 methods)

julia> fail(15_000)

julia> fail(16_000)
ERROR: StackOverflowError:
Stacktrace:
 [1] ht_keyindex(h::Dict{Symbol, Int64}, key::Symbol)
   @ Base ./dict.jl:280

In this case, sum is converted into a binary tree with +(sin(x), +(sin(x), +(...))).

The @NL macros don't have the problem because they rewrite sum and prod as n-ary + and * instead of binary.

@odow
Copy link
Member Author

odow commented Oct 11, 2022

I've flattened the nested sums and products which removes the worst offenders for the recursion depth.

julia> fail(16_000)

julia> fail(20_000)

julia> fail(90_000)

At the moment they still allocate temporaries, but implementing a full mutable arithmetic is a lower priority. It's not obvious that the mutable trade-off will be worth it, because then every NonlinearExpr stuct will need to be mutable. I removed all recursion.

@odow
Copy link
Member Author

odow commented Oct 13, 2022

I found the Pyomo discussions:

I think we hit most of the points. The two biggies would be no recursion and keep expressions immutable. I just pushed a commit which removed our usage of recursion (my original thought was, what sort of non-toy, practical model would have deeply nested expressions?) and added a test.

cc @Robbybp

Copy link
Contributor

@frapac frapac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It overall looks good to me. I would suggest maybe to clarify one error message.

Impressive that it works with so few changes in JuMP's code!

src/operators.jl Outdated Show resolved Hide resolved
@Robbybp
Copy link

Robbybp commented Oct 13, 2022

@odow That's more than I was even aware of, I just knew about the discussion in the docs.

@odow
Copy link
Member Author

odow commented Oct 13, 2022

So it'd be interesting to know if anyone actually has any practical examples with deep expressions. It seems like the Pyomo folks had removing recursion as a design motivation in their Pyomo5 expression system, but, other than this PR, none of the Julia systems support it. I've certainly never had a question on the forum of someone complaining about it.

julia> using JuMP

julia> import InfiniteOpt

julia> import Ipopt

julia> import Symbolics

julia> function perf_nl_deep_recursion()
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x)
           y = Expr(:call, :sin, x)
           for _ in 1:20_000
               y = Expr(:call, :^, Expr(:call, :sqrt, y), 2)
           end
           set_nonlinear_objective(model, MOI.MIN_SENSE, Expr(:call, :^, y, 2))
           optimize!(model)
           return 
       end
perf_nl_deep_recursion (generic function with 1 method)

julia> function perf_nlexpr_deep_recursion()
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x)
           y = sin(x)
           for _ in 1:20_000
               y = sqrt(y)^2
           end
           @objective(model, Min, y^2)
           optimize!(model)
           return 
       end
perf_nlexpr_deep_recursion (generic function with 1 method)

julia> function perf_infopt_deep_recursion()
           model = InfiniteOpt.InfiniteModel(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, x)
           y = sin(x)
           for _ in 1:20_000
               y = sqrt(y)^2
           end
           @objective(model, Min, y^2)
           optimize!(model)
           return
       end
perf_infopt_deep_recursion (generic function with 1 method)

julia> function perf_symbolics_deep_recursion()
           Symbolics.@variables x
           y = sin(x)
           for _ in 1:20_000
               y = sqrt(y)^2
           end
           return string(y)
       end
perf_symbolics_deep_recursion (generic function with 1 method)

julia> perf_nlexpr_deep_recursion()

julia> perf_nl_deep_recursion()
ERROR: StackOverflowError:
Stacktrace:
# [...]

julia> perf_symbolics_deep_recursion()
ERROR: StackOverflowError:
Stacktrace:
# [...]

julia> perf_infopt_deep_recursion()
ERROR: StackOverflowError:
Stacktrace:
# [...]

@Robbybp
Copy link

Robbybp commented Oct 13, 2022

In https://github.com/IDAES/idaes-pse, we use quite large expressions that come from nested thermophysical property calculations for many chemical species all the time, but I don't think these actually get that deep. I should find some offenders and measure this when have time. Deep expressions could arise when encoding a neural network in an expression system, as in those generated by https://github.com/cog-imperial/OMLT.

@odow
Copy link
Member Author

odow commented Oct 13, 2022

The tricky part about this is how we document things.

Because we're effectively building two separate nonlinear interfaces that are compatible with minor quirks between them.

For example:

  • Normal macros rewrite expressions using MutableArithmetics, which can lead to expressions graphs which are not the same as the user inputted. This seems like a major problem: Rewrite of sum()*constant is suboptimal MutableArithmetics.jl#169
  • you can write sum(X^3 .- X^4) in @expression but not in @NLexpression
  • you can write ifelse(x > 0, 1, 2) in @NLexpression but not in @expression
  • you have to register user-defined functions, but not if it's traceable in @expression, but if it isn't, you need to register and implement a new method
  • if you add a nonlinear constraint via @constraint then things like name(con) doesn't work. (Proof of concept done, should be added to MOI)
  • setting a nonlinear objective via @objective overwrites a @NLobjective, but setting a linear objective via @objective doesn't This was wrong.
  • constraint_object(con) doesn't work for nonlinear constraints added via @constraint (but we could make it work) Implemented

I wonder if we should make this an opt-in external package that (carefully) pirates JuMP methods.

@odow
Copy link
Member Author

odow commented Oct 14, 2022

I'm starting to see what first-class nonlinear support would look like.

Since this PR demonstrates that the simple expression type has reasonable performance, we could create a new function in MOI that is the MOI version of it.

Then solution is not to pass the nonlinear model to solvers, jump-dev/MathOptInterface.jl#1998, it's to pass the variation of these NonlinearExpr objects as constraints, jump-dev/MathOptInterface.jl#846, and have the solvers build the MOI.Nonlinear.Model object. That removes more complexity from JuMP and would resolve #1185.

Intermediate MOI layers could intercept add_constraint(model, ::NonlinearExpr, ::Set) and do reformulations (e.g., DCP). And we could have VectorNonlinearFunction = Vector{NonlinearExpr} to support complementarity. Alpine.jl would consume NonlinearExpr directly, no need to muck about.

We could also write a copy_to(::MOI.Nonlinear.Model, ::MOI.ModelLike) which would let us simplify nonlinear solvers.

The complications are that we'd need new MOI functions along the lines of:

  • MOI.register_user_defined_operator
  • MOI.add_nonlinear_parameter
  • MOI.add_nonlinear_expression

@odow
Copy link
Member Author

odow commented Oct 21, 2022

@ccoffrin minimal impact on AC-OPF. On 10k nodes, build time goes from 1.33s to 1.64s and solve time is >100s. So perhaps a bit slower because of operator overloading, but I think an acceptable amount. Solve times bound around on my laptop, so I didn't put much stock I the difference in total runtime.

# @NL

Dict{String, Any} with 10 entries:
  "time_callbacks" => 0.0939076
  "cost"           => 97213.6
  "variables"      => 1088
  "constraints"    => 1539
  "case"           => "data/pglib_opf_case118_ieee.m"
  "time_total"     => 0.44728
  "time_build"     => 0.010298
  "time_solve"     => 0.394519
  "time_data"      => 0.0424631
  "feasible"       => true

Dict{String, Any} with 10 entries:
  "time_callbacks" => 24.0821
  "cost"           => 1.35403e6
  "variables"      => 76804
  "constraints"    => 112352
  "case"           => "data/pglib_opf_case10000_goc.m"
  "time_total"     => 124.34
  "time_build"     => 1.33178
  "time_solve"     => 119.927
  "time_data"      => 3.08087
  "feasible"       => true

# NonlinearExpr

Dict{String, Any} with 9 entries:
  "cost"        => 97213.6
  "variables"   => 1088
  "constraints" => 1539
  "case"        => "data/pglib_opf_case118_ieee.m"
  "time_total"  => 0.413031
  "time_build"  => 0.0152421
  "time_solve"  => 0.357061
  "time_data"   => 0.0407269
  "feasible"    => true

Dict{String, Any} with 9 entries:
  "cost"        => 1.35403e6
  "variables"   => 76804
  "constraints" => 112352
  "case"        => "data/pglib_opf_case10000_goc.m"
  "time_total"  => 116.081
  "time_build"  => 1.64141
  "time_solve"  => 111.833
  "time_data"   => 2.60745
  "feasible"    => true

@odow
Copy link
Member Author

odow commented Oct 21, 2022

Another option we should look into is https://github.com/SymbolicML/DynamicExpressions.jl

But it looks like they do some scary stuff with eval in the global scope:
https://github.com/SymbolicML/DynamicExpressions.jl/blob/292deb96c6d7ad92193aedbc41229776d67e9520/src/OperatorEnumConstruction.jl#L51-L61

Their nodes are essentially the concrete left-right design I suggested above:
https://github.com/SymbolicML/DynamicExpressions.jl/blob/292deb96c6d7ad92193aedbc41229776d67e9520/src/Equation.jl#L7-L45

cc @MilesCranmer: interested to hear what factors you considered/benchmarks you ran to end up with this design, and what alternatives you tried and rejected.

@MilesCranmer
Copy link

Thanks for the tag!

But it looks like they do some scary stuff with eval in the global scope:

These functions are purely for convenience in the REPL. You can turn them off with extend_user_operators=false. The more robust way is to use Node(...) to construct trees and eval_tree_array (and eval_grad_tree_array for gradients) to evaluate them - which is what SymbolicRegression.jl does internally.

interested to hear what factors you considered/benchmarks you ran to end up with this design, and what alternatives you tried and rejected.

I tried a few alternatives:

  1. Putting functions or symbols inside the nodes, rather than using an enum.
  2. Arbitrary degree nodes.

Trying 1., resulted in tree construction becoming a bottleneck. In SymbolicRegression.jl I essentially need to churn through billions of different expressions (not even including parameter tuning for a single expression), so the penalty of construction/mutation of trees really starts to take a toll unless they are as lightweight as possible.

Trying 2. severely hurt evaluation performance. It might just have been how I implemented it, but I think it would be an immense engineering challenge to get it right with the same level of type stability and evaluation speed as the current implementation. Right now the evaluation code (for scalar operators; though more generic ones are available with GenericOperatorEnum) is about as optimized as I could get it - it even does things like fuse the enum operators together (e.g., cos(cos(x)), cos(sin(x)), cos(x+y) are each fused into a single SIMD kernel), and detect if a sub-expression is constant before traversing (which means I don't need to do an array operation). I even managed to get a ~10% gain in performance by pre-constructing a tuple of Val(i) (used to specialize an operator kernel) in a global constant - which apparently became a bottleneck!

@MilesCranmer
Copy link

MilesCranmer commented Oct 21, 2022

Just to clarify - you could essentially take an operator enum like this:

struct MyOperatorEnum{A<:Tuple,B<:Tuple} <: AbstractOperatorEnum
    unaops::A
    binops::B
end

operators = MyOperatorEnum((+, -, *), (cos, sin))

and it would be basically the same as the one used in DynamicExpressions.jl (and would even work, if you pass it to eval_tree_array). The rest of the stuff in the constructor is building convenience functions and creating the Zygote-compiled derivative operators (put in another tuple), so that the expression traversal is just as fast for derivatives as it is for normal evaluation.

@odow
Copy link
Member Author

odow commented Oct 21, 2022

Thanks for the tag!

👍

These functions are purely for convenience in the REPL.

Ah okay, that makes more sense. You don't actually do the overloading behind the scenes to generate the trees.

Putting functions or symbols inside the nodes, rather than using an enum.

You might want to take a look at the MOI data structure for other ideas: https://jump.dev/MathOptInterface.jl/dev/submodules/Nonlinear/overview/#Expression-graph-representation

It is concretely typed, but perhaps not as friendly to build and mutate.

Arbitrary degree nodes.

Yeah. We're really exploring the trade-off between ease of construction and usefulness for computation. For computation, the nodes I have with Vector{Any} are not good, which is why we convert to the MOI data structure. But it might make sense at the JuMP level to provide a nice simple interface, that solvers can do what they want with later.

are each fused into a single SIMD kernel

This is something we haven't tried, but it's on my mind.

@MilesCranmer
Copy link

You might want to take a look at the MOI data structure for other ideas: https://jump.dev/MathOptInterface.jl/dev/submodules/Nonlinear/overview/#Expression-graph-representation

Interesting, thanks! I did explore a stack-based expression design like Expression here at one point, with all nodes in one array. My conclusion was that: (1) evaluation speed is similar, although stack-based expressions will likely be better for GPU-based evaluation. (2) code complexity is simpler in some areas (e.g., copying trees, or measuring aggregate quantities), but more complex in others (like evaluation - whereas a binary tree is just a simple recursion), and (3) mutating the tree is significantly more complex and expensive.

(3) here is especially painful for symbolic regression, where some mutations take you from a binary node to a unary node (like if you change a (+) into a (cos), letting the garbage collector delete the right node). For cases like this, it really made sense in the end to use a standard binary tree. Although maybe if I one day find some time it would be interesting to try a GPU version again.

Btw, one other thing which might be unique to SymbolicRegression.jl's needs is that I technically want to allow for graph expressions, rather than just trees. In other words, you could construct a tree like this:

ops = OperatorEnum(; unary_operators=[cos, exp], binary_operators=[+, *, -])
x1 = Node(Float64; feature=1)
base = cos(x1 - 3.2)  
# ^ Or, could write this as: Node(1, Node(3, x1, Node(; val=3.2)))
expression = exp(base) - base 
# ^ Or, could write this as: Node(3, Node(2, base), base)

This is not a tree but actually a graph - the base expression is re-used by two other nodes. Since the tree structure is just pointers, it will actually just point to the same node in memory. When combined with an IdDict, it lets me do things like assign the expression a reduced complexity since it is sharing symbols. So I basically want to (eventually) use it to encourage the symbolic regression to discover re-usable abstractions.

Anyways, in the end, things like this seemed simpler to get running and optimize with a vanilla binary tree implementation rather than a stack-based representation of a tree, since I can just let pointers point at the same node.

I also tried doing a tree->stack conversion, simply for evaluation (i.e., evaluate in a big loop over a stack, rather than recursion on a tree), but performance did not actually improve - it was actually harder to fuse kernels here, so the tree-based approach actually ended up being faster (which was counter-intuitive to me).

@odow
Copy link
Member Author

odow commented Nov 17, 2022

Just to update the status of this PR: it is blocked by #3125. We need to simplify how we're doing the rewriting in order for us to construct saner expression trees.

Copy link
Contributor

@pulsipher pulsipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. My only suggestions would be a few documentation improvements:

  • Change the language to all match NonlinearOperator
  • Make it clear with GenericModel{T} that T must be Float64 if the model is nonlinear
  • Perhaps have something to track which solvers support the new interface

docs/src/manual/nonlinear.md Outdated Show resolved Hide resolved
src/nlp_expr.jl Outdated Show resolved Hide resolved
@blegat
Copy link
Member

blegat commented Aug 28, 2023

It might be confusing that op_less_than_or_equal_to is like MOI.LessThan but op_less_than is not. Maybe we should go for op_nonstrict_less_than and op_strict_less_than to avoid any risk of confusion

@odow
Copy link
Member Author

odow commented Aug 28, 2023

Changed to op_strictly_greater_than and op_greater_than.

docs/src/manual/nonlinear.md Outdated Show resolved Hide resolved
@blegat
Copy link
Member

blegat commented Aug 29, 2023

Changed to op_strictly_greater_than and op_greater_than.

It's nice to be consistent with MOI but for someone not knowing MOI, he would expect op_greater_than to be the strict one according to https://en.wikipedia.org/wiki/Less-than_sign. But maybe it's not a big deal

@odow
Copy link
Member Author

odow commented Aug 29, 2023

Since this is at the JuMP level, it probably make more sense to revert to what I had before, even if it's inconsistent with MOI? Users don't write MOI.LessThan very often...

@blegat
Copy link
Member

blegat commented Aug 29, 2023

Yes, or op_strictly_less_than and op_less_than_or_equal_to. It's not only about MOI, in optimization, we usually have nonstrict inequality so op_less_than might be thought to be <=

@odow
Copy link
Member Author

odow commented Aug 29, 2023

@mlubin: one final review. We've changed @register to @operator and register_nonlinear_operator to add_nonlinear_operator.

docs/src/manual/nonlinear.md Outdated Show resolved Hide resolved
@odow
Copy link
Member Author

odow commented Aug 30, 2023

Barring further suggestions, I'll leave this PR open one more day and then merge.

Once merged, I'll wait at least a week before tagging a new release so that we have more time for review.

That'll also give me time to do things like add documentation on parameters.

@odow odow merged commit a573de2 into master Aug 31, 2023
@odow odow deleted the od/nlp-expr branch August 31, 2023 08:27
@odow
Copy link
Member Author

odow commented Aug 31, 2023

Thanks all for the feedback. It took a while and quite a few iterations, but I think we landed somewhere nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Nonlinear Related to nonlinear programming
Development

Successfully merging this pull request may close these issues.