-
-
Notifications
You must be signed in to change notification settings - Fork 399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Nonlinear] add new nonlinear interface #3106
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #3106 +/- ##
==========================================
+ Coverage 98.00% 98.09% +0.08%
==========================================
Files 36 37 +1
Lines 5068 5501 +433
==========================================
+ Hits 4967 5396 +429
- Misses 101 105 +4
☔ View full report in Codecov by Sentry. |
This comment was marked as outdated.
This comment was marked as outdated.
I'm reasonably happy where this is for now. It's a good reference implementation to start building/comparing against. Next up: an approach like infiniteopt/InfiniteOpt.jl#161 by @pulsipher |
@pulsipher did you ever try benchmarking against this naive implementation? I've been playing with your implementation, and it seems to have more allocations. The fundamental problem is that your tree implementation has struct NodeData
value::Any
end https://github.com/infiniteopt/InfiniteOpt.jl/blob/5989fed8fbe323ddbf2a6dca12b61532c44e7c09/src/nlp.jl#L109-L111 |
The short answer is no. I benchmarked a lot of different tree structures, including a tape-like one similar to MOI, but I didn't think to do a naive one sinilar to Julia ASTs. It would make sense that having a concrete valued composite type with |
Yeah, there's a trade-off about the representation and how we use it. The But if we just want to build a representation, potentially change it a little, and the convert it into the MOI data structure, the |
One issue with the current design is that it uses recursion, which will fail for deeply nested expressions. These are usually quite hard to write by hand, but generators can cause it: julia> function fail(N)
model = Model()
@variable(model, x)
@objective(model, Min, sum(sin(x) for i in 1:N))
return
end
fail (generic function with 2 methods)
julia> fail(15_000)
julia> fail(16_000)
ERROR: StackOverflowError:
Stacktrace:
[1] ht_keyindex(h::Dict{Symbol, Int64}, key::Symbol)
@ Base ./dict.jl:280 In this case, The |
julia> fail(16_000)
julia> fail(20_000)
julia> fail(90_000)
|
I found the Pyomo discussions:
I think we hit most of the points. The two biggies would be no recursion and keep expressions immutable. I just pushed a commit which removed our usage of recursion (my original thought was, what sort of non-toy, practical model would have deeply nested expressions?) and added a test. cc @Robbybp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It overall looks good to me. I would suggest maybe to clarify one error message.
Impressive that it works with so few changes in JuMP's code!
@odow That's more than I was even aware of, I just knew about the discussion in the docs. |
So it'd be interesting to know if anyone actually has any practical examples with deep expressions. It seems like the Pyomo folks had removing recursion as a design motivation in their Pyomo5 expression system, but, other than this PR, none of the Julia systems support it. I've certainly never had a question on the forum of someone complaining about it. julia> using JuMP
julia> import InfiniteOpt
julia> import Ipopt
julia> import Symbolics
julia> function perf_nl_deep_recursion()
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x)
y = Expr(:call, :sin, x)
for _ in 1:20_000
y = Expr(:call, :^, Expr(:call, :sqrt, y), 2)
end
set_nonlinear_objective(model, MOI.MIN_SENSE, Expr(:call, :^, y, 2))
optimize!(model)
return
end
perf_nl_deep_recursion (generic function with 1 method)
julia> function perf_nlexpr_deep_recursion()
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x)
y = sin(x)
for _ in 1:20_000
y = sqrt(y)^2
end
@objective(model, Min, y^2)
optimize!(model)
return
end
perf_nlexpr_deep_recursion (generic function with 1 method)
julia> function perf_infopt_deep_recursion()
model = InfiniteOpt.InfiniteModel(Ipopt.Optimizer)
set_silent(model)
@variable(model, x)
y = sin(x)
for _ in 1:20_000
y = sqrt(y)^2
end
@objective(model, Min, y^2)
optimize!(model)
return
end
perf_infopt_deep_recursion (generic function with 1 method)
julia> function perf_symbolics_deep_recursion()
Symbolics.@variables x
y = sin(x)
for _ in 1:20_000
y = sqrt(y)^2
end
return string(y)
end
perf_symbolics_deep_recursion (generic function with 1 method)
julia> perf_nlexpr_deep_recursion()
julia> perf_nl_deep_recursion()
ERROR: StackOverflowError:
Stacktrace:
# [...]
julia> perf_symbolics_deep_recursion()
ERROR: StackOverflowError:
Stacktrace:
# [...]
julia> perf_infopt_deep_recursion()
ERROR: StackOverflowError:
Stacktrace:
# [...] |
In https://github.com/IDAES/idaes-pse, we use quite large expressions that come from nested thermophysical property calculations for many chemical species all the time, but I don't think these actually get that deep. I should find some offenders and measure this when have time. Deep expressions could arise when encoding a neural network in an expression system, as in those generated by https://github.com/cog-imperial/OMLT. |
The tricky part about this is how we document things. Because we're effectively building two separate nonlinear interfaces that are compatible with minor quirks between them. For example:
I wonder if we should make this an opt-in external package that (carefully) pirates JuMP methods. |
I'm starting to see what first-class nonlinear support would look like. Since this PR demonstrates that the simple expression type has reasonable performance, we could create a new function in MOI that is the MOI version of it. Then solution is not to pass the nonlinear model to solvers, jump-dev/MathOptInterface.jl#1998, it's to pass the variation of these Intermediate MOI layers could intercept We could also write a The complications are that we'd need new MOI functions along the lines of:
|
@ccoffrin minimal impact on AC-OPF. On 10k nodes, build time goes from 1.33s to 1.64s and solve time is >100s. So perhaps a bit slower because of operator overloading, but I think an acceptable amount. Solve times bound around on my laptop, so I didn't put much stock I the difference in total runtime.
|
Another option we should look into is https://github.com/SymbolicML/DynamicExpressions.jl But it looks like they do some scary stuff with Their nodes are essentially the concrete left-right design I suggested above: cc @MilesCranmer: interested to hear what factors you considered/benchmarks you ran to end up with this design, and what alternatives you tried and rejected. |
Thanks for the tag!
These functions are purely for convenience in the REPL. You can turn them off with
I tried a few alternatives:
Trying 1., resulted in tree construction becoming a bottleneck. In SymbolicRegression.jl I essentially need to churn through billions of different expressions (not even including parameter tuning for a single expression), so the penalty of construction/mutation of trees really starts to take a toll unless they are as lightweight as possible. Trying 2. severely hurt evaluation performance. It might just have been how I implemented it, but I think it would be an immense engineering challenge to get it right with the same level of type stability and evaluation speed as the current implementation. Right now the evaluation code (for scalar operators; though more generic ones are available with |
Just to clarify - you could essentially take an operator enum like this: struct MyOperatorEnum{A<:Tuple,B<:Tuple} <: AbstractOperatorEnum
unaops::A
binops::B
end
operators = MyOperatorEnum((+, -, *), (cos, sin)) and it would be basically the same as the one used in DynamicExpressions.jl (and would even work, if you pass it to |
👍
Ah okay, that makes more sense. You don't actually do the overloading behind the scenes to generate the trees.
You might want to take a look at the MOI data structure for other ideas: https://jump.dev/MathOptInterface.jl/dev/submodules/Nonlinear/overview/#Expression-graph-representation It is concretely typed, but perhaps not as friendly to build and mutate.
Yeah. We're really exploring the trade-off between ease of construction and usefulness for computation. For computation, the nodes I have with
This is something we haven't tried, but it's on my mind. |
Interesting, thanks! I did explore a stack-based expression design like (3) here is especially painful for symbolic regression, where some mutations take you from a binary node to a unary node (like if you change a (+) into a (cos), letting the garbage collector delete the right node). For cases like this, it really made sense in the end to use a standard binary tree. Although maybe if I one day find some time it would be interesting to try a GPU version again. Btw, one other thing which might be unique to SymbolicRegression.jl's needs is that I technically want to allow for graph expressions, rather than just trees. In other words, you could construct a tree like this: ops = OperatorEnum(; unary_operators=[cos, exp], binary_operators=[+, *, -])
x1 = Node(Float64; feature=1)
base = cos(x1 - 3.2)
# ^ Or, could write this as: Node(1, Node(3, x1, Node(; val=3.2)))
expression = exp(base) - base
# ^ Or, could write this as: Node(3, Node(2, base), base) This is not a tree but actually a graph - the base expression is re-used by two other nodes. Since the tree structure is just pointers, it will actually just point to the same node in memory. When combined with an Anyways, in the end, things like this seemed simpler to get running and optimize with a vanilla binary tree implementation rather than a stack-based representation of a tree, since I can just let pointers point at the same node. I also tried doing a tree->stack conversion, simply for evaluation (i.e., evaluate in a big loop over a stack, rather than recursion on a tree), but performance did not actually improve - it was actually harder to fuse kernels here, so the tree-based approach actually ended up being faster (which was counter-intuitive to me). |
Just to update the status of this PR: it is blocked by #3125. We need to simplify how we're doing the rewriting in order for us to construct saner expression trees. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. My only suggestions would be a few documentation improvements:
- Change the language to all match
NonlinearOperator
- Make it clear with
GenericModel{T}
thatT
must beFloat64
if the model is nonlinear - Perhaps have something to track which solvers support the new interface
It might be confusing that |
Changed to |
It's nice to be consistent with MOI but for someone not knowing MOI, he would expect |
Since this is at the JuMP level, it probably make more sense to revert to what I had before, even if it's inconsistent with MOI? Users don't write |
Yes, or |
@mlubin: one final review. We've changed |
Barring further suggestions, I'll leave this PR open one more day and then merge. Once merged, I'll wait at least a week before tagging a new release so that we have more time for review. That'll also give me time to do things like add documentation on parameters. |
Thanks all for the feedback. It took a while and quite a few iterations, but I think we landed somewhere nice. |
Documentation
https://jump.dev/JuMP.jl/previews/PR3106/manual/nonlinear/
About
This PR is new implementation of nonlinear programming for JuMP.
It introduces a new
GenericNonlinearExpr
data structure, which replicates the structure of basic Julia Expr objects. The motivation for this is that we already create these in the macros, and it seems to work okay, so why not make it official!Demos
Cribbing @pulsipher's demos in infiniteopt/InfiniteOpt.jl#161, we have:
Operator overloading
Nonlinear expressions can be defined outside of macros like their quad/affine
counterparts:
Macro definitions
By making the necessary extensions to MutableArithmetics, we can define
nonlinear expressions inside of
@expression
,@objective
, and@constraint
(i.e.,no need for the special NL macros):
LinearAlgebra support
We can also do linear algebra operations:
Function tracing
Because we support operator definition, we can trace functions in like manner to
Symbolics.jl:
User-defined operators
For user-defined operators that are not compatible with tracing, we allow them
to be registered like in JuMP and Symbolics.jl.
You can also implement this by defining a new overload:
Open questions
So there are couple of open questions:
Base
(like:dawson
)ifelse
. In Julia 1.6, It's a builtin function so we can't extend it. It's a generic function in Julia 1.8+>=
et al. for variables etc because it just leads to problems.I think the answer is to ignore this for now. If you want to use
ifelse
, use@NL
.register
could return aUserDefinedFunction
object that people could use instead. Then use the function to trace, and the returned object to register.AbstractJuMPScalar
. That seems like a reasonable work-around.JuMP.jl/src/operators.jl
Lines 166 to 196 in 1a5b487
x^3
, requiringx^3.0
. Alternatively, we could return aNonlinearExpr
here, but that would make the return type unstable.x^2
type unstable seems high and likely to cause a problem in code that is in the wild. Especially since the method is non-intuitive:Latest benchmarks
See the
test/perf/nonlinear_expr.jl
file for details. The results aren't a perfect comparison between systems because:My takeaways are that we can ignore the Symbolics.jl and InfiniteOpt approaches. InfiniteOpt might be better than the benchmarks show, but it's still quite a way from where NonlinearExpr already is.
The main hit is that
NonlinearExpr
is much slower for thesum
,prod
, andmle
benchmarks because they all involve generators, and@NL
macros build these up as n-ary operators in-place, whereasNonlinearExpr
is an out-of-place non-mutating arithmetic. For now, the difference might be acceptable (use@NLxxx
if performance is a problem), but we could implement a mutable arithmetic forNonlinearExpr
if it became important.Another---not directly comparable---set of benchmarks are these ones with Pyomo:
Pyomo is generally slower, except for the 2788 example, which is quite surprising! It clearly demonstrates that JuMP is suboptimal for this example, I think because of how we manage stored nonlinear expressions?
Comparative analysis of different systems
The following image is a screenshot of a table in this Google doc.
I think this table helps explain a lot of the benchmarks. I haven't timed MadDiff, but @sshin23 gave a talk at a nonlinear call demonstrating the compilation problem, which I think is a veto over making it a default in JuMP (MadDiff is a great alternative for problems with repeated structure and small expressions).
My takeaways are:
Vector{Any}
. Anything else is just overkill.Things I tried
args::Any
: this was waaay slower thanargs::Vector{Any}
. Conclusion is that a length-1 vector for univariate functions isn't that much of an issueargs::Vector{Union{AbstractJuMPScalar,Float64}}
but the union is quite large, so union splitting doesn't happen.Things to try
A variation on a node like
with expressions type
Dict{UInt64,Node}
.where the
.left
and.right
fields are re-interpreted to be theInt64
inMOI.VariableIndex
for variables, aFloat64
for coefficients, or a hash corresponding to the the left and right arguments of a binary expression.A mutable (for
+
and*
) version of NonlinearExpr