improve performance of IndexAtom a little #615
Conversation
I could use some help figuring out how to update the tests, since I am not familiar with the new …
I think there will be fewer tests to update once #614 is done, since there will be less indexing in various atoms.
I'll take a look after we merge #614. And yes, the variable order of the …
I'll rebase and take a look.
2eb085c to 288f874
I don't know about this one. It isn't obvious that we want to add every index operation as a scalar variable.
Good point, this is probably pushing work to solve time… On a technical level, when we do scalar indexing, we end up with a 1xn matrix. There are two issues: row indexing of CSC matrices is slow, and we'd like to stop composing forward with future operations, i.e. do the rest of them (probably ending up with a 1x1 matrix) and then compose that with our 1xn, instead of repeatedly updating our 1xn matrix with each operation.

Here, we don't directly solve the first issue, but we at least do the row indexing with minimal allocations, and we solve the second issue by starting a new variable and constraining it to be equal to the output of the old one.

To improve the row indexing, we could convert to another representation that has fast row indexing, but that's super inefficient if we have to do it separately each time we index that matrix. We could probably do some caching though. For the second issue, we could go back to a lazier representation. At first I tried a fully lazy representation where we keep each operation separate and only compose when we have to. That's why it's called a SparseTape; originally it had a vector of operations. I think it could make sense to do something in between, like only being lazy when it seems likely to be profitable, maybe also with a linked list rather than a vector. If I have time I will give it a try.
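As a rough illustration of the first issue (a hypothetical micro-benchmark, not code from this PR, assuming SparseArrays and BenchmarkTools are available): row slices of a `SparseMatrixCSC` have to scan every column, whereas the same data stored transposed can be sliced as a column cheaply.

```julia
using SparseArrays, BenchmarkTools

n = 10_000
A = sprand(n, n, 1e-3)   # CSC storage: columns are contiguous
At = sparse(A')          # one-time conversion; rows of A become columns of At

@btime $A[500:500, :];   # row slice on CSC: touches every column
@btime $At[:, 500];      # same data as a column slice: much cheaper
```

The one-time transpose only pays off if it can be cached across many index operations on the same matrix, which is the caching idea mentioned above.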
This reverts commit 55547c5.
I tried a few things without much luck. The current state is now a fairly simple change: just optimizing the indexing directly, without adding any more caches or auxiliary variables. It provides a ~2x speedup for indexing-heavy code and reduces allocations:

```julia
julia> using Convex

julia> import MathOptInterface as MOI

julia> let
           m = 10_000
           x = Variable(m, 1)
           constraints = []
           for i in 1:m
               push!(constraints, x[i] >= 0)
           end
           problem = minimize(sum(x), identity.(constraints))
           @time context = Convex.Context(problem, MOI.Utilities.Model{Float64})
       end;
  1.309713 seconds (621.66 k allocations: 2.269 GiB, 28.88% gc time)  # master
  0.623630 seconds (431.66 k allocations: 1.513 GiB, 18.64% gc time)  # PR
```
However, if I backport the old/slow `ExpAtom` definition, I still OOM on the #254 problem:

```julia
using Convex, JLD2, MathOptInterface
import MathOptInterface as MOI
file = JLD2.load(joinpath("/Users/eph/Downloads", "cvx_example.jld2"))
function Convex.new_conic_form!(context::Convex.Context{T}, e::Convex.ExpAtom) where {T}
# exp(x) \leq z <=> (x,1,z) \in ExpCone
x = e.children[1]
m, n = size(x)
z = Variable(m, n)
for i in 1:m, j in 1:n
f = vcat(x[i, j], 1, z[i, j])
add_constraint!(context, Convex.GenericConstraint{MOI.ExponentialCone}(f))
end
return Convex.conic_form!(context, z)
end
let
# m = 2000
# Adj = file["Adj"][1:m, :]
# N = file["N"][1:m, :, :]
Adj = file["Adj"]
N = file["N"]
x = Variable(2730)
S, V, T = size(N)
lN = log.(N)
M = vec(sum(N[:, :, 2:T]; dims=(2, 3)))
H = Adj * x
problem = maximize(-sum(logsumexp(lN[:, v, t] - H) for t = 1:T-1, v = 1:V) - dot(M, H), [x ≥ -1e2, x ≤ 1e2])
@time context = Convex.Context(problem, MOI.Utilities.Model{Float64});
end;
```

My goal with this PR was to make it so that if users use indexing the way `ExpAtom` used to, it would be more tolerable and e.g. not OOM on #254. But the only way I've found to do that so far is with the auxiliary variables, which may just be pushing the issue down the line to the solver to handle. Here I still OOM.

I also tried an approach with an extra cache, where we cache … I also tried a semi-lazy way, but I realized/remembered that the issue with laziness is that we have to collapse it to a single operation as soon as we want to do something like …

So anyway, I can't really figure out a better approach here. I do think the auxiliary-variable approach could be worth it in the end, but we would need to performance-test it end-to-end to account for solve time, and that could be a different PR. I think this one is at least a strict improvement over the status quo, but indexing is still very slow.
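For reference, a minimal sketch of the lazy idea (this is not the package's actual `SparseTape` type; the names and representation are made up for illustration): keep each pending affine operation `x -> A*x + b` on a list, and only compose them into a single `(A, b)` pair when a downstream step forces it. The collapse step is what becomes expensive if it has to happen after every operation.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical lazy tape: each pending operation is a square affine map x -> A*x + b.
struct LazyAffineTape{T}
    ops::Vector{Tuple{SparseMatrixCSC{T,Int},Vector{T}}}
end

push_op!(t::LazyAffineTape, A, b) = push!(t.ops, (A, b))

# Collapse the tape into one affine map (A, b). If this has to run after every
# operation, laziness buys nothing, which is the problem described above.
function collapse(t::LazyAffineTape{T}, n::Int) where {T}
    A = sparse(one(T) * I, n, n)
    b = zeros(T, n)
    for (Ai, bi) in t.ops
        A = Ai * A       # compose with everything applied so far
        b = Ai * b + bi
    end
    return A, b
end
```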
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #615      +/-   ##
==========================================
- Coverage   97.88%   97.80%    -0.09%
==========================================
  Files          88       88
  Lines        5062     5095       +33
==========================================
+ Hits         4955     4983       +28
- Misses        107      112        +5
```

☔ View full report in Codecov by Sentry.
Let me take a look at the test failure.
This makes the issues highlighted in #614 less bad. It is still suboptimal (and IMO we should address #614 directly by removing our uses of scalar indexing), but it would still be nice if indexing wasn't quite so darn slow.
The issue is that for each individual index we do a sparse matrix multiplication to extract just the rows of the underlying sparse matrix that we want. We can optimize this by doing the indexing at that level, like `A[i:i, :]`, but that is still pretty slow. What seems to be faster is to create an auxiliary variable to represent the output, and constrain each entry on the MOI level via `ScalarAffineFunction`s to the value it is supposed to be. This lowers things to MOI more quickly: at the Convex level, after indexing we only have the output variables and have dropped all the inputs, rather than holding onto them until the very end and only then lowering everything to MOI. With this, the problem in #254 is much improved over the status quo (~190 GB of allocations, 20min+ runtime, or OOM), although still not as fast as in #613.
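For concreteness, here is a minimal sketch of that auxiliary-variable idea expressed with plain MOI calls. It is only illustrative: the PR works through Convex.jl's internal machinery rather than calling MOI directly like this, and the specific expression below is made up.

```julia
import MathOptInterface as MOI

model = MOI.Utilities.Model{Float64}()
x = MOI.add_variables(model, 3)

# Suppose indexing produced the scalar affine expression 2*x[1] + x[3] + 1
# (a hypothetical example). Instead of carrying that row around as a 1xn
# sparse matrix at the Convex level, introduce an auxiliary variable z and
# add the MOI constraint  2*x[1] + x[3] + 1 - z == 0.
z = MOI.add_variable(model)
f = MOI.ScalarAffineFunction(
    [
        MOI.ScalarAffineTerm(2.0, x[1]),
        MOI.ScalarAffineTerm(1.0, x[3]),
        MOI.ScalarAffineTerm(-1.0, z),
    ],
    1.0,
)
MOI.add_constraint(model, f, MOI.EqualTo(0.0))
# Downstream atoms can now refer to z alone; the inputs are already lowered
# to MOI and need not be tracked further at the Convex level.
```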
So IMO this shows that this is a good optimization to make indexing a lot faster and less allocation-heavy, while #614 remains valuable for this codebase.
So far in this PR, I have only addressed one of the two forms of indexing, and only for real-valued problems. If we like this approach though, it should be somewhat straightforward to extend to the others. Also, some of the tests are failing, since introducing a new auxiliary variable changes the problem formulations. I don't see any failures that look like an implementation issue though.