improve performance of IndexAtom a little #615
Conversation
I could use some help figuring out how to update the tests, since I am not familiar with the new …
I think there will be fewer tests to update once #614 is done, since there will be less indexing in various atoms.
I'll take a look after we merge #614. And yes, the variable order of the …
I'll rebase and take a look.
2eb085c to 288f874
I don't know about this one. It isn't obvious that we want to add every index operation as a scalar variable.
Good point, this is probably pushing work to solve time… On a technical level, when we do scalar indexing, we end up with a 1xn matrix. There are two issues: row indexing of CSC matrices is slow, and we'd like to stop composing forward with future operations, i.e. do the rest of them (probably ending up with a 1x1 matrix) and then compose that with our 1xn, instead of repeatedly updating our 1xn matrix with each operation.

Here, we don't directly solve the first issue, but we at least do the row indexing with minimal allocations, and we solve the second issue by starting a new variable and constraining it to be equal to the output of the old one.

To improve the row indexing, we could convert to another representation that has fast row indexing, but that's super inefficient if we have to do it separately each time we index that matrix. We could probably do some caching though. For the second issue, we could go back to a lazier representation. At first I tried a fully lazy representation where we keep each operation separate and only compose when we have to. That's why it's called a SparseTape; originally it had a vector of operations. I think it could make sense to do something in between, like only being lazy when it seems likely to be profitable, maybe also with a linked list rather than a vector. If I have time I will give it a try.
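As a rough illustration of the first issue (a hypothetical micro-benchmark, not code from this PR, assuming SparseArrays and BenchmarkTools are available): row slices of a `SparseMatrixCSC` have to scan every column, whereas the same data stored transposed can be sliced as a column cheaply.

```julia
using SparseArrays, BenchmarkTools

n = 10_000
A = sprand(n, n, 1e-3)   # CSC storage: columns are contiguous
At = sparse(A')          # one-time conversion; rows of A become columns of At

@btime $A[500:500, :];   # row slice on CSC: touches every column
@btime $At[:, 500];      # same data as a column slice: much cheaper
```

The one-time transpose only pays off if it can be cached across many index operations on the same matrix, which is the caching idea mentioned above.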
This reverts commit 55547c5.
I tried a few things without much luck. The current state is now a fairly simple change: just optimizing the indexing directly, without adding any more caches or auxiliary variables. It provides a ~2x speedup for indexing-heavy code and reduces allocations:

```julia
julia> using Convex

julia> import MathOptInterface as MOI

julia> let
           m = 10_000
           x = Variable(m, 1)
           constraints = []
           for i in 1:m
               push!(constraints, x[i] >= 0)
           end
           problem = minimize(sum(x), identity.(constraints))
           @time context = Convex.Context(problem, MOI.Utilities.Model{Float64})
       end;
  1.309713 seconds (621.66 k allocations: 2.269 GiB, 28.88% gc time)  # master
  0.623630 seconds (431.66 k allocations: 1.513 GiB, 18.64% gc time)  # PR
```
However, if I backport the old/slow `ExpAtom` definition, I still OOM on the #254 problem:

```julia
using Convex, JLD2, MathOptInterface
import MathOptInterface as MOI
file = JLD2.load(joinpath("/Users/eph/Downloads", "cvx_example.jld2"))
function Convex.new_conic_form!(context::Convex.Context{T}, e::Convex.ExpAtom) where {T}
# exp(x) \leq z <=> (x,1,z) \in ExpCone
x = e.children[1]
m, n = size(x)
z = Variable(m, n)
for i in 1:m, j in 1:n
f = vcat(x[i, j], 1, z[i, j])
add_constraint!(context, Convex.GenericConstraint{MOI.ExponentialCone}(f))
end
return Convex.conic_form!(context, z)
end
let
# m = 2000
# Adj = file["Adj"][1:m, :]
# N = file["N"][1:m, :, :]
Adj = file["Adj"]
N = file["N"]
x = Variable(2730)
S, V, T = size(N)
lN = log.(N)
M = vec(sum(N[:, :, 2:T]; dims=(2, 3)))
H = Adj * x
problem = maximize(-sum(logsumexp(lN[:, v, t] - H) for t = 1:T-1, v = 1:V) - dot(M, H), [x ≥ -1e2, x ≤ 1e2])
@time context = Convex.Context(problem, MOI.Utilities.Model{Float64});
end;
```

My goal with this PR was to make it so that if users use indexing the way `ExpAtom` used to, it would be more tolerable and e.g. not OOM on #254. But the only way I've found to do that so far is with the auxiliary variables, which may just be pushing the issue down the line to the solver to handle. Here I still OOM.

I also tried an approach with an extra cache, where we cache … I also tried a semi-lazy way, but I realized/remembered that the issue with laziness is that we have to collapse it to a single operation as soon as we want to do something like …

So anyway, I can't really figure out a better approach here. I do think the auxiliary-variable approach could be worth it in the end, but we would need to performance-test it end-to-end to account for solve time, and that could be a different PR. I think this one is at least a strict improvement over the status quo, but indexing is still very slow.
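For reference, a minimal sketch of the lazy idea (this is not the package's actual `SparseTape` type; the names and representation are made up for illustration): keep each pending affine operation `x -> A*x + b` on a list, and only compose them into a single `(A, b)` pair when a downstream step forces it. The collapse step is what becomes expensive if it has to happen after every operation.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical lazy tape: each pending operation is a square affine map x -> A*x + b.
struct LazyAffineTape{T}
    ops::Vector{Tuple{SparseMatrixCSC{T,Int},Vector{T}}}
end

push_op!(t::LazyAffineTape, A, b) = push!(t.ops, (A, b))

# Collapse the tape into one affine map (A, b). If this has to run after every
# operation, laziness buys nothing, which is the problem described above.
function collapse(t::LazyAffineTape{T}, n::Int) where {T}
    A = sparse(one(T) * I, n, n)
    b = zeros(T, n)
    for (Ai, bi) in t.ops
        A = Ai * A       # compose with everything applied so far
        b = Ai * b + bi
    end
    return A, b
end
```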
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #615      +/-   ##
==========================================
- Coverage   97.88%   97.80%    -0.09%
==========================================
  Files          88       88
  Lines        5062     5095       +33
==========================================
+ Hits         4955     4983       +28
- Misses        107      112        +5
```

☔ View full report in Codecov by Sentry.
Let me take a look at the test failure.
This makes the issues highlighted in #614 less bad. It is still suboptimal (and IMO we should address #614 directly by removing our uses of scalar indexing), but it would still be nice if indexing wasn't quite so darn slow.
The issue is that for each individual index we do a sparse matrix multiplication to extract just the rows of the underlying sparse matrix that we want. We can optimize this by doing the indexing at that level, like `A[i:i, :]`, but that is still pretty slow. What seems to be faster is to create an auxiliary variable to represent the output, and constrain each entry on the MOI level via `ScalarAffineFunction`s to the value it is supposed to be. This lowers things to MOI more quickly: at the Convex level, after indexing we only have the output variables and have dropped all the inputs, rather than holding onto them until the very end and only then lowering everything to MOI. With this, the problem in #254 is much improved over the status quo (~190 GB of allocations, 20min+ runtime, or OOM), although still not as fast as in #613.
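For concreteness, here is a minimal sketch of that auxiliary-variable idea expressed with plain MOI calls. It is only illustrative: the PR works through Convex.jl's internal machinery rather than calling MOI directly like this, and the specific expression below is made up.

```julia
import MathOptInterface as MOI

model = MOI.Utilities.Model{Float64}()
x = MOI.add_variables(model, 3)

# Suppose indexing produced the scalar affine expression 2*x[1] + x[3] + 1
# (a hypothetical example). Instead of carrying that row around as a 1xn
# sparse matrix at the Convex level, introduce an auxiliary variable z and
# add the MOI constraint  2*x[1] + x[3] + 1 - z == 0.
z = MOI.add_variable(model)
f = MOI.ScalarAffineFunction(
    [
        MOI.ScalarAffineTerm(2.0, x[1]),
        MOI.ScalarAffineTerm(1.0, x[3]),
        MOI.ScalarAffineTerm(-1.0, z),
    ],
    1.0,
)
MOI.add_constraint(model, f, MOI.EqualTo(0.0))
# Downstream atoms can now refer to z alone; the inputs are already lowered
# to MOI and need not be tracked further at the Convex level.
```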
So IMO this shows that this is a good optimization to make indexing a lot faster and less allocation-heavy, while #614 remains valuable for this codebase.
So far in this PR, I have only addressed one of the two forms of indexing, and only for real-valued problems. If we like this approach though, it should be somewhat straightforward to extend to the others. Also, some of the tests are failing, since introducing a new auxiliary variable changes the problem formulations. I don't see any failures that look like an implementation issue though.