-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparsity-preserving outer products #24980
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thanks for adding this functionality.
|
5421d1d
to
0aa366b
Compare
@jmert Would it be possible for you to revive this PR so that it can help resolve various open issues? |
I have been meaning to return to this for quite some time, so yes, but possibly not until mid-January or so. Between the holidays and my own academic work, I don't have much free time in the next few weeks. (Maybe I'll squeeze it in between things as my own form of relaxing and getting away from Matlab ;-P) |
As a reference point for myself — current master (v1.2.0-DEV.27) using similar benchmarks as the original post destroys my computer for a few minutes. I think the second
|
I've reimplemented this PR on the current master. The big differences from before are:
The updated benchmark I'm working with is: using BenchmarkTools, SparseArrays, Test
m,n = (2_000, 10_000);
k = 1000;
A = sprand(m, n, 0.01);
a = view(A, :, k);
u = A[:, k];
x = sparse(rand(n));
X = sparse(collect(1:n), fill(k, n), x, n, n);
suite = BenchmarkGroup()
# outer products only
suite["x × u'"] = @benchmarkable $x * $(u');
suite["x × a'"] = @benchmarkable $x * $(a');
# quadratic matrix products which motivated
suite["A × X × A'"] = @benchmarkable $A * $X * $(A');
suite["(A × x) × a'"] = @benchmarkable ($A * $x) * $(a');
tune!(suite)
before = run(suite)
using Revise
Revise.track(Base)
after = run(suite)
if true
println("Before vs after for specific operations")
println("=======================================\n")
for key in keys(after)
t1, t0 = minimum.((after[key], before[key]))
r = judge(t1, t0)
print(key, ": ")
show(stdout, MIME"text/plain"(), r)
println()
end
println("\n\n")
println("Optimized quadratic product, before and after")
println("=============================================\n")
for (trial,name) in [(before,"before"), (after,"after")]
key1 = "(A × x) × a'"
key2 = "A × X × A'"
t1, t0 = minimum.((trial[key1], trial[key2]))
r = judge(t1, t0)
print("$key1 vs $key2 ($name): ")
show(stdout, MIME"text/plain"(), r)
println()
end
end and gives results Before vs after for specific operations
=======================================
(A × x) × a': BenchmarkTools.TrialJudgement:
time: -99.82% => improvement (5.00% tolerance)
memory: -97.70% => improvement (1.00% tolerance)
x × a': BenchmarkTools.TrialJudgement:
time: -99.96% => improvement (5.00% tolerance)
memory: -97.89% => improvement (1.00% tolerance)
A × X × A': BenchmarkTools.TrialJudgement:
time: +0.38% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
x × u': BenchmarkTools.TrialJudgement:
time: -98.03% => improvement (5.00% tolerance)
memory: -98.95% => improvement (1.00% tolerance)
Optimized quadratic product, before and after
=============================================
(A × x) × a' vs A × X × A' (before): BenchmarkTools.TrialJudgement:
time: +6941.20% => regression (5.00% tolerance)
memory: +290.58% => regression (1.00% tolerance)
(A × x) × a' vs A × X × A' (after): BenchmarkTools.TrialJudgement:
time: -87.66% => improvement (5.00% tolerance)
memory: -91.02% => improvement (1.00% tolerance) |
cc @mbauman, @KristofferC |
The outer product part is straightforward. If @mbauman or @KristofferC can check the broadcasting bits, we can probably merge. @andreasnoack mentioned about Also, it is fine to get this performance improvement in and do further improvements in other PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm impressed — the sparse broadcasting machinery has survived a quite a few incremental code changes and is a bit of a tangled mess. You've adeptly hooked in right where it'd make sense.
I think doing this kind of peep-hole optimization is worth doing and with a few minor renames we can set the stage for supporting more and more array types without needing to _sparsifystructured
them.
Am I correct in understanding that the only functional change here is that |
IIUC, this used to be ok, but after the whole adjoint business, the kron of sparse with a transposed matrix became dense. That's one of the cases this fixes, views being the other one. |
Correct. The old PR had put in a stub function, but any performance "improvement" implied by providing an inplace update would have been a lie since I wasn't implementing such a feature at that time. If someone wants to tackle that, I agree that it can be in a new PR.
Thanks!! I did have to abandon several approaches after realizing it wasn't going to work out as I'd hoped.
@ViralBShah has the current story correct — since I started this PR a year ago, the switch to After the rewrite to current master, I dropped one of my original goals — having dense-sparse outer products produce a sparse output. I decided I was unsure enough of the broadcasting implementation at this point to spend time working on that goal; issue #26613 would be a good place to revive that conversation and let those enhancements occur in synchrony with a future PR to extend this implementation. |
@mbauman @andreasnoack Is this good to merge? |
I did complete a test of just SparseArrays and broadcast test suites that passed locally when switching |
Since it was requested, please go ahead. |
Commit added. This is ready to go from my perspective if/when it passes CI. |
Can you rebase on master (unless you did it in the last day or two) so that we pick up all the CI fixes? |
* Change is_specialcase_sparse_broadcast -> can_skip_sparsification. * Lift parent(y) to one function earlier for clarify
No problem — done. |
Note that CI runs on the merge commit so rebasing doesn't really do anything w.r.t to CI. I don't really see how this can be backported to a patch release since it is changing behavior? Tentatively removing backport label. Also, this needs a NEWS entry. |
Agreed — this shouldn't be backported. |
Should |
This will be utterly amazing for my research and I'd like to thank you all for working on this. |
The feature was added to Julia in time for v1.2 in JuliaLang/julia#24980, so get rid of the custom `outer()` method here and rewrite `quadprod()` in terms of just standard matrix methods. Julia v1.2 is the minimum-supported version at this point, so no need to worry about backporting the functionality. In the future, this function may yet still go away since the implementation is nearly trivial at this point, but that can be a follow-up PR.
The feature was added to Julia in time for v1.2 in JuliaLang/julia#24980, so get rid of the custom `outer()` method here and rewrite `quadprod()` in terms of just standard matrix methods. Julia v1.2 is the minimum-supported version at this point, so no need to worry about backporting the functionality. In the future, this function may yet still go away since the implementation is nearly trivial at this point, but that can be a follow-up PR.
The feature was added to Julia in time for v1.2 in JuliaLang/julia#24980, so get rid of the custom `outer()` method here and rewrite `quadprod()` in terms of just standard matrix methods. Julia v1.2 is the minimum-supported version at this point, so no need to worry about backporting the functionality. In the future, this function may yet still go away since the implementation is nearly trivial at this point, but that can be a follow-up PR.
First, this removes the option to do row-wise quadratic products since they aren't being used within this package anyway. That allows removing the "keyword" argument for choosing which direction to apply. Second, the original implementation was optimized for the fast vector outer product (that I got added to SparseArrays in JuliaLang/julia#24980 and made it into Julia v1.2), but when scaling up to multiple columns the performance was disastrous because the dispatch of a transposed view led to generic matrix multiplication which did the full dense-dense style loops. By not using views, we get the desired sparse matrix multiplication instead.
First, this removes the option to do row-wise quadratic products since they aren't being used within this package anyway. That allows removing the "keyword" argument for choosing which direction to apply. Second, the original implementation was optimized for the fast vector outer product (that I got added to SparseArrays in JuliaLang/julia#24980 and made it into Julia v1.2), but when scaling up to multiple columns the performance was disastrous because the dispatch of a transposed view led to generic matrix multiplication which did the full dense-dense style loops. By not using views, we get the desired sparse matrix multiplication instead.
First, this removes the option to do row-wise quadratic products since they aren't being used within this package anyway. That allows removing the "keyword" argument for choosing which direction to apply. Second, the original implementation was optimized for the fast vector outer product (that I got added to SparseArrays in JuliaLang/julia#24980 and made it into Julia v1.2), but when scaling up to multiple columns the performance was disastrous because the dispatch of a transposed view led to generic matrix multiplication which did the full dense-dense style loops. By not using views, we get the desired sparse matrix multiplication instead.
First, this removes the option to do row-wise quadratic products since they aren't being used within this package anyway. That allows removing the "keyword" argument for choosing which direction to apply. Second, the original implementation was optimized for the fast vector outer product (that I got added to SparseArrays in JuliaLang/julia#24980 and made it into Julia v1.2), but when scaling up to multiple columns the performance was disastrous because the dispatch of a transposed view led to generic matrix multiplication which did the full dense-dense style loops. By not using views, we get the desired sparse matrix multiplication instead.
The feature was added to Julia in time for v1.2 in JuliaLang/julia#24980, so get rid of the custom `outer()` method here and rewrite `quadprod()` in terms of just standard matrix methods. Julia v1.2 is the minimum-supported version at this point, so no need to worry about backporting the functionality. In the future, this function may yet still go away since the implementation is nearly trivial at this point, but that can be a follow-up PR.
First, this removes the option to do row-wise quadratic products since they aren't being used within this package anyway. That allows removing the "keyword" argument for choosing which direction to apply. Second, the original implementation was optimized for the fast vector outer product (that I got added to SparseArrays in JuliaLang/julia#24980 and made it into Julia v1.2), but when scaling up to multiple columns the performance was disastrous because the dispatch of a transposed view led to generic matrix multiplication which did the full dense-dense style loops. By not using views, we get the desired sparse matrix multiplication instead.
This PR adds sparsity-preserving outer products of sparse vectors and views into sparse matrices, implemented as methods of
kron
and*
.The motivation that caused me to dig into this is the desire to have fast/efficient quadratic matrix products. My specific case is to compute a very large sparse-dense-sparse product, but computation of the middle dense matrix can be parallelized as computations of single columns.
Some example code showing the impact:
On master (Version 0.7.0-DEV.2770, Commit 11d5a53):
This PR:
I've marked this as a work-in-progress because I'd welcome comments on how to potentially better integrate the new methods with existing code. The first commit could be thrown away / extracted to separate PR — it was just something I noticed while grapping with current code —
and the tests still need to add a.Complex
case which exercises handling of theRowVector{<:Any, ConjArray}
case