Solving tall, sparse linear systems is insanely slow #283
Solving the normal equations is much faster for tall linear systems. Is there a reason why LinearSolve's smart algorithm selection doesn't choose to do this?
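For concreteness, a minimal sketch of the normal-equations approach being described (the dimensions here are my own illustration, matching the sparse example below):

```julia
using LinearAlgebra

A = randn(5000, 100)   # tall system (dimensions assumed for illustration)
b = randn(5000)

# Normal equations: solve the small 100x100 system (A'A) x = A'b
# instead of factorizing the tall 5000x100 matrix directly.
x = Symmetric(A' * A) \ (A' * b)
```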
It's far worse for sparse arrays:

```julia
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;
@btime ldl(($A)' * $A) \ (($A)' * $b);
```

LinearSolve is approximately 20,000x slower here!
I should add I'm on an Apple silicon Mac. Not sure how much difference that will make.
That's an OK way to solve least squares only when we know the matrix is well conditioned:

```julia
julia> using LinearSolve, LinearAlgebra

julia> vandermonde(xs, basis, m) = [basis(x, i) for x in xs, i in 1:m]
vandermonde (generic function with 2 methods)

julia> V = vandermonde(range(0, 1, length=128), (x, i)->i==1 ? one(x) : x^(i-1), 20);

julia> b = rand(128);

julia> x_normal = (V'V) \ (V'b);

julia> x = solve(LinearProblem(V, b)).u;

julia> norm(V * x - b)
2.935841792541393

julia> norm(V * x_normal - b)
3.141337584566186

julia> norm(x - x_normal)
3.881181867110321e12
```
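The failure mode here is the classic one: forming VᵀV squares the condition number, so the normal equations lose roughly twice as many digits as a QR-based solve. A quick check on the same matrix:

```julia
using LinearAlgebra

vandermonde(xs, basis, m) = [basis(x, i) for x in xs, i in 1:m]
V = vandermonde(range(0, 1, length=128), (x, i) -> i == 1 ? one(x) : x^(i - 1), 20)

cond(V)       # already very large for a degree-19 monomial basis
cond(V' * V)  # in exact arithmetic ≈ cond(V)^2; exceeds what Float64 can resolve
```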
On my M1 mac:

```julia
julia> BLAS.set_num_threads(1);

julia> @btime qr($A)\$b;
  1.365 ms (9 allocations: 847.33 KiB)

julia> @btime solve(LinearProblem($A, $b)).u;
  2.607 ms (29 allocations: 1.66 MiB)

julia> BLAS.set_num_threads(4);

julia> @btime qr($A)\$b;
  1.142 ms (9 allocations: 847.33 KiB)

julia> @btime solve(LinearProblem($A, $b)).u;
  2.238 ms (29 allocations: 1.66 MiB)
```

So `solve` is roughly 2x slower than a plain `qr` here.
It should be using a QR in this instance, but it has to prove the non-square assumption at runtime. With the assumption set, it's likely fine.
2x overhead sounds like a hefty performance bug.
There's a way to reduce it, but it would mangle the code a bit and I haven't gotten around to it. But in general I'd hope that cases that want performance set the assumptions, as sketched below.
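For reference, "setting the assumptions" looks like this with the `OperatorAssumptions` API used later in this thread (the problem itself is my own placeholder):

```julia
using LinearSolve

A = randn(1000, 100)  # placeholder tall problem
b = randn(1000)

# OperatorAssumptions(false) asserts up front that the operator is
# non-square, so the solver skips the runtime shape proof.
sol = solve(LinearProblem(A, b); assumptions = LinearSolve.OperatorAssumptions(false))
```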
@ChrisRackauckas How can I tell LinearSolve that A is a priori well conditioned?
You don't need to know that it's well conditioned to use QR. You just need to know that the matrix is tall. That said, IMO LinearSolve should really check the shape and use QR if appropriate.
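In sketch form, the kind of shape check being suggested (a hypothetical helper of mine, not LinearSolve's actual dispatch code):

```julia
using LinearAlgebra, SparseArrays

# Hypothetical default chooser: LU for square systems, QR (which also
# handles least-squares problems) for tall ones.
function naive_default_solve(A, b)
    m, n = size(A)
    return m == n ? lu(A) \ b : qr(A) \ b
end

x = naive_default_solve(sprandn(5000, 100, 0.1), randn(5000))
```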
He wants to say it is well conditioned so that the normal equations can be used. Personally, I think if someone has that kind of information about the problem (i.e. how well conditioned it is), they should just choose the appropriate way to solve the problem themselves. Conditioning is expensive to calculate (more expensive than solving generally is, i.e. it normally requires an SVD).
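To make the cost point concrete (my illustration): the 2-norm condition number comes from the extreme singular values, i.e. an SVD, which generally costs more than the factorization used to solve the system.

```julia
using LinearAlgebra

A = randn(1000, 100)

# cond(A) needs the largest and smallest singular values, i.e. an SVD:
σ = svdvals(A)
κ = maximum(σ) / minimum(σ)
@assert κ ≈ cond(A)
```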
That's what it's doing, and that's exactly the problem.

We should add a way to assert conditioning assumptions so the normal equations can be chosen.
How can a shape check take over 1 ms?
I think it is literally calling `qr` as part of the cache setup.
Oh yeah, we need to improve the cache init since there's no lazy way to set that up yet.
I actually care about the case where A is well conditioned and sparse, and could be solved 4 orders of magnitude faster using the normal equations.

Currently that's exactly what I'm doing by hand. It's just that LinearSolve claims to choose the best solver for the problem automatically, and I feel my experience shows it doesn't really deliver on that. The default method should be at least as fast as QR for tall, sparse matrices. A normal-equations solver could be added, as could a way to tell the system that A is well conditioned. I appreciate it's a work in progress, so please take this as constructive feedback rather than criticism.
This isn't true. QR will be in a fairly similar speed range (2x slower for dense, up to 10x slower for sparse) and a lot more accurate. For the sparse example you gave, my laptop gives:
@oscardssmith Fair point. I hadn't understood that QR was much faster for the sparse case. Full timings for me:

```julia
julia> @btime solve(LinearProblem($A, $b)).u;
  22.228 s (54 allocations: 20.41 MiB)

julia> @btime qr($A) \ $b;
  8.853 ms (159 allocations: 23.40 MiB)

julia> @btime Symmetric(($A)' * $A) \ (($A)' * $b);
  1.288 ms (75 allocations: 1.70 MiB)

julia> @btime ldl(($A)' * $A) \ (($A)' * $b);
  1.206 ms (47 allocations: 1.37 MiB)
```

So QR is already 2500x faster than what LinearSolve is doing, and accurate with poor conditioning (will test this on @YingboMa's example). This must surely be a bug in LinearSolve. However, I do care about the further 8x speedup I get using my approach, and would prefer to use it from LinearSolve and get all the benefits that LinearSolve claims to give.
Yes, I've said multiple times here that in this case it's just doing a QR. So there is something odd here. In fact, the profile looks like it's some kind of weird inference bug or something. That must be the oddest profile I've seen: it spends <1% of the time running code. The other 99% of the time is 🤷. I haven't seen this before; can someone double check the profile?
I've seen things like this before. The fix is to start Julia with a single thread.
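Concretely: the thread count is fixed at startup, so relaunch with `julia --threads=1` (or `-t 1`) and verify before re-profiling:

```julia
julia> Threads.nthreads()
1
```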
Interesting. I found the issue. In #188 there was fd05fb1, which versioned this code path:

```julia
using LinearSolve, SparseArrays, BenchmarkTools
A = sprandn(5000, 100, 0.1)
b = randn(5000)
prob = LinearProblem(A, b)
assumptions = LinearSolve.OperatorAssumptions(false)
@time solve(prob; assumptions)

using LinearAlgebra
@time qr!(A)
```

The new `qr!` is where the time is going. @Wimmerer, is this expected?
That was already fixed.
```julia
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;   # 16.984 ms (326 allocations: 24.16 MiB)
@btime qr($A) \ $b;                      # 17.135 ms (159 allocations: 23.35 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b);   # 1.186 ms (47 allocations: 1.37 MiB)
```
This is the last piece of #283 to be fixed up. It allows for different levels of conditioning assumptions, to allow a trade-off between performance and numerical stability. Right now I've handled the very ill-conditioned cases but not the well-conditioned case (which is the case this issue is interested in).
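In usage terms, the trade-off looks like this (a sketch based on the `OperatorAssumptions`/`OperatorCondition.WellConditioned` API that appears in the benchmarks below; the problem is my placeholder):

```julia
using LinearSolve

A = randn(1000, 100)
b = randn(1000)

# Default: no conditioning assumption, so a numerically stable
# (QR-style) least-squares path is chosen.
x_safe = solve(LinearProblem(A, b)).u

# Asserting well-conditioning opts in to the faster normal-equations
# path, trading some numerical stability for speed.
x_fast = solve(LinearProblem(A, b),
               assumptions = OperatorAssumptions(false;
                   condition = OperatorCondition.WellConditioned)).u
```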
#290 adds the new algorithms for well-conditioned matrices and adds them to the default algorithm. Results:

Dense case:

```julia
using LinearSolve, LinearAlgebra, BenchmarkTools
A = randn(1000, 100)
b = randn(1000)
@assert isapprox(solve(LinearProblem(A, b)).u, Symmetric(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;  # 3.140 ms (26 allocations: 857.53 KiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u;  # 411.000 μs (27 allocations: 163.36 KiB)
@btime solve(LinearProblem($A, $b), $(NormalBunchKaufmanFactorization())).u;  # 400.500 μs (27 allocations: 210.09 KiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u;  # 401.000 μs (18 allocations: 162.89 KiB)
@btime Symmetric(($A)' * $A) \ (($A)' * $b);  # 400.000 μs (9 allocations: 209.09 KiB)
@btime factorize(Symmetric(($A)' * $A)) \ (($A)' * $b);  # 404.300 μs (12 allocations: 209.19 KiB)
```

Sparse case:

```julia
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;  # 16.984 ms (326 allocations: 24.16 MiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u;  # 1.581 ms (130 allocations: 1.70 MiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u;  # 1.535 ms (124 allocations: 1.70 MiB)
@btime qr($A) \ $b;  # 17.135 ms (159 allocations: 23.35 MiB)
@btime factorize(Symmetric(($A)' * $A)) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
@btime cholesky(Symmetric(($A)' * $A)) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldlt((($A)' * $A)) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
```

I'm not entirely sure where the overhead on the sparse case is coming from, but at least it's rather close.