Solving tall, sparse linear systems is insanely slow #283
Solving the normal equations is much faster for tall linear systems. Is there a reason why LinearSolve's smart algorithm selection doesn't choose to do this?
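For concreteness, a minimal sketch of the normal-equations approach being described (the dimensions here are my own illustration, matching the sparse example below):

```julia
using LinearAlgebra

A = randn(5000, 100)   # tall system (dimensions assumed for illustration)
b = randn(5000)

# Normal equations: solve the small 100x100 system (A'A) x = A'b
# instead of factorizing the tall 5000x100 matrix directly.
x = Symmetric(A' * A) \ (A' * b)
```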
It's far worse for sparse arrays:

```julia
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;
@btime ldl(($A)' * $A) \ (($A)' * $b);
```

LinearSolve is approximately 20,000x slower here!
I should add I'm on an Apple silicon Mac. Not sure how much difference that will make.
That's an OK way to solve least squares only when we know the matrix is well conditioned:

```julia
julia> using LinearSolve, LinearAlgebra

julia> vandermonde(xs, basis, m) = [basis(x, i) for x in xs, i in 1:m]
vandermonde (generic function with 2 methods)

julia> V = vandermonde(range(0, 1, length=128), (x, i)->i==1 ? one(x) : x^(i-1), 20);

julia> b = rand(128);

julia> x_normal = (V'V) \ (V'b);

julia> x = solve(LinearProblem(V, b)).u;

julia> norm(V * x - b)
2.935841792541393

julia> norm(V * x_normal - b)
3.141337584566186

julia> norm(x - x_normal)
3.881181867110321e12
```
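The failure mode here is the classic one: forming VᵀV squares the condition number, so the normal equations lose roughly twice as many digits as a QR-based solve. A quick check on the same matrix:

```julia
using LinearAlgebra

vandermonde(xs, basis, m) = [basis(x, i) for x in xs, i in 1:m]
V = vandermonde(range(0, 1, length=128), (x, i) -> i == 1 ? one(x) : x^(i - 1), 20)

cond(V)       # already very large for a degree-19 monomial basis
cond(V' * V)  # in exact arithmetic ≈ cond(V)^2; exceeds what Float64 can resolve
```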
On my M1 mac:

```julia
julia> BLAS.set_num_threads(1);

julia> @btime qr($A)\$b;
  1.365 ms (9 allocations: 847.33 KiB)

julia> @btime solve(LinearProblem($A, $b)).u;
  2.607 ms (29 allocations: 1.66 MiB)

julia> BLAS.set_num_threads(4);

julia> @btime qr($A)\$b;
  1.142 ms (9 allocations: 847.33 KiB)

julia> @btime solve(LinearProblem($A, $b)).u;
  2.238 ms (29 allocations: 1.66 MiB)
```

So `solve` is roughly 2x slower than a plain `qr` here.
It should be using a QR in this instance, but it has to prove the non-square assumption at runtime. With the assumption set, it's likely fine.
2x overhead sounds like a hefty performance bug.
There's a way to reduce it, but it would mangle the code a bit and I haven't gotten around to it. But in general I'd hope that cases that want performance set the assumptions, as sketched below.
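For reference, "setting the assumptions" looks like this with the `OperatorAssumptions` API used later in this thread (the problem itself is my own placeholder):

```julia
using LinearSolve

A = randn(1000, 100)  # placeholder tall problem
b = randn(1000)

# OperatorAssumptions(false) asserts up front that the operator is
# non-square, so the solver skips the runtime shape proof.
sol = solve(LinearProblem(A, b); assumptions = LinearSolve.OperatorAssumptions(false))
```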
@ChrisRackauckas How can I tell LinearSolve that A is a priori well conditioned?
You don't need to know that it's well conditioned to use QR. You just need to know that the matrix is tall. That said, IMO LinearSolve should really check the shape and use QR if appropriate.
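In sketch form, the kind of shape check being suggested (a hypothetical helper of mine, not LinearSolve's actual dispatch code):

```julia
using LinearAlgebra, SparseArrays

# Hypothetical default chooser: LU for square systems, QR (which also
# handles least-squares problems) for tall ones.
function naive_default_solve(A, b)
    m, n = size(A)
    return m == n ? lu(A) \ b : qr(A) \ b
end

x = naive_default_solve(sprandn(5000, 100, 0.1), randn(5000))
```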
He wants to say it is well conditioned so that the normal equations can be used. Personally, I think if someone has that kind of information about the problem (i.e. how well conditioned it is), they should just choose the appropriate way to solve the problem themselves. Conditioning is expensive to calculate (more expensive than solving generally is, i.e. it normally requires an SVD).
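To make the cost point concrete (my illustration): the 2-norm condition number comes from the extreme singular values, i.e. an SVD, which generally costs more than the factorization used to solve the system.

```julia
using LinearAlgebra

A = randn(1000, 100)

# cond(A) needs the largest and smallest singular values, i.e. an SVD:
σ = svdvals(A)
κ = maximum(σ) / minimum(σ)
@assert κ ≈ cond(A)
```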
That's what it's doing, and that's exactly the problem.

We should add a way to assert conditioning assumptions so the normal equations can be chosen.
How can a shape check take over 1 ms?
I think it is literally calling `qr` as part of the cache setup.
Oh yeah, we need to improve the cache init since there's no lazy way to set that up yet.
I actually care about the case where A is well conditioned and sparse, and could be solved 4 orders of magnitude faster using the normal equations.

Currently that's exactly what I'm doing by hand. It's just that LinearSolve claims to choose the best solver for the problem automatically, and I feel my experience shows it doesn't really deliver on that. The default method should be at least as fast as QR for tall, sparse matrices. A normal-equations solver could be added, as could a way to tell the system that A is well conditioned. I appreciate it's a work in progress, so please take this as constructive feedback rather than criticism.
This isn't true. QR will be in a fairly similar speed range (2x slower for dense, up to 10x slower for sparse) and a lot more accurate. For the sparse example you gave, my laptop gives:
@oscardssmith Fair point. I hadn't understood that QR was much faster for the sparse case. Full timings for me:

```julia
julia> @btime solve(LinearProblem($A, $b)).u;
  22.228 s (54 allocations: 20.41 MiB)

julia> @btime qr($A) \ $b;
  8.853 ms (159 allocations: 23.40 MiB)

julia> @btime Symmetric(($A)' * $A) \ (($A)' * $b);
  1.288 ms (75 allocations: 1.70 MiB)

julia> @btime ldl(($A)' * $A) \ (($A)' * $b);
  1.206 ms (47 allocations: 1.37 MiB)
```

So QR is already 2500x faster than what LinearSolve is doing, and accurate with poor conditioning (will test this on @YingboMa's example). This must surely be a bug in LinearSolve. However, I do care about the further 8x speedup I get using my approach, and would prefer to use it from LinearSolve and get all the benefits that LinearSolve claims to give.
Yes, I've said multiple times here that in this case it's just doing a QR. So there is something odd here. In fact, the profile looks like it's some kind of weird inference bug or something. That must be the oddest profile I've seen: it spends <1% of the time running code. The other 99% of the time is 🤷. I haven't seen this before; can someone double check the profile?
I've seen things like this before. The fix is to start Julia with a single thread.
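Concretely: the thread count is fixed at startup, so relaunch with `julia --threads=1` (or `-t 1`) and verify before re-profiling:

```julia
julia> Threads.nthreads()
1
```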
Interesting. I found the issue. In #188 there was fd05fb1, which versioned this code path:

```julia
using LinearSolve, SparseArrays, BenchmarkTools
A = sprandn(5000, 100, 0.1)
b = randn(5000)
prob = LinearProblem(A, b)
assumptions = LinearSolve.OperatorAssumptions(false)
@time solve(prob; assumptions)

using LinearAlgebra
@time qr!(A)
```

The new `qr!` is where the time is going. @Wimmerer, is this expected?
That was already fixed.
```julia
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;   # 16.984 ms (326 allocations: 24.16 MiB)
@btime qr($A) \ $b;                      # 17.135 ms (159 allocations: 23.35 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b);   # 1.186 ms (47 allocations: 1.37 MiB)
```
This is the last piece of #283 to be fixed up. It allows for different levels of conditioning assumptions, to allow a trade-off between performance and numerical stability. Right now I've handled the very ill-conditioned cases but not the well-conditioned case (which is the case this issue is interested in).
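In usage terms, the trade-off looks like this (a sketch based on the `OperatorAssumptions`/`OperatorCondition.WellConditioned` API that appears in the benchmarks below; the problem is my placeholder):

```julia
using LinearSolve

A = randn(1000, 100)
b = randn(1000)

# Default: no conditioning assumption, so a numerically stable
# (QR-style) least-squares path is chosen.
x_safe = solve(LinearProblem(A, b)).u

# Asserting well-conditioning opts in to the faster normal-equations
# path, trading some numerical stability for speed.
x_fast = solve(LinearProblem(A, b),
               assumptions = OperatorAssumptions(false;
                   condition = OperatorCondition.WellConditioned)).u
```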
#290 adds the new algorithms for well-conditioned matrices and adds them to the default algorithm. Results:

Dense case:

```julia
using LinearSolve, LinearAlgebra, BenchmarkTools
A = randn(1000, 100)
b = randn(1000)
@assert isapprox(solve(LinearProblem(A, b)).u, Symmetric(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;  # 3.140 ms (26 allocations: 857.53 KiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u;  # 411.000 μs (27 allocations: 163.36 KiB)
@btime solve(LinearProblem($A, $b), $(NormalBunchKaufmanFactorization())).u;  # 400.500 μs (27 allocations: 210.09 KiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u;  # 401.000 μs (18 allocations: 162.89 KiB)
@btime Symmetric(($A)' * $A) \ (($A)' * $b);  # 400.000 μs (9 allocations: 209.09 KiB)
@btime factorize(Symmetric(($A)' * $A)) \ (($A)' * $b);  # 404.300 μs (12 allocations: 209.19 KiB)
```

Sparse case:

```julia
using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;  # 16.984 ms (326 allocations: 24.16 MiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u;  # 1.581 ms (130 allocations: 1.70 MiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u;  # 1.535 ms (124 allocations: 1.70 MiB)
@btime qr($A) \ $b;  # 17.135 ms (159 allocations: 23.35 MiB)
@btime factorize(Symmetric(($A)' * $A)) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
@btime cholesky(Symmetric(($A)' * $A)) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldlt((($A)' * $A)) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b);  # 1.186 ms (47 allocations: 1.37 MiB)
```

I'm not entirely sure where the overhead on the sparse case is coming from, but at least it's rather close.