
Solving tall, sparse linear systems is insanely slow #283

Closed
ojwoodford opened this issue Mar 23, 2023 · 23 comments · Fixed by #290

Comments

@ojwoodford commented Mar 23, 2023

Solving the normal equations is much faster for tall linear systems:

using LinearSolve, BenchmarkTools
A = randn(1000, 100)
b = randn(1000)
@assert isapprox(solve(LinearProblem(A, b)).u, Symmetric(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;
@btime Symmetric(($A)' * $A) \ (($A)' * $b);

2.471 ms (32 allocations: 1.66 MiB)
179.958 μs (9 allocations: 209.09 KiB)

Is there a reason why LinearSolve's smart algorithm selection doesn't choose to do this?

@ojwoodford (Author) commented Mar 23, 2023

It's far worse for sparse arrays:

using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u;
@btime ldl(($A)' * $A) \ (($A)' * $b);

22.392 s (54 allocations: 20.41 MiB)
1.220 ms (47 allocations: 1.37 MiB)

LinearSolve is approximately 20,000x slower here!

@ojwoodford (Author)

I should add that I'm on an Apple silicon Mac; I'm not sure how much difference that makes.

@YingboMa (Member) commented Mar 24, 2023

That's an OK way to solve least squares only when we know a priori that A has a low condition number. It's not enough to test linear algebra algorithms on random matrices. Consider:

julia> using LinearSolve, LinearAlgebra

julia> vandermonde(xs, basis, m) = [basis(x, i) for x in xs,
                                                 i in 1:m]
vandermonde (generic function with 2 methods)

julia> V = vandermonde(range(0, 1, length=128), (x, i)->i==1 ? one(x) : x^(i-1), 20);

julia> b = rand(128);

julia> x_normal = (V'V) \ (V'b);

julia> x = solve(LinearProblem(V, b)).u;

julia> norm(V * x - b)
2.935841792541393

julia> norm(V * x_normal - b)
3.141337584566186

julia> norm(x - x_normal)
3.881181867110321e12
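
The underlying reason: forming the normal equations squares the condition number, since the singular values of A'A are the squares of those of A, so κ(A'A) = κ(A)^2. A quick check of that identity (assuming A has full column rank):

julia> using LinearAlgebra

julia> A = randn(100, 10);

julia> cond(A' * A) ≈ cond(A)^2
true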

@chriselrod (Contributor)

On my M1 Mac:

julia> BLAS.set_num_threads(1);

julia> @btime qr($A)\$b;
  1.365 ms (9 allocations: 847.33 KiB)

julia> @btime solve(LinearProblem($A, $b)).u;
  2.607 ms (29 allocations: 1.66 MiB)

julia> BLAS.set_num_threads(4);

julia> @btime qr($A)\$b;
  1.142 ms (9 allocations: 847.33 KiB)

julia> @btime solve(LinearProblem($A, $b)).u;
  2.238 ms (29 allocations: 1.66 MiB)

So qr(A)\b seems like a pretty big improvement over what solve(LinearProblem(A, b)).u is doing?

@ChrisRackauckas (Member)

It should be using QR in this instance, but it has to prove the non-square assumption at runtime. With the assumption supplied up front, it's likely fine.
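
For reference, here's a sketch of how that assumption can be supplied, using the same OperatorAssumptions(false) keyword that appears later in this thread:

using LinearSolve

prob = LinearProblem(A, b)
# assert up front that the operator is non-square, so solve
# doesn't have to prove it at runtime
sol = solve(prob; assumptions = OperatorAssumptions(false))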

@chriselrod (Contributor)

2x overhead sounds like a hefty performance bug.

@ChrisRackauckas (Member)

There's a way to reduce it, but it would mangle the code a bit and I haven't gotten around to it. In general, though, I'd hope that cases that want performance set the assumptions.

@ojwoodford (Author) commented Mar 24, 2023

In general I'd hope cases that want performance set the assumptions.

@ChrisRackauckas How can I tell LinearSolve that A is a priori well conditioned?

@oscardssmith (Contributor)

You don't need to know that it's well conditioned to use QR. You just need to know that the matrix is tall. That said, IMO LinearSolve should really check the shape and use QR if appropriate.
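
To spell out the reasoning: with a thin QR factorization A = QR, the least-squares minimizer of ‖Ax − b‖₂ is x = R \ (Q'b), which needs only full column rank, not good conditioning. A minimal sketch:

using LinearAlgebra

A = randn(1000, 100)   # any tall, full-column-rank matrix
b = randn(1000)
F = qr(A)
x = F \ b              # least-squares solution; equivalent to R \ (Q' * b)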

@chriselrod (Contributor) commented Mar 24, 2023

You don't need to know that it's well conditioned to use QR.

He wants to be able to assert that A is well conditioned, in order to use Symmetric(A' * A) \ (A' * b).
I suggested qr because it doesn't require A to be well conditioned.

Personally, I think if someone has that kind of information about the problem (i.e. how well conditioned it is), they should just choose the appropriate way to solve the problem themselves.

The condition number is expensive to compute: generally more expensive than the solve itself, since it normally requires an SVD (which could itself be used to solve the system).
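
To make that concrete, here's roughly what a conditioning check costs (the 2-norm condition number comes from the extreme singular values, i.e. a full SVD):

using LinearAlgebra

# svdvals does the full SVD work, which is generally costlier
# than solving the least-squares problem itself
σ = svdvals(A)
κ = first(σ) / last(σ)   # same value as cond(A)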

@ChrisRackauckas (Member)

IMO LinearSolve should really check the shape and use QR if appropriate.

That's what it's doing, and that's exactly the problem.

He wants to be able to assert that A is well conditioned, in order to use Symmetric(A' * A) \ (A' * b).

We should add an OperatorAssumption for conditioning.

@chriselrod (Contributor)

That's what it's doing, and that's exactly the problem.

How can a shape check take over 1 ms?

@chriselrod (Contributor) commented Mar 24, 2023

I think it is literally calling qr! twice: first in init_cacheval, and then again under a sub-solve call.

@ChrisRackauckas (Member)

Oh yeah, we need to improve the cache init, since there's no ArrayInterface.qr_instance(A) implemented.
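
For context, the *_instance functions in ArrayInterface (e.g. lu_instance) exist so init_cacheval can get a cheaply constructed factorization of the right type. A hypothetical qr_instance for the dense case might look like this (a sketch only, not the actual API):

using LinearAlgebra

# factor a tiny matrix so the cache learns the concrete factorization
# type without paying the O(m*n^2) cost of qr(A)
qr_instance(A::AbstractMatrix{T}) where {T} = qr(ones(T, 1, 1))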

@ojwoodford (Author) commented Mar 24, 2023

I suggested qr because it doesn't require A to be well conditioned.

I actually care about the case where A is well conditioned and sparse, and could be solved 4 orders of magnitude faster using normal equations.

Personally, I think if someone has that kind of information about the problem (i.e. how well conditioned it is), they should just choose the appropriate way to solve the problem themselves.

Currently that's exactly what I'm doing. It's just that LinearSolve claims to:

  • Fast implementations of linear solving algorithms (Very slow vs QR for tall, sparse A; surely a bug)
  • Allow for swapping out the linear solver that is used (No normal equation solver?)
  • A polyalgorithm that smartly chooses between these methods (No way to specify the problem is well conditioned)

and I feel my experience shows it doesn't really deliver on that. The default method should be at least as fast as QR for tall, sparse matrices. A normal-equations solver could be added, as could a way to tell the system that A is well conditioned. I appreciate it's a work in progress, so please take this as constructive feedback rather than criticism.
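
For concreteness, the normal-equations solver I have in mind is essentially this sketch (it assumes A has full column rank and is well conditioned, so A'A is symmetric positive definite):

using LinearAlgebra, SparseArrays

# reduce the tall m×n least-squares problem to an n×n SPD solve;
# works for dense A, and for sparse A via CHOLMOD
solve_normal(A, b) = cholesky(Symmetric(A' * A)) \ (A' * b)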

@oscardssmith (Contributor) commented Mar 24, 2023

I actually care about the case where A is well conditioned and sparse, and could be solved 4 orders of magnitude faster using normal equations.

This isn't true. QR will be only somewhat slower (roughly 2x for dense, up to 10x for sparse) and a lot more accurate.

For the sparse example you gave, my laptop gives

julia> @btime qr(A) \ b;
  11.515 ms (167 allocations: 23.39 MiB)
julia> @btime  Symmetric(A' * A) \ (A' * b);
  2.184 ms (79 allocations: 1.70 MiB)

@ojwoodford (Author) commented Mar 24, 2023

@oscardssmith Fair point. I hadn't realized that a direct QR was so much faster than LinearSolve's default in the sparse case. Full timings for me:

julia> @btime solve(LinearProblem($A, $b)).u;
  22.228 s (54 allocations: 20.41 MiB)

julia> @btime qr($A) \ $b;
  8.853 ms (159 allocations: 23.40 MiB)

julia> @btime Symmetric(($A)' * $A) \ (($A)' * $b);
  1.288 ms (75 allocations: 1.70 MiB)

julia> @btime ldl(($A)' * $A) \ (($A)' * $b);
  1.206 ms (47 allocations: 1.37 MiB)

So QR is already 2500x faster than what LinearSolve is doing, and accurate with poor conditioning (will test this on @YingboMa's example). This must surely be a bug in LinearSolve.

However, I do care about the further 8x speedup I get using my approach, and would prefer to use it from LinearSolve, and get all the benefits that LinearSolve claims to give.

@ojwoodford changed the title from "Solving the normal equations is much faster for tall linear systems" to "Solving tall, sparse linear systems is insanely slow" Mar 24, 2023
@ChrisRackauckas (Member)

So QR is already 2500x faster than what LinearSolve is doing, and accurate with poor conditioning (will test this on @YingboMa's example). This must surely be a bug in LinearSolve.

Yes, I've said multiple times here that in this case it's just doing a QR. So there is something odd here. In fact, the profile looks like it's some kind of weird inference bug or something.

[Screenshot: profiler output, Mar 24, 2023]

That must be the oddest profile I've seen. It spends <1% of the time running code; the other 99% is 🤷. I haven't seen this before. Can someone double-check the profile?

@chriselrod (Contributor)

Haven't seen this before, can someone double check the profile?

I've seen things like this before. The fix is to start Julia with a single thread.
I profiled it earlier and saw qr! calls from two different places within solve.
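
That is, launch Julia as

julia --threads=1

(or julia -t 1) before profiling, so the profiler isn't sampling across multiple threads.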

@ChrisRackauckas (Member)

Interesting. I found the issue. In #188, commit fd05fb1 gated qr! so that it was only used on sparse arrays for (at the time) Julia master. And if you check:

using LinearSolve, SparseArrays, BenchmarkTools
A = sprandn(5000, 100, 0.1)
b = randn(5000)
prob = LinearProblem(A, b)
assumptions = LinearSolve.OperatorAssumptions(false)
@time solve(prob; assumptions)

using LinearAlgebra
@time qr!(A)

The new qr! in SparseArrays is absurdly slow, so the fix is just to use qr instead.

@Wimmerer is this qr! issue in SparseArrays known? I couldn't find an issue on it.

ChrisRackauckas added a commit that referenced this issue Mar 24, 2023
@ChrisRackauckas (Member)

I profiled it earlier and saw qr! calls from two different places within solve.

That was already fixed.

@ChrisRackauckas (Member)

using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u; # 16.984 ms (326 allocations: 24.16 MiB)
@btime qr($A) \ $b # 17.135 ms (159 allocations: 23.35 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)

ChrisRackauckas added a commit that referenced this issue Mar 25, 2023
This is the last piece of #283 to be fixed up. It allows for different levels of conditioning assumptions, making it possible to trade off performance against numerical stability. Right now I've handled the very ill-conditioned cases but not the well-conditioned case (which is the one this issue is interested in).
ChrisRackauckas added a commit that referenced this issue Mar 25, 2023
@ChrisRackauckas (Member)

#290 adds the new algorithms for well-conditioned matrices and adds them to the default algorithm. Results:

using LinearSolve, LinearAlgebra, BenchmarkTools
A = randn(1000, 100)
b = randn(1000)
@assert isapprox(solve(LinearProblem(A, b)).u, Symmetric(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u; # 3.140 ms (26 allocations: 857.53 KiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u; # 411.000 μs (27 allocations: 163.36 KiB)
@btime solve(LinearProblem($A, $b), $(NormalBunchKaufmanFactorization())).u; # 400.500 μs (27 allocations: 210.09 KiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u; # 401.000 μs (18 allocations: 162.89 KiB)
@btime Symmetric(($A)' * $A) \ (($A)' * $b); # 400.000 μs (9 allocations: 209.09 KiB)
@btime factorize(Symmetric((A)' * A)) \ (($A)' * $b); # 404.300 μs (12 allocations: 209.19 KiB)

using LinearSolve, SparseArrays, BenchmarkTools, LDLFactorizations, LinearAlgebra
A = sprandn(5000, 100, 0.1)
b = randn(5000)
@assert isapprox(solve(LinearProblem(A, b)).u, ldl(A' * A) \ (A' * b))
@btime solve(LinearProblem($A, $b)).u; # 16.984 ms (326 allocations: 24.16 MiB)
@btime solve(LinearProblem($A, $b), $(NormalCholeskyFactorization())).u; # 1.581 ms (130 allocations: 1.70 MiB)
@btime solve(LinearProblem($A, $b), assumptions = $(OperatorAssumptions(false; condition = OperatorCondition.WellConditioned))).u; # 1.535 ms (124 allocations: 1.70 MiB)
@btime qr($A) \ $b # 17.135 ms (159 allocations: 23.35 MiB)
@btime factorize(Symmetric((A)' * A)) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
@btime cholesky(Symmetric((A)' * A)) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldlt(((A)' * A)) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)
@btime ldl(($A)' * $A) \ (($A)' * $b); # 1.186 ms (47 allocations: 1.37 MiB)

I'm not entirely sure where the overhead on the sparse case is coming from, but at least it's rather close.
