use power iteration for opnorm2 #49487
Conversation
Who would be good to review? Maybe @andreasnoack, or he could recommend someone to review?
Clever! Of course it's better to use Lanczos for this, but maybe as a quick attempt it's acceptable. The performance with this is very unpredictable though... You should use randn rather than rand: randn is invariant under rotations and therefore picks points on the sphere uniformly, while rand is very biased: even if you shift it to cover (-1,1), it'll e.g. pick (1,1,...,1) sqrt(n) times more often than (1,0,...,0). I'm concerned about the squaring of A, which can mean that even when the algorithm converges, having a small residual does not mean having a small error. Maybe doing the power method (or at least the residual check) on (0 A'; A 0) is better? I'll think about it.
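A minimal sketch of the (0 A'; A 0) idea above (an illustration, not code from this PR; the function name opnorm_augmented is made up): the augmented matrix H is Hermitian with nonzero eigenvalues ±σᵢ(A), so the iteration and its residual are measured against σ_max(A) directly rather than σ_max(A)².

using LinearAlgebra

# Sketch of power iteration on the Hermitian matrix H = [0 A'; A 0], applied
# implicitly (H is never formed). Its nonzero eigenvalues are ±σᵢ(A), so the
# estimate converges to σ_max(A) itself rather than to σ_max(A)^2.
function opnorm_augmented(A::AbstractMatrix; maxiter = 100)
    m, n = size(A)
    T = float(eltype(A))
    x = randn(T, n)            # block living in the domain of A
    y = randn(T, m)            # block living in the codomain of A
    s = norm([x; y]); x ./= s; y ./= s
    σ = zero(real(T))
    for _ in 1:maxiter
        xnew = A' * y          # upper block of H * [x; y]
        ynew = A * x           # lower block of H * [x; y]
        σ = norm([xnew; ynew]) # ‖H z‖ for the unit vector z = [x; y]
        iszero(σ) && return σ  # A is (numerically) zero
        x = xnew ./ σ
        y = ynew ./ σ
    end
    return σ
end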
Yeah, I did about 15 minutes of reading theory before choosing this algorithm, and I knew that I wasn't picking the best method, but the others seemed more annoying to implement. For the performance unpredictability, do I get anything from the fact that I'm only relying on the eigenvalue rather than the eigenvector?
Not really... Sorry, I didn't mean unpredictable in the sense that the convergence is unpredictable. The power method is actually one of the most predictable methods in numerical analysis: the convergence rate is the ratio of second-largest to largest singular value, squared. I just meant that for some matrices you're going to have a fast runtime, and for some others it's going to be very slow, so that might trip people up. This is especially true because you're running N iterations, so e.g. you could very well have a situation where the iteration doesn't converge for the matrix A, but it does for the matrix [A 0; 0 A] (simply because it's run for two times more iterations), which would result in the former being much slower than the latter. All in all I'm not completely convinced about this approach. opnorm is definitely not intended for "production" use (I remember the days when …). Did you actually have a need for this, or was it just for fun?
For the convergence issues, is there any way to determine, from a small number of iterations, whether we are in a regime where this will converge badly? This came from Slack (https://julialang.slack.com/archives/C6A044SQH/p1682353744395679).
Absolutely, there is definitely something better to do there.
If after a few iterations the convergence rate stabilises to something (but the residual remains large), it probably means the algorithm has entered the asymptotic regime, and you can more or less predict how many iterations it's going to take by extrapolating using the convergence rate (see the sketch below). But again, that's a bit ad hoc. The options here are: …
Honestly I think option 3 is better here, because of the reasons above. I just looked at what other languages are doing: Matlab appears to use max(svd()) for …
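To make the extrapolation idea above concrete (a sketch with made-up names, not anything from the PR): if the residual shrinks by roughly a constant factor r per iteration once the asymptotic regime is reached, the number of remaining iterations to a tolerance tol can be estimated directly.

# Sketch: estimate how many more power iterations are needed once the
# per-iteration residual reduction factor r has stabilised.
remaining_iterations(residual, r, tol) = ceil(Int, log(tol / residual) / log(r))

# e.g. residual 1e-2 shrinking by a factor 0.9 per iteration, target 1e-8:
remaining_iterations(1e-2, 0.9, 1e-8)   # ≈ 132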
Also cc @stevengj
tmp = zeros(Tnorm, m)
At = A'
v = one(Tnorm)
# this will converge quickly as long as the top two eigenvalues are distinct
Couldn't we use a Lanczos iteration to be more robust and converge quickly even if a few eigenvalues are similar in magnitude?
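As a rough illustration of that suggestion (a sketch only, not a proposal for the actual implementation; it omits reorthogonalization and restarting), Golub–Kahan–Lanczos bidiagonalization builds a small bidiagonal matrix whose largest singular value approximates opnorm(A, 2):

using LinearAlgebra

# Sketch of Golub–Kahan–Lanczos bidiagonalization (no reorthogonalization or
# restarting). The largest singular value of the small bidiagonal matrix B
# approximates the largest singular value of A, usually in few iterations.
function opnorm_gkl(A::AbstractMatrix; k = 20)
    T = float(eltype(A))
    R = real(T)
    n = size(A, 2)
    αs = R[]; βs = R[]
    v = randn(T, n); v ./= norm(v)
    u = A * v
    α = norm(u)
    iszero(α) && return zero(R)        # A is (numerically) zero
    u ./= α; push!(αs, α)
    for _ in 1:k-1
        w = A' * u .- α .* v
        β = norm(w)
        β ≤ eps(R) && break
        v = w ./ β
        w = A * v .- β .* u
        α = norm(w)
        α ≤ eps(R) && break
        u = w ./ α
        push!(βs, β); push!(αs, α)
    end
    B = Bidiagonal(αs, βs, :L)         # diagonal = αs, subdiagonal = βs
    return first(svdvals(Matrix(B)))
end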
This paper seems to describe a popular algorithm (https://doi.org/10.1137/04060593X) that is even fancier (restarting, bidiagonalization).
Good point. There's a Python implementation of that paper with a good license here: https://github.com/bwlewis/irlbpy/blob/master/irlb/irlb.py. I'll see how it does.
Good news and bad news. The good news is I have it working; the bad news is it takes about 84 lines of code. Performance testing to come.
About rand vs randn (I started replying to @stevengj's deleted comment; I think the experiment below is interesting, so I'm posting it anyway):
… yields very similar results, so apparently it's not that bad (looks like a fun exercise in probability theory, at which I'm very bad...). So using rand is probably OK here, but since the perf impact is negligible, might as well take a rotationally invariant distribution...
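The experiment snippet itself did not survive in this capture; a hypothetical reconstruction of the kind of comparison being described (the function name and parameters here are made up) would count power-iteration steps to convergence for a rand-based versus a randn-based starting vector:

using LinearAlgebra, Statistics

# Hypothetical reconstruction, not the original experiment: count how many
# power-iteration steps are needed to stabilise the opnorm estimate, starting
# from either a rand-based or a randn-based vector.
function iters_to_converge(A, x; rtol = 1e-8, maxiter = 10_000)
    x = x / norm(x)
    σ = 0.0
    for k in 1:maxiter
        x = A' * (A * x)              # one power step on A'A
        σnew = sqrt(norm(x))          # current estimate of opnorm(A, 2)
        isapprox(σ, σnew; rtol) && return k
        σ = σnew
        x ./= norm(x)
    end
    return maxiter
end

A = randn(200, 200)
mean(iters_to_converge(A, rand(200) .- 0.5) for _ in 1:100)   # rand start
mean(iters_to_converge(A, randn(200)) for _ in 1:100)         # randn start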
You basically want to avoid something that is nearly orthogonal to the dominant eigenvector, and the odds of that are about as negligible for rand as for randn.
@stevengj yeah, I'd have guessed a sqrt(n) discrepancy, seeing as the max bias for the rand distribution is of that order, but I have a very weak intuition for high-dimensional probability...

So, I'm still on the fence about this PR. First, having the same function do two very different things (svd or iterative method) feels weird (do we already have an example of that? Usual polyalgorithms simply dispatch between different exact factorizations). Second, iterative methods are more suited to sparse matrices, so do we want this function to also work for sparse? But then do we want to fall back to svd? Also, it'd be good for this function to take a tolerance as a parameter, for early exit of the method. All in all this points to having a second method besides the current opnorm, like Matlab's normest. And third, it does look a bit specialized for LinearAlgebra, and there are already packages doing Lanczos, so maybe it's better to just make sure there's a good function in one of the packages, and then point to it in the docs?
We could have an …
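A sketch of what a separate, tolerance-taking estimator of the kind discussed above could look like (the name opnormest and its signature are assumptions for illustration, not an agreed-on API):

using LinearAlgebra

# Hypothetical normest-style estimator: power iteration on A'A with a
# user-supplied relative tolerance and iteration cap.
function opnormest(A::AbstractMatrix{T}; rtol = 1e-6, maxiter = 100) where {T}
    m, n = size(A)
    (m == 0 || n == 0) && return zero(real(float(T)))
    x = randn(float(T), n)             # complex start vector if A is complex
    x ./= norm(x)
    σ = zero(real(float(T)))
    for _ in 1:maxiter
        x = A' * (A * x)
        σnew = sqrt(norm(x))
        isapprox(σ, σnew; rtol) && return σnew
        σ = σnew
        x ./= norm(x)
    end
    return σ                           # best estimate after maxiter steps
end

opnormest(randn(1000, 1000); rtol = 1e-4)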
Glad that you found some optimization hotspots :) The tough part seems to be finding the right initial vector:

julia> opnorm2(complex.(randn(2,2),randn(2,2)))
ERROR: InexactError: Float64(1.894212444018611 - 0.6331930071371694im)
[...]

I added a comment to this effect in the code. Have I got this wrong? Also, wasn't there some BLAS/LAPACK code for the power method and related iterative methods?
Sorry, but unfortunately I would also like to report that the benchmarks on my system are different. The application is very interesting; I was attempting to figure out what the impact on small to medium sized matrices might be, since (likely) convergence will fail and they'll fall back to svdvals.

# Uses power iteration to compute the maximal singular value.
# falls back to svdvals if it runs into numerics issues.
function opnorm2(A::AbstractMatrix{T}) where T
LinearAlgebra.require_one_based_indexing(A)
m,n = size(A)
Tnorm = typeof(float(real(zero(T))))
if m == 0 || n == 0 return zero(Tnorm) end
if m == 1 || n == 1 return norm2(A) end
# to minimize the chance of x being orthogonal to the largest eigenvector
x = randn(Tnorm, n)
tmp = zeros(Tnorm, m)
At = A'
v = one(Tnorm)
# this will converge quickly as long as the top two eigenvalues are distinct
for i in 1:n
mul!(x, At, mul!(tmp, A, x))
newv = norm(x)
# the numerics got very wonky
!isfinite(newv) && return first(svdvals(A))
isapprox(v, newv; rtol=10*eps(Tnorm)) && return sqrt(newv)
v = newv
x .*= inv(newv)
end
# iteration failed to converge
return first(svdvals(A))
end

Benchmarks on my system:

julia> using LinearAlgebra
julia> using BenchmarkTools
julia> include("opnorm.jl")
opnorm2 (generic function with 1 method)
julia> A = randn(1024,1024) ;
julia> methods(opnorm2)
# 1 method for generic function "opnorm2" from Main:
[1] opnorm2(A::AbstractMatrix{T}) where T
@ ~/opnorm.jl:3
julia> @benchmark opnorm2(A)
BenchmarkTools.Trial: 16 samples with 1 evaluation.
Range (min … max): 145.561 ms … 395.863 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 338.414 ms ┊ GC (median): 0.00%
Time (mean ± σ): 330.935 ms ± 52.644 ms ┊ GC (mean ± σ): 0.01% ± 0.03%
[...]
Memory estimate: 16.27 KiB, allocs estimate: 3.
julia> methods(LinearAlgebra.opnorm2)
# 1 method for generic function "opnorm2" from LinearAlgebra:
[1] opnorm2(A::AbstractMatrix{T}) where T
@ ~/julia/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:673
julia> @benchmark LinearAlgebra.opnorm2(A)
BenchmarkTools.Trial: 34 samples with 1 evaluation.
Range (min … max): 129.942 ms … 164.007 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 151.122 ms ┊ GC (median): 0.00%
Time (mean ± σ): 148.022 ms ± 10.197 ms ┊ GC (mean ± σ): 0.05% ± 0.12%
[...]
Memory estimate: 8.59 MiB, allocs estimate: 10.

Also the numerical issues:

julia> LinearAlgebra.opnorm2(complex.(randn(2,2),randn(2,2)))
1.8431336899908002
julia> opnorm2(complex.(randn(2,2),randn(2,2)))
ERROR: InexactError: Float64(0.041218366090337966 + 0.0144244121071116im)
[...]
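For reference, the InexactError above comes from the starting vector and scratch buffer being allocated with the real type Tnorm while A is complex, so mul! cannot store the complex products. A minimal sketch of the kind of change that would avoid it (an assumption for illustration, not the fix that was actually pushed to the PR):

# Sketch: allocate the iteration vectors with the (possibly complex) element
# type of A instead of the real norm type, so mul! can store complex values.
Tin = promote_type(float(T), Tnorm)   # e.g. ComplexF64 for a ComplexF64 matrix
x = randn(Tin, n)
tmp = zeros(Tin, m)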
I've now pushed some improvements that fix the squaring of the condition number (turns out you don't need the …).
Hello, I'm a new guy here and I don't mean to be disrespectful to anyone. The power algorithm has its place, but it's more of a learning tool, a first step into numerical algorithms. There's a range of inputs it works well on, but I feel that it has no place in … Since a lot of good work has been done in this PR, I'd suggest the following: …
This is analogous to the approach taken by BLAS/LAPACK with respect to power iterations. Furthermore, the method is simple and straightforward enough for a user to implement and experiment with for their specific needs.
We have moved the LinearAlgebra stdlib to an external repo: https://github.com/JuliaLang/LinearAlgebra.jl @oscardssmith If you think that this PR is still relevant, please open a new PR on the LinearAlgebra.jl repo.
This is much faster and allocates O(m) instead of O(m*n).