Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qrdelcol! and qrdelcol yielding different results! #11

Closed
tinatorabi opened this issue Oct 22, 2024 · 4 comments
Closed

qrdelcol! and qrdelcol yielding different results! #11

tinatorabi opened this issue Oct 22, 2024 · 4 comments

Comments

@tinatorabi
Copy link

tinatorabi commented Oct 22, 2024

Hi,

I encountered an unexpected bug in my tests after replacing qrdelcol with qrdelcol!, and I can't figure out why it's happening. It seems like qrdelcol! doesn't return the correct R matrix. I'll share the exact matrices in a JLD2 file and a script that reproduces the issue. I would greatly appreciate a prompt review of this problem.

using QRupdate, JLD2, LinearAlgebra, InvertedIndices
@load "matrices.jld2" S_array R_array
S = S_array
R = R_array

## doing the following slicing to mimic how qrdelcol saves matrices
S_ = S_array[:,1:5]
R_ = R_array[1:5,1:5]

qrdelcol!(S, R, 3) #inplace

S_ = S_[:, Not(3)]
R_ = qrdelcol(R_, 3)

norm(S[:,1:4]-S_) # this gives back 0
norm(R[1:4,1:4] - R_) #!!!

norm( R_'*R_ - S_'*S_ ) # seems right
norm( R'*R - S'*S ) # weird

matrices.jld2.zip

I think this is the problematic part:

    # Shift columns. This is apparently faster than copying views.
    for j in (k+1):n, i in 1:m
        @inbounds R[i,j-1] = R[i, j]
        @inbounds A[i,j-1] = A[i, j]
    end

Changing it to

    for j in (k+1):n
        @inbounds for i in 1:m
            R[i, j-1] = R[i, j]   
            A[i, j-1] = A[i, j]  
        end
    end

solves the issue. I'm not entirely sure but I think the first version kinda overwrites matrix entries during the shifting, but it's more memory/cache efficient obviously.

@tinatorabi
Copy link
Author

I also noticed this:

using QRupdate
using LinearAlgebra
using TimerOutputs
using InvertedIndices

m, n = 10000, 100
A = randn(m, n)
R = qr(A).R
Rin = deepcopy(R)
Ain = deepcopy(A)
for i in 30:50
    println("======", i)
    @timeit "dynamic" global A = A[:, Not(i)]
    @timeit "dynamic" global R = qrdelcol(R, i)
    @timeit "in-place" qrdelcol!(Ain, Rin, i)
end
print_timer()

julia> norm(Ain[:,1:79] - A)
0.0
julia> norm(Rin[1:79,1:79] - R)
720.1343678995846
                                     Time                           Allocations      

───────────────────────────────────────────────
Tot / % measured: 426s / 0.3% 1.20GiB / 63.9%

Section ncalls time %tot avg alloc %tot avg
─────────────────────────────────────────────────────────
dynamic 237 962ms 87.3% 4.06ms 783MiB 99.6% 3.30MiB
in-place 118 141ms 12.7% 1.19ms 3.24MiB 0.4% 28.1KiB
─────────────────────────────────────────────────────────

There is something inherently wrong with how R is being saved!

@nabw
Copy link
Contributor

nabw commented Oct 25, 2024

Hi Tina, thank you very much for looking into this.

I modified the qrdelcol! tests locally to involve the way in which you measure the difference, but it still passes the tests:

@testset "qrdelcol!" begin
    m = 100
    A = randn(m,m)
    Ain = copy(A)
    Q, R = qr(A)
    Qin, Rin = qr(A)
    actual_size = m
    for i in 100:-1:1
        k = rand(1:i)
        A = A[:,1:i .!= k]
        R = qrdelcol(R, k)
        qrdelcol!(Ain, Rin, k)
        actual_size -= 1
        @test norm(R - Rin[1:actual_size,1:actual_size]) < 1e-10
        @test norm(A - Ain[:,1:actual_size]) < 1e-10
    end
end

I am running the first code you sent. The critical row size seems to be 1441 (on my PC). From there on, I get errors on the column copy method as you mention. This seems not to be a problem on the column size also.

I found a problem on the matrix dimensions in the delcol! function, which I fixed in an upcoming commit (thanks again for checking). If you want to get it working quickly now just to use the code, this is my current coldel! function:

function qrdelcol!(A::AbstractMatrix{T}, R::AbstractMatrix{T}, k::Integer) where {T}

    # Note that R is n x n
    m, n = size(A)
    mR,nR = size(R)

    # Shift columns. This is apparently faster than copying views.
    @inbounds for j in (k+1):n
        R[:,j-1] .= @view R[:, j]
        A[:,j-1] .= @view A[:, j]
    end
    A[:,n] .= zero(T)
    R[:,n] .= zero(T)

This plus some minor modifications (not related to functionality) are what is coming up. I also noted that the function seems to be slower than the common del. I computed average times and mem usage with @Btime, which seems to be more accurate, with the following simple script:

using QRupdate
using LinearAlgebra
using BenchmarkTools

for mm in [1000,2000,4000,10000, 20000,50000,100000]
    #reset_timer()
    m, n = mm, 100
    A = randn(m, n)
    R = qr(A).R
    Rin = deepcopy(R)
    Ain = deepcopy(A)

    actual_size = n
    i = 20
    println("====== R ", i)
    @btime $R = qrdelcol($R, $i)
    println("====== Rin ", i)
    @btime qrdelcol!($Ain, $Rin, $i)
end

which yields

====== R 20
  12.974 μs (12 allocations: 154.38 KiB)
====== Rin 20
  13.878 μs (0 allocations: 0 bytes)
====== R 20
  13.351 μs (12 allocations: 154.38 KiB)
====== Rin 20
  32.164 μs (0 allocations: 0 bytes)
====== R 20
  13.277 μs (12 allocations: 154.38 KiB)
====== Rin 20
  81.384 μs (0 allocations: 0 bytes)
====== R 20
  13.373 μs (12 allocations: 154.38 KiB)
====== Rin 20
  197.214 μs (0 allocations: 0 bytes)
====== R 20
  13.315 μs (12 allocations: 154.38 KiB)
====== Rin 20
  525.806 μs (0 allocations: 0 bytes)
====== R 20
  14.081 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.020 ms (0 allocations: 0 bytes)
====== R 20
  13.963 μs (12 allocations: 154.38 KiB)
====== Rin 20
  4.539 ms (0 allocations: 0 bytes)

I found this a bit surprising (but I'm glad of the 0 allocations). Still, note that the in-place 'delcol!' function also takes care of the 'A' matrix, which delcol doesn't. This is a big load for the function, and indeed commenting out all calls to shifting the matrix 'A' as well yields the expected results:

====== R 20
  13.768 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.336 μs (0 allocations: 0 bytes)
====== R 20
  13.727 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.352 μs (0 allocations: 0 bytes)
====== R 20
  13.400 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.321 μs (0 allocations: 0 bytes)
====== R 20
  14.196 μs (12 allocations: 154.38 KiB)
====== Rin 20
  2.366 μs (0 allocations: 0 bytes)

I believe that the function should have the modification also for A, as modifying the R matrix is something you do only because A was modified. This usage aspect
(I pasted only the last ones). I hope this closes the issue, and sorry for taking longer than expected.

@nabw
Copy link
Contributor

nabw commented Nov 6, 2024

Ok, added in this PR: #12

@tinatorabi
Copy link
Author

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants