Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with (sparse) backslash (and/or QR factors?) #30288

Closed
briochemc opened this issue Dec 6, 2018 · 1 comment
Closed

Performance issue with (sparse) backslash (and/or QR factors?) #30288

briochemc opened this issue Dec 6, 2018 · 1 comment

Comments

@briochemc
Copy link
Contributor

briochemc commented Dec 6, 2018

Hi there,

I probably don't understand what is happening but I think this is worth inspecting: I ran into an odd performance issue when performing a back-substitution. Below is a not-so-minimal working example (NSMWE). What strikes me is that I always thought the expensive part in using \ was the factorization, but in this NSMWE, it is the back-substitution that is slowing things down. if you run the example below, it will print the @btime for

  • Δ \ x the direct backslash use (~15 s on my machine),
  • Δf = factorize(A) the factorization (~0.8 s), and
  • Δf \ x the back-substitution (~15 s)

[EDIT: Note that Δf is of type SuiteSparse.SPQR.QRSparse{Float64,Int64}, so this might be a QR-specific issue. In case that helps, the sparse matrix Δ itself represents a simple 2D discrete Laplacian, i.e., mostly a bunch of -4s on the "diagonal" and -1s "off-diagonal"]

This is the exact output on my machine:

julia> include("Julia_benchmark3.jl") # file where I wrote the NSMWE
Time for backslash
  15.579 s (390523 allocations: 437.50 MiB)
Time for factorizing
  816.068 ms (61573 allocations: 418.04 MiB)
Time for backslash with factors, i.e., backsubstitution
  14.560 s (328949 allocations: 19.45 MiB)

I put everything in a little module so as to not pollute the output without adding semicolons everywhere.
(Also note that I have tried the same thing in MATLAB, and using tic and toc to time \ shows me it done in about 0.5 s.)

The NSMWE:

module MWE

using NetCDF, DataDeps, LinearAlgebra, SparseArrays, SuiteSparse, BenchmarkTools

# Register the data
register(DataDep(
    "WOA13",
    "citation string",
    "https://data.nodc.noaa.gov/woa/WOA13/DATAv2/phosphate/netcdf/all/1.00/woa13_all_p00_01.nc",
    sha2_256
))

file_name = "woa13_all_p00_01.nc" ;
var_name = "p_an" ;
nc_file = @datadep_str string("WOA13/", "woa13_all_p00_01.nc")

# Read the file
woa_var_3d = ncread(nc_file, var_name) ;
fillvalue = ncgetatt(nc_file, var_name, "_FillValue")
woa_var_3d[findall(woa_var_3d .== fillvalue)] .= NaN ;
z = 102 ;
A = woa_var_3d[:,:,z] ;

# Inpaint part
f = isnan
I_unknown = findall(@. f(A))
I_known = findall(@. !f(A))

# direct neighbors
Ns = [CartesianIndex(ntuple(x -> x == d ? 1 : 0, ndims(A))) for d in 1:ndims(A)]
neighbors = [Ns; -Ns]

# "working" list: the indices of all unkowns and their neighbors
R = CartesianIndices(size(A))

function list_neighbors(R, idx, neighbors)
    out = [i + n for i in idx for n in neighbors if i + n  R]
    out = sort(unique([out; idx]))
end

Iwork = list_neighbors(R, I_unknown, neighbors)
iwork = LinearIndices(size(A))[Iwork]

# Preallocate the Laplacian (not the fastest way but easier-to-read code)
nA = length(A)
Δ = sparse([], [], Vector{Float64}(), nA, nA)

# Fill the Laplacian within the borders
for d in 1:ndims(A)
    N = Ns[d]
    # R′ is the range of indices without the borders in dimension `d`
    R′ = CartesianIndices(size(A) .- 2 .* N.I)
    R′ = [r + N for r in R′]
    # Convert to linear indices to build the Laplacian
    u = vec(LinearIndices(size(A))[R′])
    n = LinearIndices(size(A))[first(R) + N] - LinearIndices(size(A))[first(R)]
    # Build the Laplacian (not the fastest way but easier-to-read code)
    global Δ += sparse(u, u     , -2.0, nA, nA)
    Δ += sparse(u, u .- n, +1.0, nA, nA)
    Δ += sparse(u, u .+ n, +1.0, nA, nA)
end

# Use linear indices to access the sparse Laplacian
i_unknown = LinearIndices(size(A))[I_unknown]
i_known = LinearIndices(size(A))[I_known]

# Place the knowns to right hand side
rhs = -Δ[iwork, i_known] * A[i_known]

# Solve for the unknowns
B = zeros(size(A))
B[i_known] .= A[i_known]
B[i_unknown] .= Δ[iwork, i_unknown] \ rhs

println("Time for backslash")
@btime Δ[iwork, i_unknown] \ rhs

println("Time for factorizing")
@btime Δf = factorize(Δ[iwork, i_unknown])
Δf = factorize(Δ[iwork, i_unknown])

println("Time for backslash with factors, i.e., backsubstitution")
@btime Δf \ rhs

end

I would like for this performance issue to be resolved so that I can claim that Julia's version of inpaint (that I ported into a recently released package, Inpaintings.jl) is as fast as MATLAB's inpaint_nans 😃

@briochemc briochemc changed the title Performance issue with backslash Performance issue with backslash (and/or QR factors?) Dec 6, 2018
@briochemc briochemc changed the title Performance issue with backslash (and/or QR factors?) Performance issue with (sparse) backslash (and/or QR factors?) Dec 6, 2018
andreasnoack added a commit that referenced this issue Dec 6, 2018
instead of using a view to avoid slow fallback in back substitution.

Fixes #30288
andreasnoack added a commit that referenced this issue Dec 7, 2018
…30289)

instead of using a view to avoid slow fallback in back substitution.

Fixes #30288
KristofferC pushed a commit that referenced this issue Dec 7, 2018
…30289)

instead of using a view to avoid slow fallback in back substitution.

Fixes #30288

(cherry picked from commit 87c5f36)
@briochemc
Copy link
Contributor Author

Thanks guys!

KristofferC pushed a commit that referenced this issue Dec 12, 2018
…30289)

instead of using a view to avoid slow fallback in back substitution.

Fixes #30288

(cherry picked from commit 87c5f36)
KristofferC pushed a commit that referenced this issue Feb 11, 2019
…30289)

instead of using a view to avoid slow fallback in back substitution.

Fixes #30288

(cherry picked from commit 87c5f36)
KristofferC pushed a commit that referenced this issue Feb 20, 2020
…30289)

instead of using a view to avoid slow fallback in back substitution.

Fixes #30288

(cherry picked from commit 87c5f36)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant