
Output sparsity pattern of SparseMatrixCSC depends on autodiff option (finite, forward, backward/flux) #964

Open
learning-chip opened this issue Dec 7, 2021 · 5 comments


learning-chip commented Dec 7, 2021

When optimizing a function with a SparseMatrixCSC input, the output ("minimizer") is still a sparse matrix type, but its sparsity pattern depends on the autodiff option:

  1. With autodiff = :finite (default), the output is a SparseMatrixCSC but with a dense pattern -- mathematically the same as a dense matrix, with no zero entries.
  2. With autodiff = :forward, the output has the same sparsity pattern as the original input -- zero entries stay zero. This seems like the most reasonable behavior.
  3. With Flux/Zygote via FluxOptTools, the sparsity pattern is also kept (at least mathematically). The software behavior is a bit odd, though -- minimizer() returns a flattened 1-D array of matrix values, not a SparseMatrixCSC. The gradient computation itself should be correct (ref Support sparse arrays/matrices as output? FluxML/Zygote.jl#163 (comment)).

Should such inconsistent behaviors be considered "expected" or "buggy"?

  • I would expect autodiff = :finite to also preserve sparsity and not turn a sparse matrix into a dense one.
  • FluxOptTools's behavior is probably better left to a separate issue on its own repo.

To reproduce:

using Optim, SparseArrays, LinearAlgebra, Random
using Zygote, Flux, FluxOptTools

n = 8
I_n = spdiagm(ones(n))
Random.seed!(0)
A = sprand(n, n, 0.5) * 0.2 + I_n
B = sprand(n, n, 0.5) * 0.2 + I_n

f(P) = norm(A * P - I_n)  # f is zero only when P = inv(A)

B_opt_fd = Optim.minimizer(optimize(f, B, BFGS(), autodiff = :finite))   # output has a dense pattern
B_opt_ad = Optim.minimizer(optimize(f, B, BFGS(), autodiff = :forward))  # output keeps the sparsity pattern of B

# the rest are specific to Flux-computed gradient

function optimize_flux(B)
    # could also compute the gradient explicitly via `f_grad(P) = gradient(f, P)[1]` and pass it as `g!` to Optim,
    # but FluxOptTools handles the array flattening for us
    B_opt = copy(B)  # avoid in-place modification to original B
    loss() = f(B_opt)
    Zygote.refresh()
    pars   = Flux.params(B_opt)
    lossfun, gradfun, fg!, p0 = optfuns(loss, pars)
    B_vals = Optim.minimizer(optimize(Optim.only_fg!(fg!), p0, BFGS()))
    return B_opt, B_vals
end

B_opt_flux, B_vals_flux = optimize_flux(B)  # B_opt_flux has the same sparsity pattern as B, but with modified values
isapprox(B_opt_flux, B_opt_ad)  # results agree
isequal(filter(!iszero, B_vals_flux), B_opt_flux.nzval)  # B_vals_flux is a flattened, dense 1-D array

As for the results:

  • f(B_opt_ad) and f(B_opt_flux) are finite positive values, because the constrained sparsity pattern limits the minimization.
  • f(B_opt_fd) is zero to within numerical precision, because all entries are allowed to change; B_opt_fd is essentially inv(Matrix(A)), i.e. the dense inverse of a sparse matrix.
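A quick way to see the difference at the REPL, assuming the reproducer above has been run (nnz counts the stored entries of a sparse matrix):

using SparseArrays

nnz(B)         # stored entries of the original pattern
nnz(B_opt_ad)  # same as nnz(B): the pattern of B is preserved
nnz(B_opt_fd)  # roughly n^2: structurally dense despite the SparseMatrixCSC type
f(B_opt_ad)    # finite positive value (minimization constrained to B's pattern)
f(B_opt_fd)    # ≈ 0 (all entries free to change)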


longemen3000 (Contributor) commented Dec 7, 2021

I think the problem is not in Optim.jl but in NLSolversBase.jl. I can reproduce with:

using NLSolversBase, SparseArrays, Random, LinearAlgebra
n = 8
I_n = spdiagm(ones(n))
Random.seed!(0)
A = sprand(n, n, 0.5) * 0.2 + I_n
B = sprand(n, n, 0.5) * 0.2 + I_n
f(P) = norm(A * P - I_n)
FD = OnceDifferentiable(f, B, autodiff = :finite)
AD = OnceDifferentiable(f, B, autodiff = :forward)
dB_ad = copy(B)
dB_fd = copy(B)
AD.df(dB_ad, B); dB_ad  # forward-mode gradient keeps the sparsity pattern of B
FD.df(dB_fd, B); dB_fd  # finite-difference gradient is structurally dense

It's almost certainly related to FiniteDiff.jl (the finite-difference backend), but I need to confirm.

longemen3000 (Contributor) commented Dec 7, 2021

A reproducer without Optim.jl:

using FiniteDiff, ForwardDiff, SparseArrays, Random, LinearAlgebra
n = 8
I_n = spdiagm(ones(n))
Random.seed!(0)
A = sprand(n, n, 0.5) * 0.2 + I_n
B = sprand(n, n, 0.5) * 0.2 + I_n
f(P) = norm(A * P - I_n)
FiniteDiff.finite_difference_gradient(f, B)  # structurally dense SparseMatrixCSC
ForwardDiff.gradient(f, B)                   # keeps the sparsity pattern of B

NLSolversBase (and Optim) pins FiniteDiff to version 2.0, but that's not relevant here (the error also occurs on 2.8.1).

pkofod (Member) commented Dec 25, 2021

Interesting. I think you'd have to use SparseDiffTools.jl instead of FiniteDiff.jl to preserve sparsity patterns. @ChrisRackauckas?

ChrisRackauckas (Contributor) commented

FiniteDiff.jl has the same sparsity API as SparseDiffTools.jl. Just pass the color vector and the sparse Jacobian. SparseDiffTools has the AD version of sparse differentiation, along with the graph algorithms for computing the color vectors, while FiniteDiff.jl has a compatible finite difference Jacobian implementation.
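For concreteness, here is a minimal sketch of that colorvec + sparsity workflow for a Jacobian, reusing A and n from the reproducers above; the toy in-place map g! and the choice of A as its own sparsity prototype are illustrative assumptions, and the keyword names follow the FiniteDiff/SparseDiffTools documentation as I understand it:

using FiniteDiff, SparseDiffTools, SparseArrays

g!(dx, x) = (dx .= A * x)         # toy in-place map whose Jacobian is A itself
x0 = rand(n)
colors = matrix_colors(A)         # graph coloring of the sparsity pattern (SparseDiffTools)
J = copy(A)                       # sparse output buffer carrying the known pattern
cache = FiniteDiff.JacobianCache(x0, colorvec = colors, sparsity = A)
FiniteDiff.finite_difference_jacobian!(J, g!, x0, cache)  # J keeps its sparse pattern
# SparseDiffTools.forwarddiff_color_jacobian! is the forward-mode AD counterpart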

ChrisRackauckas (Contributor) commented

Reading this more, @longemen3000 did not pass the sparsity to FiniteDiff, so it did not know that the Jacobian it was calculating was supposed to be sparse. Having a sparse input does not necessarily mean a sparse output (for example, a sparse initial condition to an ODE or a neural network will become dense), so a sparse input is not sufficient information for any such differentiation code to determine what sparsity pattern the derivative would have. And we haven't implemented sparsity on gradients, since that is a rather niche case: at most it would be a sparse vector, compared to sparse Jacobians or Hessians, which are matrices.
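As a purely illustrative workaround sketch (not an API offered by NLSolversBase or FiniteDiff), one could take the structurally dense finite-difference gradient from the reproducer above and restrict it to the pattern of B by hand:

using FiniteDiff, SparseArrays

g_dense = FiniteDiff.finite_difference_gradient(f, B)  # SparseMatrixCSC, but structurally dense
rows, cols, _ = findnz(B)                              # sparsity pattern of the original input
vals = [g_dense[i, j] for (i, j) in zip(rows, cols)]
g_masked = sparse(rows, cols, vals, size(B)...)        # gradient restricted to B's pattern

Whether such masking is appropriate depends on the problem; as noted above, a sparse input does not by itself imply a sparse derivative.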
