Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparse setindex and stored zeros #9906

Closed
mauro3 opened this issue Jan 24, 2015 · 37 comments
Closed

sparse setindex and stored zeros #9906

mauro3 opened this issue Jan 24, 2015 · 37 comments
Labels
sparse Sparse arrays

Comments

@mauro3
Copy link
Contributor

mauro3 commented Jan 24, 2015

During construction of sparse matrices and when using setindex!, entries with values 0 are not included/dropped from the underlying storage. This is no good for certain applications. In particular when preallocating sparse matrices.

For example we currently got:

julia> sparse([1],[1], [0.], 5 ,5)
5x5 sparse matrix with 0 Float64 entries:

Instead

julia> sparse([1],[1], [1.], 5 ,5)
5x5 sparse matrix with 1 Float64 entries:
        [1, 1]  =  0.0

or

julia> sparse([1],[1], [1.], 5 ,5, keepzeros=true)
5x5 sparse matrix with 1 Float64 entries:
        [1, 1]  =  0.0

My use case are preallocated Jacobian matrices when solving PDEs. Usually the pattern of potential non-zeros in the Jacobian is known, so it can be preallocated. However, when repeatedly filling the Jacobian during the calculation there is no guarantee that its underlaying storage will be kept fixed.

Maybe there could be a sparse type for fixed size sparse matrix, say SparseMatrixCSCfixed. Constructed with sparse([1],[1], [1.], 5 ,5, keepzeros=true). Then setindex! would not drop zeros and error when trying to assign to a not-allocated entry.

@mlubin
Copy link
Member

mlubin commented Jan 24, 2015

I agree that a keepzeros flag would be useful. But anyway if you're preallocating why not just use dummy nonzero values?

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 24, 2015

Yes, I use ones to preallocate, which is fine. But the other problem does not have a workaround: if I update the Jacobian J[i,j]=someval and someval happens to be zero at that iteration then the whole sparse matrix is redone, only to be restored to the same state once someval is non-zero again.

@IainNZ IainNZ added the sparse Sparse arrays label Jan 24, 2015
@JeffBezanson
Copy link
Member

dup of JuliaLang/LinearAlgebra.jl#60

Looks like it is possible to have explicit zeros by calling the SparseMatrixCSC constructor directly.

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 24, 2015

Thanks for pointing out that issue. I am aware that zeros can be stored in a sparse matrix, but not easily as setindex! will just purge that entry from the matrix. I think there should be a way to keep them when using standard indexing. The Julian way to do this would be with a new type.

@mlubin
Copy link
Member

mlubin commented Jan 24, 2015

I don't want to rehash JuliaLang/LinearAlgebra.jl#60, but I believe the conclusion is that explicit zeros are allowed in SparseMatrixCSC because they're generally harmless, but that high-level interfaces like getindex are free to remove them. J[i,j]=someval in a loop is potentially a very inefficient way to update a sparse matrix anyway. If you're aiming for efficiency you have to work at the level of CSC format.

@JeffBezanson
Copy link
Member

That's true; if you want (1) fast changes to individual elements, (2) keeping zeros in place, it seems the best thing to do is directly manipulate the representation.

@tkelman
Copy link
Contributor

tkelman commented Jan 25, 2015

The Julian way to do this would be with a new type.

I don't think a new type would be a good idea for dealing with this, at least not with the current design. You'd have to reimplement a lot of methods for any new type, or introduce an AbstractCSC level to the hierarchy. Maybe a boolean type parameter could work, but that could also get pretty messy.

I think the best solution at the moment would be to add a specific setexplicitzero! method for this. I think there's already a non-exported helper function that can be used to do this, but I don't remember exactly which one.

@ViralBShah
Copy link
Member

New type is definitely not a good idea. The best thing for now is perhaps to have new methods for this. We could even have a module with various methods that store their zeros.

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 25, 2015

Thanks for all the input. So I suggest two things:

  • add a keyword to the constructors keepzeros to force keeping zeros
  • mention in docs that zeros are dropped in constructors and setindex!

Let me know if that is good and I can make a pull request for those two points

Also, I just discovered this:

julia> sparse([1,1], [2,2], [1,-1])
1x2 sparse matrix with 1 Int64 entries:
        [1, 2]  =  0

Is this a bug?

@ViralBShah
Copy link
Member

I guess the only constructor where this makes sense is sparse(). We can certainly do that.

Yes - that is a bug. Could you file a separate issue for that?

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 26, 2015

keepzeros also applies to spdiagm.

Filed issue #9928

@ViralBShah ViralBShah changed the title sparse matrices and zeros sparse matrices and stored zeros Feb 19, 2015
@ViralBShah
Copy link
Member

I feel that a better approach may be to have AbstractSparseMatrixCSC, and have all the methods implemented on the abstract type. The concrete types can be SparseMatrixCSC (default without stored zeros) and SparseMatrixCSCStoredZeros. They will share largely the same implementation, except that there will be special methods where stored zeros have to be treated differently.

I prefer the tags to be in the type system, rather than in the function arguments or flags in the data structure, and multiple dispatch will do the rest for us. The code will have to be refactored to use accessor methods, but I feel that overall, this is the better approach.

@tkelman
Copy link
Contributor

tkelman commented Feb 21, 2015

I thought the majority opinion in #9928 was that no one actually thinks removing structural nonzeros with values numerically equal to zero is all that important.

@ViralBShah
Copy link
Member

IMO, that will only be true until scores of users send in bug reports.

@tkelman
Copy link
Contributor

tkelman commented Feb 21, 2015

It's good enough for SciPy?

@ViralBShah
Copy link
Member

Matlab religiously squeezes out zeroes in all sparse matrix operations. This is SciPy's behaviour:

  1. What should sparse() do, when there are zeros in the input? SciPy keeps them.
  2. What should constructors like sparse() do, when arithmetic on repeated indices produces zeros? SciPy keeps them.
  3. What should find() do? Even if there are stored zeros, SciPy does not return them - I don't like the check for zeros in find.
  4. What should setindex!() do if assigning 0 or arrays containing 0 values in them? SciPy will explicitly store them.
  5. What happens when stored zeros participate in arithmetic or matrix multiply. In SciPy, they get squeezed out in the result - you cannot retain the structure in the result of the arithmetic operation, which I would have liked in a stored zeros case.
  6. What happens when stored zeros arise as a result of the arithmetic? In SciPy, they get squeezed out in the result.

@jiahao
Copy link
Member

jiahao commented Mar 5, 2015

In #6769 (comment), I also looked up UMFPACK and MKL's documentation, which both state that they allow explicitly stored zeros.

@tkelman
Copy link
Contributor

tkelman commented Mar 5, 2015

Pardiso in fact requires them on the diagonal.

@ViralBShah
Copy link
Member

Here's the behaviour I would like to have for 0.4, which will not be a ton of work.

  1. Built-in operations always remove the zeros that may arise due to arithmetic (eg. in sparse() or binary operators etc.). setindex! etc. will ignore zeros in the input. This would differ from scipy. If as a user you haven't introduced explicit zeros, Julia won't add them.
  2. If there are stored zeros, operations such as find will not weed them out. They are treated just like other nonzeros.

@nalimilan
Copy link
Member

Point 2. Sounds both dangerous and a bit odd to me, as it's inconsistent with dense arrays, and implementation-wise, it would be very easy for find to get rid of stored zeros if it finds any.

@ViralBShah
Copy link
Member

Ah I remember that now. That is why we have the current behaviour. Thanks.

@tkelman
Copy link
Contributor

tkelman commented Mar 6, 2015

Having an API to easily set an explicit zero without having to dig into the internal representation would be exceedingly helpful. That API does not have to be setindex! however, as long as things are documented well.

I'd be okay with just trying to change the behavior in the #9928 case and deferring other changes to post-0.4. But any proposed solution to #9928 should be carefully vetted that it does not cause significant performance regressions in other cases.

@ArjunNarayanan
Copy link

Hello everyone,

I was trying to follow the discussion on this thread to figure out how to explicitly store zeros in a sparse matrix. I haven't figured it out yet, so it would be helpful if someone could point out a reference.

I need this in order to implement knot interval vectors in an unstructured t-spline mesh. Such a mesh can have empty entries which store no information (which I would like to treat as the usual omitted zeros in a sparse matrix). Zero and positive real numbers are considered valid entries in the knot interval vector.

I am not anticipating performing any linear algebra directly on these matrices. I'd like to store it in a sparse format to allow for efficient storage of very large meshes. I also think that the algorithms to extract information from these meshes can be more easily implemented if the data is in a sparse format.

If you have any alternative suggestions, I'd be very glad to hear them.

Cheers,

@tkelman
Copy link
Contributor

tkelman commented Aug 9, 2015

You need to directly modify the colptr, rowval, and nzval fields of a sparse matrix to insert a stored nonzero at this time. Hopefully we'll come up with a real API for doing this in the future.

@mlubin
Copy link
Member

mlubin commented Aug 9, 2015

Or in this case, @ArjunNarayanan could build the matrix with sparse, replacing zeros with -1, and then go through nzval to replace -1 entries with zeros.
Anyway, my recommendation is to first think about the right sparse data structure for your problem. If CSC or CSR are convenient, then it's worth figuring out how to interact with SparseMatrixCSC. Otherwise, just code up the data structure yourself.

@ArjunNarayanan
Copy link

Thanks @tkelman and @mlubin. I can enter zero values by setting the nzval field directly, so @mlubin's first suggestion could work pretty well for me.

I think a CSC or CSR could work equally well in my case, as long as I am sensible about ordering my data so that it is accessed correctly (the only difference between CSC and CSR is that one would access CSC row-wise and CSR column-wise, correct?). If I could work more elegantly by dealing directly with SparseMatrixCSC, that would be great. Could you point out a reference from where I can figure out how to do this?

Thanks for the help!

@tkelman
Copy link
Contributor

tkelman commented Aug 10, 2015

Other way around, it's better to iterate through CSC by columns and CSR by rows. There's a pretty good selection of resources online if you search on google/duckduckgo for "compressed sparse column." Tim Davis' book Direct Methods for Sparse Linear Systems is also good, there's even a series of lectures on youtube - https://www.youtube.com/playlist?list=PL5EvFKC69QIyRLFuxWRnH6hIw6e1-bBXB.

The main thing to keep in mind with these references and standard data structures is whether the indexing is 1-based or 0-based. Julia and Fortran are 1-based, C and Python are 0-based.

@ArjunNarayanan
Copy link

@tkelman thanks for your reply. I was actually interested in a reference for how to directly use SparseMatrixCSC in Julia. But your references will surely come in handy in the long run, thank you.

@tkelman
Copy link
Contributor

tkelman commented Aug 11, 2015

Depends what you want to do. The source is at https://github.com/JuliaLang/julia/blob/master/base/sparse/sparsematrix.jl and in related files in that same directory. There are several packages that make use of sparse matrices in various ways, but most sparse matrix code tends to be pretty application-specific. You can also look through @ViralBShah's JuliaCon presentation notebook at https://github.com/JuliaSparse/JuliaCon2015/blob/master/Sparse%20Matrices%20in%20Julia.ipynb, hopefully the videos will get posted too one of these days.

@ArjunNarayanan
Copy link

@tkelman like I mentioned eariler, I'm only using these matrices for storage, and my only special need is the ability to easily store explicit zeros in my sparse matrix. Doing this via nzval works for me, and I was wondering how I would do it (as @mlubin suggested) by directly dealing with SparseMatrixCSC.

I'm not sure if I am at the stage where I can figure out how to do it from the source (beginner woes!) but I'll give it a shot. Thanks again!

@tkelman
Copy link
Contributor

tkelman commented Aug 11, 2015

Ah, sorry, if all you're asking for is how to translate @mlubin's suggestion into code, it would look something like this:

A = sparse(rows, cols, vals, m, n) # where vals[k] = -1 for entries you want to turn into stored zeros
A.nzval[A.nzval .== -1] = 0

@ArjunNarayanan
Copy link

Ah, I think I was completely confused!

I figured out how to do it through nzval from @mlubin's comment. I think I misunderstood the additional statement -- "then it's worth figuring out how to interact with SparseMatrixCSC" -- as meaning that there was another way of doing the same thing. My bad!

Anyway, it looks like I'm good to go for now. Thanks for all the help!

@mauro3
Copy link
Contributor Author

mauro3 commented Aug 11, 2015

Slightly off-topic: If you're on 0.4 nzrange and related can (and should) be used to do most of the direct manipulation:

help?> nzrange
search: nzrange

Base.nzrange(A, col)

   Return the range of indices to the structural nonzero values of a
   sparse matrix column. In conjunction with "nonzeros(A)" and
   "rowvals(A)", this allows for convenient iterating over a sparse
   matrix

      A = sparse(I,J,V)
      rows = rowvals(A)
      vals = nonzeros(A)
      m, n = size(A)
      for i = 1:n
         for j in nzrange(A, i)
            row = rows[j]
            val = vals[j]
            # perform sparse wizardry...
         end
      end

If on 0.3, just copy the definition of nzrange and rowvals into your code:

nzrange(S::SparseMatrixCSC, col::Integer) = S.colptr[col]:(S.colptr[col+1]-1)

@fqiang
Copy link

fqiang commented Mar 4, 2016

Is there a SparseMatrixCSC that can keep zeros in it yet?

@mauro3 thanks for pointing out the nzrange. Use this idea I come out with a version of sparse matrix that can keep zero values.
It is slow, but any suggestion to speed up will great.

function SparseMatrix.sparse(I,J,V, M, N;keepzeros=false)
    if(!keepzeros)
        return sparse(I,J,V,M,N)
    else
        full = sparse(I,J,ones(Float64,length(I)),M,N)
        actual = sparse(I,J,V,M,N)
        fill!(full.nzval,0.0)

        for c = 1:N
            crange = nzrange(actual,c)
            full.nzval[crange] = actual.nzval[crange] 
        end        
        return full
    end
end

@tkelman
Copy link
Contributor

tkelman commented Mar 5, 2016

As of #15242, sparse keeps explicit zeros and you can use dropzeros! to remove them. setindex! has not been modified though.

@Sacha0
Copy link
Member

Sacha0 commented Oct 16, 2016

Did #17404 resolve this issue? Do any points from this issue still stand? Best!

@mauro3
Copy link
Contributor Author

mauro3 commented Oct 16, 2016

Yep, resolved. Thanks for all hard work in this area!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests