
Ability to turn on and off scalar indexing for sparse matrices? #189

Closed
ChrisRackauckas opened this issue Jul 17, 2022 · 25 comments · Fixed by #200


@ChrisRackauckas

A[i,j] = ... is very slow if A isa SparseMatrixCSC. In the wild, there's another case like this with CUDA.jl: that package lets you set CUDA.allowscalar(false) so that any scalar A[i,j] operation (getindex or setindex!) throws an error. This is helpful for finding unoptimized fallbacks which would otherwise "work" given the AbstractArray interface but would be detrimental to performance. Without this, finding out which CUDA codes are hitting unwanted fallbacks can only be done by profiling.

It would be nice to have similar functionality for SparseMatrixCSC. I am not sure I like the idea of making it global, but if it's not global you need rules for how to carry it forward through broadcast etc., which might be hard to do in general, so the global is the easy option.

@rayegun
Member

rayegun commented Jul 17, 2022

Is the idea to use this only for testing? I can't imagine it would be a good idea to do this in a package like SciML, disabling it for everyone else, or to disable it in scripts as with CUDA.

@ChrisRackauckas
Author

Yes, I'd only use it for testing. The main thing would be to ensure we don't hit any fallbacks that loop scalar-wise.

@ViralBShah
Member

If you are using it with CUDA, you probably want to disable it, right? Packages can opt in to do that, I suppose.

@ChrisRackauckas
Author

Yes, with CUDA it's generally the case that you say "if using CUDA, then put CUDA.allowscalar(false) in your script, because that will catch common performance pitfalls and turn them into errors". The same is true for sparse matrices: if you're doing A[i,j] operations all over the place you'll get bad performance, but there isn't a similar way with SparseMatrixCSC to turn performance pitfalls into errors.

Of course it can be used incorrectly; the CUDA.jl one could be misused as well if a package sets it "for the user" and suddenly non-performance-sensitive parts of the code become overly cautious. But I think over the years people have found CUDA.allowscalar(false) useful enough for testing and scripting safety that it would be nice to have a similar feature for sparse matrices.

@rayegun
Member

rayegun commented Jul 17, 2022

I definitely prefer the StaticSparseMatrixCSC <: AbstractSparseMatrixCSC route. But it is a little more onerous to implement.
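The proposed type could be sketched roughly like this. This is a minimal, hypothetical illustration, not an actual SparseArrays type; a real version would subtype AbstractSparseMatrixCSC and forward the structural accessors (rowvals, nonzeros, getcolptr) to the parent:

```julia
using SparseArrays

# Hypothetical sketch: a wrapper that forbids scalar indexing.
# A real implementation would subtype AbstractSparseMatrixCSC
# and forward rowvals/nonzeros/getcolptr to `parent`.
struct StaticSparseMatrixCSC{Tv,Ti<:Integer} <: AbstractMatrix{Tv}
    parent::SparseMatrixCSC{Tv,Ti}
end

Base.size(A::StaticSparseMatrixCSC) = size(A.parent)
Base.getindex(::StaticSparseMatrixCSC, ::Integer, ::Integer) =
    error("scalar getindex is disabled for StaticSparseMatrixCSC")
Base.setindex!(::StaticSparseMatrixCSC, _, ::Integer, ::Integer) =
    error("scalar setindex! is disabled for StaticSparseMatrixCSC")
```

Any code path that falls back to scalar getindex or setindex! on the wrapper then errors out immediately instead of silently running slow.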

@ChrisRackauckas
Author

What would you do with A::StaticSparseMatrixCSC .+ B::SparseMatrixCSC?

@rayegun
Member

rayegun commented Jul 17, 2022

Would that not just be the union of the patterns?

@ChrisRackauckas
Author

Promote to static?

@rayegun
Member

rayegun commented Jul 17, 2022

That seems reasonable. Especially if this is mostly for testing.

@wnoise

wnoise commented Jul 18, 2022

I would have assumed non-static for mixed things like that -- much like a diagonal matrix plus a general matrix gives a general matrix.

@SobhanMP
Member

SobhanMP commented Jul 18, 2022

Would it make sense to just disable it, i.e. by redefining Base.setindex!(::SparseMatrixCSC, ::Any, ::Integer, ::Integer) = error()? Especially since it's only for testing. Adding any sort of runtime check seems not ideal, especially for applications that use scalar indexing a lot.

@ViralBShah
Member

I suppose that would work, but it should be done through an API that can disable and re-enable.
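Such a disable/re-enable API could look something like the following, modeled on CUDA.allowscalar. The names `allowscalar`, `ALLOW_SCALAR`, and `checked_setindex!` are illustrative only, not actual SparseArrays API:

```julia
# Hypothetical sketch of a disable/re-enable toggle, modeled on
# CUDA.allowscalar. Names here are illustrative, not SparseArrays API.
const ALLOW_SCALAR = Ref(true)

allowscalar(flag::Bool) = (ALLOW_SCALAR[] = flag; nothing)

# What a guarded scalar setindex! could look like: error when the
# toggle is off, otherwise fall through to the normal assignment.
function checked_setindex!(A::AbstractMatrix, v, i::Integer, j::Integer)
    ALLOW_SCALAR[] ||
        error("scalar setindex! is disabled; operate on the sparse structure directly")
    A[i, j] = v
end
```

A test suite would then call allowscalar(false) at the top and let any accidental scalar access surface as an error.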

@ChrisRackauckas
Author

ChrisRackauckas commented Jul 18, 2022

Especially since it's only for testing. Adding any sort of runtime check seems not ideal, especially for applications that use scalar indexing a lot.

It already has to assume indirect access with scalar indexing, so it already has a few branches that block SIMD and other branches that will fail branch prediction. This branch would be the most predictable (almost always true), so it would clearly be the lowest-overhead one of the bunch. What would it cost, maybe a 2% performance difference in a code path already deemed too slow to use in cases where performance is required? Code that cares about performance already avoids scalar indexing and generates code against A.nzval directly for this reason. I don't see how runtime performance could be the holdup.

using SparseArrays, BenchmarkTools
const glob = Ref(true)
A = sprand(1000,1000,0.01)

function f(A,i1,i2,i4,i5,x)
    A[i1,i2] = x
    nothing
end

julia> @btime f(A,500,800,300,450,0.1)
  29.920 ns (0 allocations: 0 bytes)

julia> @btime f(A,500,800,300,450,0.1)
  29.087 ns (0 allocations: 0 bytes)

function f(A,i1,i2,i4,i5,x)
    if glob[]
        A[i1,i2] = x
    end
    nothing
end

julia> @btime f(A,500,800,300,450,0.1)
  30.191 ns (0 allocations: 0 bytes)

julia> @btime f(A,500,800,300,450,0.1)
  30.221 ns (0 allocations: 0 bytes)

@SobhanMP
Member

SobhanMP commented Jul 18, 2022

That's going to make the three-argument dot of Symmetric{SparseMatrixCSC} a whopping 2% slower.

The real question is: what would stop you from using a custom AbstractSparseMatrixCSC subtype that doesn't have scalar indexing? A lot of code has too-narrow type definitions and uses "illegal" accessors, i.e. x.nzval and such. Seems like two birds with one stone to me.

From a design perspective, having global state seems less than ideal.

@ChrisRackauckas
Author

ChrisRackauckas commented Jul 18, 2022

A lot of code has too-narrow type definitions and uses "illegal" accessors, i.e. x.nzval and such. Seems like two birds with one stone to me.

Of course it wouldn't touch nzval access, because that's supposed to be fast. You want to find out which codes are accidentally doing A[i,j] instead of direct nzval accesses.

That's going to make the three-argument dot of Symmetric{SparseMatrixCSC} a whopping 2% slower.

If that's falling back to A[i,j], then this just identified a performance bug that should get fixed. The dot, mul!, etc. implementations don't use A[i,j] for a good reason, and if they do, we should have a way of knowing they are doing so accidentally so we can make them like 10x faster! The purpose of this is to make it easy to turn those accidental fallback cases into errors so they are easier to identify.

The real question is: what would stop you from using a custom AbstractSparseMatrixCSC subtype that doesn't have scalar indexing?

You'd have to create an entire mirror type and would have no good way of guaranteeing that it hits exactly the same dispatches as the original path. If someone writes a SparseMatrixCSC overload that fixes a bad A[i,j] path, a separate type would not hit it, so it would be a spurious failure. I would prefer to make it easy for people to write good, correct sparse code than to protect tiny bits of performance in already slow code.

@SobhanMP
Member

SobhanMP commented Jul 18, 2022

Of course it wouldn't touch nzval access, because that's supposed to be fast. You want to find out which codes are accidentally doing A[i,j] instead of direct nzval accesses.

You're missing the point: if code didn't have narrow types or do illegal accesses, it would be pretty easy to cook up an AbstractSparseMatrixCSC variant that does what you want, and it would be free of global state.

@ChrisRackauckas
Author

It would be pretty easy to cook up an AbstractSparseMatrixCSC variant that does what you want, and it would be free of global state.

No. Counterexample: define a dot overload only on SparseMatrixCSC that avoids A[i,j] indexing. Now your test for whether a slow path is hit gives the wrong answer, since it only exercises AbstractSparseMatrixCSC dispatches, unless you do some fancy AbstractInterpreter thing to automatically expand all SparseMatrixCSC dispatches or something.
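The dispatch gap can be illustrated with a toy overload (the function name `slowpath` is hypothetical):

```julia
using SparseArrays

# An overload defined only for the concrete SparseMatrixCSC type is
# never hit by a separate mirror type: the mirror dispatches to the
# generic fallback and would falsely flag a slow path that the real
# type actually avoids.
slowpath(A::AbstractMatrix) = :generic_fallback
slowpath(A::SparseMatrixCSC) = :fast_sparse_path

A = sprand(4, 4, 0.5)
slowpath(A)           # the concrete-type overload is selected
slowpath(Matrix(A))   # any other type falls back to the generic method
```

This is why testing through a separate AbstractSparseMatrixCSC subtype cannot guarantee it exercises the same methods as the real SparseMatrixCSC.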

@SobhanMP
Member

I see, it's unavoidable if you use structures, since you'll need concrete types, but can it still be generic? It still comes back to the fact that having functions that explicitly take SparseMatrixCSC is a bad idea (and fairly common).

@SobhanMP
Member

SobhanMP commented Aug 7, 2022

@ChrisRackauckas opinions? Should it also block indexing like A[2, :]?

@ChrisRackauckas
Author

I would think those are fine. In the GPU case, those are allowed and allowscalar still catches the overwhelming majority of bad fallbacks.

@SobhanMP
Member

SobhanMP commented Aug 7, 2022

Do note that A[2, :] is equivalent to n scalar indexings in this package.

@ChrisRackauckas
Author

Oh, then that's bad 😅. I guess that would error at the fallback.

@ViralBShah
Member

Well, it would do what you want: disable the use of scalar indexing. You should not try to extract a row out of a CSC matrix.

@ChrisRackauckas
Author

Agreed.

@ChrisRackauckas
Author

🎉
