# Ability to turn on and off scalar indexing for sparse matrices? #189
`A[i,j] = ...` is very slow if `A isa SparseMatrixCSC`. In the wild, there's another case like this with CUDA.jl: that package allows for doing `CUDA.allowscalar(false)` so that any scalar `A[i,j]` operation (`getindex` and `setindex!`) causes an error to be thrown. This is helpful for finding unoptimized fallbacks which would otherwise "work" given the AbstractArray interface, but would be detrimental to performance. Without this, finding out which CUDA codes are hitting unwanted fallbacks can only be done by profiling. It would be nice to have similar functionality available on `SparseMatrixCSC`. I am not sure I like the idea of having it global though, since then it's global; but if it's not global you need rules for how to carry it forward with `broadcast` etc., which might be hard to do in general, and so the global is the easy option.
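For concreteness, this is roughly what the CUDA.jl mechanism described above looks like in use. `CUDA.allowscalar` and `CUDA.@allowscalar` are the real API; the array and operations are just illustrative:

```julia
using CUDA

CUDA.allowscalar(false)    # from here on, scalar indexing on a CuArray throws

x = CUDA.zeros(Float32, 10)
x .+= 1.0f0                # fine: broadcast is a nonscalar operation
# x[1]                     # would throw: scalar getindex is disallowed
CUDA.@allowscalar x[1]     # escape hatch for deliberate scalar access
```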
Is the idea to use this only for testing? I can't imagine it would be a good idea to do this in a package like SciML, disabling it for everything else, or to disable it in scripts like with CUDA.
Yes, I'd only use it for testing. The main thing would be to ensure we don't hit any fallbacks that are looping scalar-wise.
If you are using this with CUDA, you probably want to disable it, right? Packages can opt in to do that, I suppose.
Yes, with CUDA it's generally the case that you say "if using CUDA, then put `CUDA.allowscalar(false)` at the top of the script". Of course it can be used incorrectly; the CUDA.jl one could be misused as well if a package sets it "for the user" and suddenly non-performance-sensitive parts of the code are overly cautious. But I think over the years people have found it to be worth it.
I definitely prefer the `StaticSparseMatrixCSC <: AbstractSparseMatrixCSC` route. But it is a little more onerous to implement.
What would you do with an operation mixing a static and a non-static sparse matrix?
Would that not just be the union of the patterns?
Promote to static?
That seems reasonable. Especially if this is mostly for testing.
I would have assumed non-static for mixed things like that -- much like a diagonal matrix plus a general matrix gives a general matrix.
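A minimal sketch of the demote-on-mixing convention just described, assuming a hypothetical `StaticSparseMatrixCSC` wrapper. The type does not exist in SparseArrays, and the proposal above subtypes `AbstractSparseMatrixCSC`, but a plain `AbstractMatrix` wrapper keeps the sketch short:

```julia
using SparseArrays

# Hypothetical wrapper marking a matrix whose sparsity pattern is fixed.
struct StaticSparseMatrixCSC{Tv,Ti} <: AbstractMatrix{Tv}
    data::SparseMatrixCSC{Tv,Ti}
end
Base.size(S::StaticSparseMatrixCSC) = size(S.data)
Base.getindex(S::StaticSparseMatrixCSC, i::Int, j::Int) = S.data[i, j]

# Mixed sums drop the static marker, like Diagonal + Matrix -> Matrix:
Base.:+(S::StaticSparseMatrixCSC, A::SparseMatrixCSC) = S.data + A
Base.:+(A::SparseMatrixCSC, S::StaticSparseMatrixCSC) = A + S.data
```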
Would it make sense to just disable it, i.e. by redefining the scalar `getindex`/`setindex!` methods to throw?
I suppose that would work, but it should be done through an API that can disable and re-enable.
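A sketch of what such a disable/re-enable API could look like, assuming a hypothetical `allowscalar` entry point and `SCALAR_INDEXING` flag (neither exists in SparseArrays; a real implementation would place the guard inside the scalar `getindex`/`setindex!` methods for `SparseMatrixCSC` itself):

```julia
using SparseArrays

# Hypothetical module-level switch; name and mechanism are illustrative only.
const SCALAR_INDEXING = Ref(true)

allowscalar(flag::Bool) = (SCALAR_INDEXING[] = flag; nothing)

# Guard a real implementation would call from scalar getindex/setindex!:
function assertscalar(op::String)
    SCALAR_INDEXING[] ||
        error("scalar $op on a SparseMatrixCSC is disallowed; call allowscalar(true) to re-enable")
    nothing
end

# Usage in a test suite:
# allowscalar(false); run_kernels(); allowscalar(true)
```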
It already has to assume indirect access with scalar indexing, so it already has a few branches that block SIMD and other branches that will fail branch prediction. This branch will be the most predictable (almost always true), so it would clearly be the lowest-overhead one of the bunch. What would it cost, like a 2% performance difference in a code path already deemed too slow to be used in cases where performance is required? Cases which care about performance would already never use scalar indexing and would instead generate code for the sparse structure directly. For example:

```julia
using SparseArrays, BenchmarkTools

const glob = Ref(true)
A = sprand(1000, 1000, 0.01)

# baseline: plain scalar setindex!
function f(A, i1, i2, i4, i5, x)
    A[i1, i2] = x
    nothing
end

julia> @btime f(A, 500, 800, 300, 450, 0.1)
  29.920 ns (0 allocations: 0 bytes)

julia> @btime f(A, 500, 800, 300, 450, 0.1)
  29.087 ns (0 allocations: 0 bytes)

# same, guarded by a global flag check
function f(A, i1, i2, i4, i5, x)
    if glob[]
        A[i1, i2] = x
    end
    nothing
end

julia> @btime f(A, 500, 800, 300, 450, 0.1)
  30.191 ns (0 allocations: 0 bytes)

julia> @btime f(A, 500, 800, 300, 450, 0.1)
  30.221 ns (0 allocations: 0 bytes)
```
That's going to make the three-argument `dot` on a `SparseMatrixCSC` pay for that branch too. The real question is what would stop you from using a custom `AbstractSparseMatrixCSC` type for this. From a design perspective, having global state seems less than ideal.
The three-argument `dot`? Of course it wouldn't touch that; the specialized method doesn't go through scalar indexing.
If that's falling back to using scalar `getindex` somewhere, that's exactly the kind of fallback you'd want this to catch.
You'd have to create an entire mirror type and would not have a good way to guarantee that it hits exactly the same dispatches as the original path. If someone writes a method that takes a `SparseMatrixCSC` specifically, the mirror type will never reach it.
You're missing the point: if they didn't use overly narrow types or do illegal access, it would be pretty easy to cook up an `AbstractSparseMatrixCSC` variant that does what you want. And it would be free of global state.
No. Counterexample: define a struct with a concrete `SparseMatrixCSC` field; the variant type can't be used there.
I see; it's unavoidable if you use structures, as you'll need concrete types, but the code can still be generic? It still comes back to the fact that having functions explicitly take `SparseMatrixCSC`s is a bad idea (and fairly common).
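A sketch of the counterexample being made here; the `Solver` struct and `solve_step` function are invented for illustration:

```julia
using SparseArrays

# Concrete field for performance, as is common in the ecosystem:
struct Solver
    A::SparseMatrixCSC{Float64,Int}
end

# Narrowly-typed method, also common:
solve_step(A::SparseMatrixCSC, i, j) = A[i, j]

# A custom AbstractSparseMatrixCSC variant can't be stored in Solver.A or
# dispatched into solve_step, so it can't exercise these exact code paths;
# a global toggle on SparseMatrixCSC itself can.
```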
@ChrisRackauckas opinions? Should it also block indexing like `A[i, :]` and `A[:, j]`?
I would think those are fine. In the GPU case, those are allowed and don't count as scalar indexing.
Do note that `A[i, :]` on a `SparseMatrixCSC` is slow; it has to search every column.
Oh, then that's bad 😅. I guess that would error at the fallback.
Well, it would do what you want: disable the use of scalar indexing. You should not try to extract a row out of a CSC matrix.
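For reference, a small illustration of why row extraction from CSC is discouraged (the size and density here are arbitrary):

```julia
using SparseArrays

A = sprand(10_000, 10_000, 1e-3)

# CSC stores columns contiguously, so a column slice reads one colptr range:
col = A[:, 17]    # cheap

# A row slice has to probe all 10_000 columns for row index 17:
row = A[17, :]    # much more expensive
```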
Agreed.
🎉