-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ReinterpretArray #23750
Implement ReinterpretArray #23750
Conversation
we always emit that as a memcpy rather than a load/store pair. However, | ||
this can give worse optimization results in certain cases because some | ||
optimizations that can handle load/store pairs cannot handle memcpys. | ||
Mem2reg is one of these optimizations. This patch adds rudamentary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rudimentary
src/intrinsics.cpp
Outdated
// appropriate coercion manually. | ||
AllocaInst *AI = cast<AllocaInst>(p); | ||
Type *AllocType = AI->getAllocatedType(); | ||
if (!AI->isArrayAllocation() && !AllocType->isAggregateType() && !AllocType->isVectorTy()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe test for what it can handle directly?
(AllocType->isFloatingPointTy() || AllocType->isIntegerTy() || AllocType->isPointerTy()) &&
(to->isFloatingPointTy() || AllocType->isIntegerTy() || AllocType->isPointerTy())
src/intrinsics.cpp
Outdated
} | ||
if (!dest) | ||
return unboxed; | ||
Type *dest_ty = unboxed->getType()->getPointerTo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave dest
handling here
src/cgutils.cpp
Outdated
Value *src_ptr = data_pointer(ctx, src, T_pint8); | ||
if (dest->getType() != T_pint8) | ||
dest = emit_bitcast(ctx, dest, T_pint8); | ||
Value *src_ptr = data_pointer(ctx, src); | ||
if (skip) // copy dest -> dest to simulate an undef value / conditional copy | ||
src_ptr = ctx.builder.CreateSelect(skip, dest, src_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is broken now?
base/reinterpretarray.jl
Outdated
""" | ||
struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N} | ||
parent::A | ||
Base.reinterpret(::Type{T}, a::A) where {T,S,N,A<:AbstractArray{S, N}} = new{T, S, N, A}(a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code assumes isbits(T) && isbits(S)
, it should be asserted here on construction (like we did before)
base/reinterpretarray.jl
Outdated
Base.reinterpret(::Type{T}, a::A) where {T,S,N,A<:AbstractArray{S, N}} = new{T, S, N, A}(a) | ||
end | ||
|
||
Base.eltype(a::ReinterpretArray{T}) where {T} = T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use Base.
for code in Base.
base/reinterpretarray.jl
Outdated
return reinterpret(T, a[inds...]) | ||
elseif sizeof(T) > sizeof(S) | ||
nels = div(sizeof(T), sizeof(S)) | ||
ind_off = (inds[1]-1) * nels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not inbounds if called as ReinterpretArray(a)[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe define that method specifically? Base.getindex(a::ReinterpretArray) = a[1]
, if that's right?
base/reinterpretarray.jl
Outdated
o = Ref{T}() | ||
optr = Base.unsafe_convert(Ref{T}, o) | ||
for i = 1:nels | ||
unsafe_store!(convert(Ptr{S}, optr)+(i-1)*sizeof(S), a.parent[ind_off + i, tail(inds)...]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsafe_store!
can already handle this pointer offset computation:
optr = Ptr{S}(Base.unsafe_convert(Ref{T}, o))
for i in 1:nels
unsafe_store!(optr, a.parent[ind_off + i, tail(inds)...], i)
end
base/reinterpretarray.jl
Outdated
r = Ref{S}(a.parent[1+ind, Base.tail(inds)...]) | ||
@gc_preserve r begin | ||
rptr = Base.unsafe_convert(Ref{S}, r) | ||
ret = unsafe_load(convert(Ptr{T}, rptr) + sub*sizeof(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here, use:
ret = unsafe_load(Ptr{T}(rptr), sub)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
base/reinterpretarray.jl
Outdated
function Base.size(a::ReinterpretArray{T,S}) where {T,S} | ||
psize = size(a.parent) | ||
if sizeof(T) > sizeof(S) | ||
size1 = div(psize[1], div(sizeof(T), sizeof(S))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this first div
can't be const-ified by inference/LLVM. Since size
is used in bounds-checks, and since div
is achingly slow, I wonder if we need to cache this value in the struct
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
div(T, S)
is constant folded. The other div is generally folded to shifts by LLVM.
base/reinterpretarray.jl
Outdated
end | ||
return o[] | ||
else | ||
ind, sub = divrem(inds[1]-1, div(sizeof(S), sizeof(T))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a likely performance bottleneck. Consider using a cached SignedMultiplicativeInverse
here.
EDIT: ...unless LLVM can perform this optimization itself. This is a case where the denominator is known at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, llvm does very well here, since we're dealing with integers and known divisors.
base/reinterpretarray.jl
Outdated
If the size of `T` differs from the size of `S`, the array will be compressed/expanded in | ||
the first dimension. | ||
""" | ||
struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps switch places for S
and N
? Typically for AbstractArray
s the element type is the first parameter, and the rank the second parameter.
base/reinterpretarray.jl
Outdated
ind, sub = divrem(inds[1]-1, div(sizeof(S), sizeof(T))) | ||
r = Ref{S}(a.parent[1+ind, tail(inds)...]) | ||
rptr = unsafe_convert(Ref{S}, r) | ||
unsafe_store!(Ptr{T}(rptr), v, sub+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rptr
is not valid here, since r
is not gc_preserve_begin
'd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r is used after, but I can put in a gc preserve just for completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appearing in the source code after does not gc-preserve a value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough
base/reinterpretarray.jl
Outdated
o = Ref{T}() | ||
optr = unsafe_convert(Ref{T}, o) | ||
for i = 1:nels | ||
unsafe_store!(Ptr{S}(optr), a.parent[ind_off + i, tail(inds)...], i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for optr
here
89834c6
to
afc6839
Compare
Rebased, review addressed. Upstream LLVM patch isn't looking to good. The recommendation is to get rid of the early mem2reg in favor of SROA. We can't do that on pre-0.5, so I think carrying this patch might be fine. I opened #23772 to track looking into possible changes to the pass pipeline. |
e62c1fb
to
6e9056e
Compare
6e9056e
to
11e941c
Compare
Looks like my comment on this got lost. In any case, I think this is good to go. I had to update it to deal with some corner cases (length 3 array of Complex{Int64} to length 2 array of NTuple{3, Int64}), which lost some performance, but I'm confident that can be recovered at the LLVM level. I do want to get this in since it's fairly large and on the 1.0 path. |
base/io.jl
Outdated
@@ -267,15 +267,16 @@ readlines(s=STDIN; chomp::Bool=true) = collect(eachline(s, chomp=chomp)) | |||
|
|||
## byte-order mark, ntoh & hton ## | |||
|
|||
let endian_boms = reinterpret(UInt8, UInt32[0x01020304]) | |||
a = UInt32[0x01020304] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be let-bound.
reinterpret(T, a, size(a)) | ||
end | ||
|
||
function reinterpret(::Type{T}, a::Array{S}, dims::NTuple{N,Int}) where T where S where N |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a deprecation.
@JeffBezanson made a good point about the sparse array stuff. Reinterpret used to work fine on sparse arrays and would reinterpret the structural non-zeros. However in general, a reinterpreted zero is no longer zero. It's not entirely clear that that was a good behavior and what to do about it. |
56dcd47
to
8fd5d61
Compare
base/reinterpretarray.jl
Outdated
|
||
@inline @propagate_inbounds function setindex!(a::ReinterpretArray{T,N,S}, v, inds::Vararg{Int, N}) where {T,N,S} | ||
v = convert(T, v)::T | ||
if sizeof(T) == sizeof(S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
base/reinterpretarray.jl
Outdated
sptr = Ptr{UInt8}(unsafe_convert(Ref{S}, s)) | ||
nbytes_copied = 0 | ||
i = 1 | ||
@inline function copy_element() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably shouldn't use a closure, since this isn't inferrable
base/sparse/abstractsparse.jl
Outdated
@@ -2,7 +2,7 @@ | |||
|
|||
abstract type AbstractSparseArray{Tv,Ti,N} <: AbstractArray{Tv,N} end | |||
|
|||
const AbstractSparseVector{Tv,Ti} = AbstractSparseArray{Tv,Ti,1} | |||
const AbstractSparseVector{Tv,Ti} = Union{AbstractSparseArray{Tv,Ti,1}, Base.ReinterpretArray{Tv,1,T,<:AbstractSparseArray{T,Ti,1}} where T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ewww
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, this was true with the old definition of reinterpret
on SparseArrays, it's not true with the new definition (see comment on nonzeroinds)
base/sparse/abstractsparse.jl
Outdated
ind * div(sizeof(S), sizeof(T)) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method assumes that reinterpret(zero(T)) === zero(S)
. I don't think we should define it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So did the old reinterpret
kind off, which is why I didn't change it. Happy to get rid of this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old definition returned a sparse reinterpreted array (SparseArray{ReinterpretedArray{parent}}
), the new one returns a reinterpreted sparse array (Reinterpreted{A}
). The main point where they differ is in the handling of this method, since the old one would return a strong zero (zero(S)
), where the new one would return a casted zero (reinterpret(S, zero(T))
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but since nonzeroinds returns structural non-zeros, that still seems fine to (if odd if reinterpret(S, zero(T)) != zero(S)
. Would be good to have some input from people who care about sparse matrices.
Seems like the preferred solution is to disallow reinterpret on SparseArrays. I'll back out the sparse parts of this and update NEWS accordingly. |
Is this agreeable with everyone now? |
Seeing no objections, this is getting merged. |
This seems to have broken HDF5. From the Nanosoldier log: https://github.com/JuliaCI/BaseBenchmarkReports/blob/9c503da1dad07ebbe4f7e2e9dadf19be6eb5b1f5/daily_2017_10_9/logs/8273529b08bd23e17be0234ce976bb137d375708_primary.err#L8567 |
I'll take a look. |
Looks like JLD is the thing that's borked. Also looks like JLD hasn't been updated to support bitsunions yet. |
I don't know who's actively maintaining JLD at this point. Migrating Nanosoldier's serialization format from JLD to JSON is on my to-do list but it isn't at the top. |
Is this failure expected? julia> x = reinterpret(Int, 1:4)
4-element reinterpret(Int64, ::UnitRange{Int64}):
1
2
3
4
julia> next(x, endof(x))
ERROR: BoundsError
Stacktrace:
[1] getindex at ./number.jl:57 [inlined]
[2] next(::Base.ReinterpretArray{Int64,1,Int64,UnitRange{Int64}}, ::Int64) at ./abstractarray.jl:763
This breaks code which uses |
Since
I don't think that particular feature is part of the AbstractArray contract. Does it hold for say AxisArrays? |
Yeah, I guess we're just used to |
unsigned alignment = julia_alignment(typ, 0); | ||
emit_memcpy(ctx, dest, src_ptr, jl_datatype_size(typ), alignment, isVolatile, tbaa); | ||
Value *nbytes = ConstantInt::get(T_size, nb); | ||
if (skip) // copy dest -> dest to simulate an undef value / conditional copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that LLVM recently hit this problem too, and are working on a real fix (especially due the performance problems this change may cause) https://reviews.llvm.org/D86815
This redoes
reinterpret
in julia rather than punning the memoryof the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from
reinterpret
is nowReinterpretArray
, which implementsthe AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:
In addition, the new
reinterpret
implementation is able to handle any AbstractArray(whether useful or not is a separate decision):
The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.
Fixes #22849
Fixes #19238