Inconsistent results w/ and w/o @turbo #507
Comments
Bug/known limitation. It is fixed in LoopModels, the successor of LoopVectorization. This blog post discusses how to fix it automatically (i.e. similar to what LoopModels does), but you can also do this manually, like SimpleChains.jl: |
Any chance LoopModels will support Float16? |
It is written as an LLVM pass, so it should more naturally support types supported by Julia and LLVM. |
@chriselrod thank you very much for your prompt reply. I understand this is a known bug/limitation, but from a user's perspective I have not seen it mentioned anywhere in the package documentation. Before your kind message, I did not even know what tiling was (my ignorance, but I guess that not all users of LoopVectorization are experts in this topic). So only now do I realize how difficult it is to rewrite this specific kind of loop operation. In any case, since (correct me if I am wrong) there is no mention of this issue in the documentation, I feel that LoopVectorization should raise an error saying that this particular loop cannot be vectorized (as it does in other situations), rather than running without any apparent problem and producing incorrect results. |
Well, actually, it could vectorize it in a less fancy way.
Yeah, the goal of the repo (and LoopModels) is that users do not have to know anything. |
OK, I am trying to fix the issue manually using this code:

    function morph_dilate2(A::AbstractArray{T,N}, kernel::AbstractArray{S,N}) where {T<:Integer,S<:Integer,N}
        out = similar(A, tuple((first(axes(A, n))+first(axes(kernel, n)):
                                last(axes(A, n))+last(axes(kernel, n)) for n ∈ 1:N)...)...)
        Ks = CartesianIndices(kernel)
        Is = CartesianIndices(A)
        Ifirst, Ilast = first(Is), last(Is)
        Js = CartesianIndices(out)
        @turbo for J ∈ Js
            tmp = zero(T)
            for K ∈ Ks
                Aᵢ = (Ifirst <= J - K <= Ilast) ? A[J-K] : zero(T)
                tmp |= Aᵢ & kernel[K]
            end
            out[J] = tmp
        end
        out
    end

However, @turbo fails with the error message "Reduction not found." (which surprises me, since the reduction is actually there). The issue seems to be with the use of |
Consider the following code
[The function essentially performs a convolution with a given kernel of an input array A, using the integer and (&) and the or (|) instead of the product and sum in the convolution expression. It also extends the original input array A.]
When I use this function on a test input I obtain an unexpected result:
Note that this result is wrong: the correct result, which can be obtained by removing the @turbo macro in the definition above, is:
Am I using @turbo in the wrong way, or is this a bug?
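For anyone who wants to check results against a version without @turbo, here is a minimal plain-loop sketch of the operation described above (a convolution with & in place of the product and | in place of the sum). It is not the function from the original report, whose definition is not shown here. The name `dilate_reference` is hypothetical, and the sketch assumes ordinary 1-based arrays and a "full" output of size `size(A) .+ size(kernel) .- 1`; it uses only Base Julia.

```julia
# Plain-loop reference without @turbo. `dilate_reference` is a hypothetical
# name for illustration; it is not part of LoopVectorization.
# Assumption: A and kernel are ordinary 1-based arrays.
function dilate_reference(A::AbstractArray{T,N}, kernel::AbstractArray{S,N}) where {T<:Integer,S<:Integer,N}
    # "Full" output: one entry for every overlap position of A and kernel.
    out = zeros(T, size(A) .+ size(kernel) .- 1)
    Is = CartesianIndices(A)
    shift = oneunit(first(Is))  # CartesianIndex of all ones (1-based offset)
    for J in CartesianIndices(out)
        tmp = zero(T)
        for K in CartesianIndices(kernel)
            I = J - K + shift
            # Positions outside A contribute zero, so they are skipped.
            if I in Is
                tmp |= A[I] & kernel[K]
            end
        end
        out[J] = tmp
    end
    return out
end
```

For example, `dilate_reference([1, 0, 0, 2], [3, 3])` returns `[1, 1, 0, 2, 2]`. Comparing this output with the output of a @turbo version on the same input is one way to spot the kind of discrepancy reported here.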