Improve accuracy of conversions to floating-point numbers (#129) #138
Conversation
Benchmark: see …

Known Issues for Users

Rounding error
In some cases, the 1-lsb error still occurs. (see also: #129 (comment)) For example:

julia> x = reinterpret(N4f28, 0b1111_1111_1111_1111_1111_1111_0111_0000)
15.999999523N4f28

julia> Float32(x)
16.0f0

julia> Float32(BigFloat(Rational(x)))
15.999999f0

Workaround: use the divisions.

Slowdown
With a few exceptions, the accuracy comes at the price of speed. Workaround: use of …

Slowdown with …
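The 1-lsb discrepancy above is an instance of double rounding: rounding the exact rational to a Float64 intermediate and then to Float32 can land on the wrong side of a tie that single-step rounding resolves correctly. A minimal, self-contained illustration with a constructed value (illustrative only; this is not N4f28's actual conversion path):

```julia
# Constructed double-rounding demo (not the package's kernel).
# v lies just above the midpoint between 1.0f0 and nextfloat(1.0f0).
v = big"1.0" + big"2.0"^-24 + big"2.0"^-60

Float32(v)            # rounds once from the exact value: 1.0000001f0
Float32(Float64(v))   # Float64 lands exactly on the tie; ties-to-even gives 1.0f0
```

Rounding once from the exact value keeps the "sticky" low-order bit that breaks the tie; the Float64 intermediate discards it.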
(Force-pushed from a782be8 to 2cfddd3.)
This is great! Perhaps check the @pure comment below, then merge at will!
src/normed.jl
Outdated
macro exp2(n) | ||
:(_exp2(Val($(esc(n))))) | ||
end | ||
@generated _exp2(::Val{N}) where {N} = (x = exp2(N); :($x)) |
Could both of these be Base.@pure? With @generated, one should technically provide a fallback (e.g., https://github.com/JuliaLang/julia/blob/e5ee8da70a646f1f54730e727932ebaeaee256d5/base/namedtuple.jl#L75-L83) in preparation for the day when it becomes more common to ship precompiled binaries.
Oh! _exp2(::Val{N}) where {N} = exp2(N) seems to work well! (not fully tested)

I found that a generated function can be a workaround for the integer division problems. Then I thought the same technique could be used for exp2, and I did it as shown above.

However, I don't know any substitute for the generated inv_rawone().

Should I add a fallback? I am looking for the documentation about @generated.
Although I do not fully understand the mechanism of @generated, I added a fallback by following the example:

inv_rawone(x) = (@generated) ? (y = 1.0 / rawone(x); :($y)) : 1.0 / rawone(x)
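The optionally-generated pattern used here can be sketched with a self-contained toy (pow2 is a hypothetical name, not the package's code): inside the if @generated branch the arguments stand for their types and N is a compile-time constant, so the result can be spliced in as a literal; the else branch is an ordinary runtime fallback that computes the same value.

```julia
# Toy sketch of an optionally-generated function (hypothetical, illustrative).
function pow2(::Val{N}) where {N}
    if @generated
        :($(2.0^N))   # folded to a constant at code-generation time
    else
        2.0^N         # ordinary runtime fallback, e.g. when generation is unavailable
    end
end

pow2(Val(28))  # same result through either branch
```

Both branches must compute the same result; the compiler is free to use either one.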
BTW, I am thinking of moving these low-level utility functions and macros to a new file (e.g. "src/utilities.jl").

Surprisingly, rawone is defined in "src/normed.jl" even though it can be called with a Fixed argument. Moreover, although there is no noticeable overhead, the implementation of rawone takes a detour:

interpret> FixedPointNumbers.rawone(1N5f3)
rawone(v) in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:38
one(x::Normed) in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:37
oneunit(x::Normed) in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:36
one(::Type{T}) where T<:Normed in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:34
oneunit(::Type{T}) where T<:Normed in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:32
#unused# = Normed{UInt8,3}
T = Normed{UInt8,3}
31 function oneunit(::Type{T}) where T <: Normed
32     T(typemax(rawtype(T)) >> (8 * sizeof(T) - nbitsfrac(T)), 0)
   end

Even though we can get the rawone value (which isa Unsigned) in oneunit(::Type{T}), it is converted to Normed and then reinterpreted back to Unsigned.

Should I refactor zero, one, oneunit and rawone first, as a part of #139? Or keep the utility functions in "src/normed.jl" for the moment?
You're right that making this more straightforward seems desirable.
With
diff --git a/src/normed.jl b/src/normed.jl
index 737657b..07f5b86 100644
--- a/src/normed.jl
+++ b/src/normed.jl
@@ -114,7 +114,7 @@ end
_exp2(::Val{N}) where {N} = exp2(N)
# for Julia v1.0, which does not fold `div_float` before inlining
-inv_rawone(x) = (@generated) ? (y = 1.0 / rawone(x); :($y)) : 1.0 / rawone(x)
+@pure inv_rawone(x) = 1.0 / rawone(x)
function (::Type{T})(x::Normed) where {T <: AbstractFloat}
# The following optimization for constant division may cause rounding errors.
I get
julia> x = N8f24(15.2)
15.2N8f24
julia> @code_llvm Float32(x)
; @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:159 within `Float32'
define float @julia_Float32_17656(%jl_value_t addrspace(10)*, { i32 } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
; @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:160 within `Float32'
; ┌ @ Base.jl:20 within `getproperty'
%2 = getelementptr inbounds { i32 }, { i32 } addrspace(11)* %1, i64 0, i32 0
; └
; ┌ @ int.jl:498 within `unsafe_trunc'
; │┌ @ int.jl:464 within `rem'
%3 = load i32, i32 addrspace(11)* %2, align 4
; └└
; @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:169 within `Float32'
; ┌ @ promotion.jl:349 within `<' @ int.jl:49
%4 = icmp sgt i32 %3, -1
; └
%5 = select i1 %4, double 0.000000e+00, double 0x4070000010000010
; @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:170 within `Float32'
; ┌ @ float.jl:60 within `Float64'
%6 = sitofp i32 %3 to double
; └
; ┌ @ float.jl:410 within `muladd'
%7 = fmul contract double %6, 0x3E70000010000010
%8 = fadd contract double %7, %5
; └
; @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:170 within `Float32' @ float.jl:251
%9 = fptrunc double %8 to float
; @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:170 within `Float32'
ret float %9
}
on Julia 1.0. It is sometimes joked that @pure means "ask Jameson if it's OK," but here I think it's pretty straightforward.
> Could you go into more detail?

This is a matter of the speed, not the correctness. In the case of "Matrix of Vec4", the "master" is slower than the "simple div" except in the BigFloat cases.
#129 (comment)

The default inline_cost_threshold is 100 and the cost of div_float is 20. For example, if we want to use 4 packed numbers, the cost of a single method call (the callee) should be reduced to 100 / 4 = 25, or the inlining into the caller may fail and the SIMD vectorization will be incomplete. In addition, the failure of SIMD vectorization causes the failure of the broadcast optimization. Therefore, one div_float (i.e. "simple div") is OK, but two div_floats are not good. Although an intermediate variable can reduce the number of div_floats, there are still mul_float, fptrunc and other small costs.

Julia's optimizer is surely more complicated than this, but in my experience this model explains the slowdown on Julia v1.0.
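One way to check such claims empirically is to dump the LLVM IR of a broadcast kernel and look for vector types. Below is a generic sketch with a stand-in conversion (nf32 and convert_all are illustrative names; this is not FixedPointNumbers' implementation):

```julia
using InteractiveUtils  # provides @code_llvm

# Stand-in for a cheap raw-to-float conversion: one multiply, no division.
nf32(raw::UInt8) = Float32(raw) * (1.0f0 / 255.0f0)

convert_all(a) = nf32.(a)  # broadcast kernel to inspect

a = rand(UInt8, 64)
@code_llvm convert_all(a)  # vectorized if the IR contains e.g. `<8 x float>`
```

If the callee's inlining cost exceeds the threshold, the broadcast loop falls back to a scalar call per element and the vector types disappear from the IR.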
Oh, I wasn't looking at vectorization, sorry. The point is that the LLVM is the last stage before native code.

If @pure doesn't work, let's keep your @generated version.
> The point is that the LLVM is the last stage before native code.

I get the point, but even if it's correct in the LLVM IR... Especially, in the cases of Normed{UInt32,f} -> Float32 where 2 <= f <= 24, the LLVM backend generates a destructive conditional branch.
%5 = select i1 %4, double 0.000000e+00, double 0x4070000010000010
-> 💥 💥 💥
vxorpd %xmm0, %xmm0, %xmm0
testl %eax, %eax
jns L30
vmovsd .LCPI639_0, %xmm0 # xmm0 = mem[0],zero
L30:
I think there is almost no way to intervene in the LLVM backend from Julia scripts. By using -0.0 instead of 0.0, the conditional branch (jns) can be removed, but it stalls in the SIMD version. So I gave up!
I probably don't know as much about SIMD as you do, but is select actually destructive? It's not technically a branch. See https://github.com/eschnett/SIMD.jl#representing-simd-vector-types-in-julia (search for select) and http://blog.llvm.org/2011/12/llvm-31-vector-changes.html (suggesting it went into LLVM 3.1). Of course, whether LLVM applies these is another question.

I didn't mean to dissuade you from the @generated solution; by all means adopt that if it works better. I personally would also not stress too much about Julia 1.0 performance, if it's really painful. "Use a more recent Julia version" is a perfectly fine response to a performance concern.
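For reference, the branchless form is also expressible from the Julia side: ifelse evaluates both arms eagerly and usually lowers to this kind of LLVM select. A toy mirroring the select in the dump above (a hypothetical function, not the package's kernel):

```julia
# Pick the sign-dependent offset without a branch, mirroring
# `select i1 %4, double 0.0, double 0x4070...` in the IR above.
# `ifelse` evaluates both branches and typically lowers to LLVM `select`.
offset(i::Int32) = ifelse(i < Int32(0), 256.0, 0.0)

offset(Int32(-7))  # 256.0
offset(Int32(7))   # 0.0
```

Unlike `i < 0 ? 256.0 : 0.0`, which the backend may turn into a real jump, select is a data-flow operation and vectorizes cleanly.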
In general, select is not destructive. Since this is a rare (technically interesting but personally hateful) case, I just left the comment. This is a problem with the non-SIMD LLVM IR, as you dumped above. Moreover, although I don't like the branch, it is still reasonable in terms of binary size (i.e. cache efficiency) and the high-performance branch prediction of modern CPUs. (This use case unfortunately does not go well with branch prediction, but the LLVM backend cannot know that.)

> I didn't mean to dissuade you from the @generated solution, by all means adopt that if it works better.

I am aware of that; it is just my hesitation. The comment which I wrote above:

> BTW, I am thinking of moving these low-level utility functions and macros to a new file (e.g. "src/utilities.jl")

implicitly suggests that perhaps I will add more @generated functions (e.g. div_rawones(a, b) = rawone(a) / rawone(b)) without users noticing. I do not think it is good to (over-)use @generated, but it is certainly powerful.

Now, I am not going to resist the temptation of the powerful @generated. This commit is far from what I originally envisioned, but I am happy because float(3N5f3) == 3 comes true! 😆

This fixes (mitigates) issue #129.