
Improve accuracy of conversions to floating-point numbers (#129) #138

Merged (1 commit) on Nov 22, 2019

Conversation

kimikage
Collaborator

This fixes (mitigates) issue #129.

@kimikage
Collaborator Author

kimikage commented Nov 17, 2019

Benchmark

see:
#129 (comment)
#129 (comment)
#129 (comment)

Known Issues for Users

Rounding error

In some cases, a 1-lsb error still occurs. (See also: #129 (comment).)

For example:

julia> x = reinterpret(N4f28, 0b1111_1111_1111_1111_1111_1111_0111_0000)
15.999999523N4f28

julia> Float32(x)
16.0f0

julia> Float32(BigFloat(Rational(x)))
15.999999f0

Workaround: use division.
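The division workaround can be sketched with Base alone (the constants below mirror the N4f28 example above, where rawone(N4f28) == 2^28 - 1; no package is needed):

```julia
# Division-based conversion of the raw bits from the example above.
# Dividing by rawone rounds exactly once, so the 1-lsb error does not appear here.
raw = 0xFFFFFF70                   # same bits as 0b1111_1111_1111_1111_1111_1111_0111_0000
f_div = Float32(raw / (2^28 - 1))  # 2^28 - 1 == rawone(N4f28)
```

Here f_div is 15.999999f0, matching Float32(BigFloat(Rational(x))) rather than the multiplication-based 16.0f0.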

Slowdown

With a few exceptions, the accuracy comes at the price of speed.
The slowdown is particularly noticeable in the following cases:

  • N8f8, N7f9, N6f10, N5f11, N4f12, N3f13, N2f14, N1f15 -> Float32
    N16

  • N15f17, N8f24, N7f25, N6f26, N3f29, N1f31 -> Float64
    N32

Workaround: use scaledual. (See also: #130.)
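The idea behind scaledual can be sketched with Base alone: hoist the reciprocal of rawone out of the loop and pay one cheap multiplication per element. (The raw UInt8 data standing in for N0f8 values below is illustrative.)

```julia
raw  = UInt8[0x00, 0x33, 0xff]   # stand-ins for N0f8 raw values
b    = 1.0 / 255                 # "dual" scale factor (1 / rawone), computed once
vals = b .* Float64.(raw)        # one multiply per element, no per-element division
```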

Slowdown with ReinterpretArray

The conversion methods are designed to be suitable for SIMD vectorization. However, the current ReinterpretArray is not SIMD-friendly (cf. #125, JuliaLang/julia#28980), so it causes a further slowdown.
In particular, in the cases Normed{UInt32,f} -> Float32 where 2 <= f <= 24, LLVM generates a destructive conditional branch.

Workaround: use Array instead of ReinterpretArray (i.e. avoid rand(::Type{<:Normed}, sz::Dims)).
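A minimal Base-only illustration of this workaround (Int32 stands in for a Normed raw type here):

```julia
raw = UInt32[0x00000001, 0xffffffff]
r = reinterpret(Int32, raw)   # lazy ReinterpretArray: currently SIMD-hostile
a = collect(r)                # plain Vector{Int32}: SIMD-friendly in hot loops
```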

Conversions to BigFloat

This PR does not change the conversions to BigFloat. I think BigFloat users need accuracy rather than speed.

Workaround: use Rational as the first step of a two-step conversion.
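Sketched with Base types only (the raw value and rawone below are illustrative; in the package, Rational(x::Normed) presumably builds the same exact ratio from the raw value and rawone):

```julia
raw, rone = 0x33, 0xff        # hypothetical N0f8 raw value and its rawone
r = Int(raw) // Int(rone)     # step 1: exact Rational (51//255 == 1//5)
big = BigFloat(r)             # step 2: round exactly once, at BigFloat precision
```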

@kimikage kimikage force-pushed the issue129 branch 5 times, most recently from a782be8 to 2cfddd3 on November 19, 2019 13:42
Member

@timholy timholy left a comment

This is great! Perhaps check the @pure comment below, then merge at will!

src/normed.jl Outdated
macro exp2(n)
:(_exp2(Val($(esc(n)))))
end
@generated _exp2(::Val{N}) where {N} = (x = exp2(N); :($x))
Member

Could both of these be Base.@pure? With @generated one should technically provide a fallback (e.g., https://github.com/JuliaLang/julia/blob/e5ee8da70a646f1f54730e727932ebaeaee256d5/base/namedtuple.jl#L75-L83) in preparation for the day when it becomes more common to ship precompiled binaries.
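The fallback pattern being referenced is Julia's "optionally generated function": `if @generated` lets one definition provide both a generation-time constant and an ordinary runtime fallback. A minimal sketch (the name exp2val is illustrative, not from the package):

```julia
function exp2val(::Val{N}) where {N}
    if @generated
        x = exp2(N)
        :($x)        # generated path: folded to a constant at generation time
    else
        exp2(N)      # fallback path: runs if code generation is unavailable
    end
end
```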

Collaborator Author

@kimikage kimikage Nov 19, 2019

Oh! _exp2(::Val{N}) where {N} = exp2(N) seems to work well! (not fully tested)
I found that a generated function can work around the integer-division problems, and then I thought the same technique could be used for exp2, so I wrote it as shown above.

However, I don't know of any substitute for the generated inv_rawone().
Should I add a fallback? I am looking for documentation about @generated.

Collaborator Author

@kimikage kimikage Nov 20, 2019

Although I do not fully understand the mechanism of @generated, I added a fallback following that example.

inv_rawone(x) = (@generated) ? (y = 1.0 / rawone(x); :($y)) : 1.0 / rawone(x)

BTW, I am thinking of moving these low-level utility functions and macros into a new file (e.g. "src/utilities.jl").
Surprisingly, rawone is defined in "src/normed.jl" even though it can be called with a Fixed argument. Moreover, although there is no noticeable overhead, the implementation of rawone takes a detour:

interpret> FixedPointNumbers.rawone(1N5f3)
rawone(v) in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:38
 one(x::Normed) in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:37
  oneunit(x::Normed) in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:36
   one(::Type{T}) where T<:Normed in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:34
    oneunit(::Type{T}) where T<:Normed in FixedPointNumbers at ~/.julia/packages/FixedPointNumbers/870tH/src/normed.jl:32
  #unused# = Normed{UInt8,3}
  T = Normed{UInt8,3}
 31  function oneunit(::Type{T}) where T <: Normed
 32      T(typemax(rawtype(T)) >> (8 * sizeof(T) - nbitsfrac(T)), 0)
     end

Even though we can get the rawone value (which isa Unsigned) in oneunit(::Type{T}), it is converted to Normed and then reinterpreted back to Unsigned.

Should I refactor zero, one, oneunit and rawone first, as part of #139? Or keep the utility functions in "src/normed.jl" for the moment?

Member

You're right that making this more straightforward seems desirable.

Member

With

diff --git a/src/normed.jl b/src/normed.jl
index 737657b..07f5b86 100644
--- a/src/normed.jl
+++ b/src/normed.jl
@@ -114,7 +114,7 @@ end
 _exp2(::Val{N}) where {N} = exp2(N)
 
 # for Julia v1.0, which does not fold `div_float` before inlining
-inv_rawone(x) = (@generated) ? (y = 1.0 / rawone(x); :($y)) : 1.0 / rawone(x)
+@pure inv_rawone(x) = 1.0 / rawone(x)
 
 function (::Type{T})(x::Normed) where {T <: AbstractFloat}
     # The following optimization for constant division may cause rounding errors.

I get

julia> x = N8f24(15.2)
15.2N8f24

julia> @code_llvm Float32(x)

;  @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:159 within `Float32'
define float @julia_Float32_17656(%jl_value_t addrspace(10)*, { i32 } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
;  @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:160 within `Float32'
; ┌ @ Base.jl:20 within `getproperty'
   %2 = getelementptr inbounds { i32 }, { i32 } addrspace(11)* %1, i64 0, i32 0
; └
; ┌ @ int.jl:498 within `unsafe_trunc'
; │┌ @ int.jl:464 within `rem'
    %3 = load i32, i32 addrspace(11)* %2, align 4
; └└
;  @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:169 within `Float32'
; ┌ @ promotion.jl:349 within `<' @ int.jl:49
   %4 = icmp sgt i32 %3, -1
; └
  %5 = select i1 %4, double 0.000000e+00, double 0x4070000010000010
;  @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:170 within `Float32'
; ┌ @ float.jl:60 within `Float64'
   %6 = sitofp i32 %3 to double
; └
; ┌ @ float.jl:410 within `muladd'
   %7 = fmul contract double %6, 0x3E70000010000010
   %8 = fadd contract double %7, %5
; └
;  @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:170 within `Float32' @ float.jl:251
  %9 = fptrunc double %8 to float
;  @ /home/tim/.julia/dev/FixedPointNumbers/src/normed.jl:170 within `Float32'
  ret float %9
}

on Julia 1.0. It is sometimes joked that @pure means "ask Jameson if it's OK," but here I think it's pretty straightforward.

Collaborator Author

@kimikage kimikage Nov 21, 2019

Could you go into more detail?
This is a matter of speed, not correctness. In the "Matrix of Vec4" case, "master" is slower than the "simple div" except in the BigFloat cases.
#129 (comment)

The default inline_cost_threshold is 100, and the cost of div_float is 20.
For example, if we want to use 4 packed numbers, the cost of a single method call (the callee) should be reduced to 100 / 4 = 25, or inlining into the caller may fail and the SIMD vectorization will be incomplete. In addition, failed SIMD vectorization causes the broadcast optimization to fail. Therefore one div_float (i.e. the "simple div") is OK, but two div_floats are not. Although an intermediate variable can reduce the number of div_floats, there are still mul_float, fptrunc and other small operations.
Julia's optimizer is surely more complicated than this, but in my experience this explains the slowdown on Julia v1.0.
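A Base-only sketch of this cost argument (the types, names and UInt16 scale are illustrative, not the package's code): keeping the callee down to a single multiplication leaves it cheap enough to inline, so the loop can vectorize.

```julia
tofloat(i::UInt16) = Float32(i) * Float32(1 / 0xffff)   # one mul, no div_float
function convert_all!(out::Vector{Float32}, raw::Vector{UInt16})
    @inbounds @simd for k in eachindex(raw, out)
        out[k] = tofloat(raw[k])   # cheap callee: inlines, loop vectorizes
    end
    return out
end
```

Replacing the multiplication with a per-element division raises the callee's cost by div_float and can tip the caller past the inlining threshold.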

Member

Oh, I wasn't looking at vectorization, sorry. The point is that the LLVM IR is the last stage before native code.

If @pure doesn't work, let's keep your @generated version.

Collaborator Author

The point is that the LLVM IR is the last stage before native code.

I get the point, but even if it's correct in the LLVM IR...

Especially, in cases of Normed{UInt32,f}->Float32 where 2 <= f <= 24, the LLVM generates a destructive conditional branch.

  %5 = select i1 %4, double 0.000000e+00, double 0x4070000010000010

-> 💥 💥 💥

        vxorpd  %xmm0, %xmm0, %xmm0
        testl   %eax, %eax
        jns     L30
        vmovsd  .LCPI639_0, %xmm0       # xmm0 = mem[0],zero
L30:

I think there is almost no way to intervene in the LLVM backend from Julia scripts. By using -0.0 instead of 0.0, the conditional branch (jns) can be removed, but that version stalls under SIMD. So I gave up!
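For reference, a scalar Base-only reconstruction of the kind of code in question, with constants for N8f24 matching timholy's earlier IR dump (c == 1/rawone(N8f24), and the offset 0x4070000010000010 == 2^32/(2^24 - 1); the function name is illustrative):

```julia
function n8f24_to_float32(raw::UInt32)
    s = reinterpret(Int32, raw)                        # signed view of the bits
    c = 1.0 / (2^24 - 1)                               # 1 / rawone(N8f24)
    offset = ifelse(s < 0, 2.0^32 / (2^24 - 1), 0.0)   # undoes s == raw - 2^32
    return Float32(muladd(Float64(s), c, offset))      # the muladd from the IR
end
```

The `ifelse` is what becomes the `select`, and on the scalar path the backend may lower it to the `jns` branch shown above.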

Member

I probably don't know as much about SIMD as you do, but is select actually destructive? It's not technically a branch. See https://github.com/eschnett/SIMD.jl#representing-simd-vector-types-in-julia (search for select) and http://blog.llvm.org/2011/12/llvm-31-vector-changes.html (suggesting it went into LLVM 3.1).

Of course, whether LLVM actually applies these here is another question.

I didn't mean to dissuade you from the @generated solution, by all means adopt that if it works better. I personally would also not stress too much about Julia 1.0 performance, if it's really painful. "Use a more recent Julia version" is a perfectly fine response to a performance concern.

Collaborator Author

@kimikage kimikage Nov 22, 2019

In general, select is not destructive. Since this is a rare (technically interesting but personally hateful) case, I just left a comment about it. This is a problem with the non-SIMD LLVM IR, such as the one you dumped above. Moreover, although I don't like the branch, it is still reasonable in terms of binary size (i.e. cache efficiency) and the high-performance branch prediction of modern CPUs. (This use case unfortunately does not play well with branch prediction, but the LLVM backend cannot know that.)

I didn't mean to dissuade you from the @generated solution, by all means adopt that if it works better.

I am aware of that; it is just my hesitation. The following comment, which I wrote above:

BTW, I am thinking of moving these low-level utility functions and macros to a new file (e.g. "src/utilities.jl")

implicitly suggests that perhaps I will add more @generated functions (e.g. div_rawones(a, b) = rawone(a) / rawone(b)) without users noticing. I do not think it is good to (over-)use @generated, but it is certainly powerful.

Now, I am not going to resist the temptation of the powerful @generated. This commit is far from what I originally envisioned, but I am happy because float(3N5f3) == 3 comes true! 😆
