-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed Float16 from Float64 and BigFloat #42837
fixed Float16 from Float64 and BigFloat #42837
Conversation
Testing these conversions are really hard because the way you generally test math is computing in higher precision and checking if the result is the same. I should probably add tests for a few likely problematic values like |
Maybe one test for previous float, not sure if it would add coverage or just add clutter. |
Tests added. This still has a bug for some values less than |
base/mpfr.jl
Outdated
function Float16(x::BigFloat) :: Float16 | ||
res = Float32(x) | ||
resi = reinterpret(UInt32, res) | ||
if (resi&0x7fffffff) < 0x38800000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this starts to need some comments now. It's a bit of magic number soup now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. that's just if Float16(res)
is subnormal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is more efficient than issubnormal(Float16(res))
? Does it necessarily have to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. this is only 2 instructions while converting to float16 is relatively expensive.
Should we merge this as is? This is already a major improvement over current status and I'm not 100% sure how to fix the last remaining case the edge between |
Do you mean all of |
yeah. Fixed. |
The only test error remaining appears to be openblas threading issue. |
last remaining bug fixed! from my perspective, this is ready to merge. |
How is it ensured that these tests both test the C version and the Julia version? |
the c version is float64. the Julia version is bigfloat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I helped co-author this, so I will approve here too.
Any objections to merging this? |
Needs a manual backport towards 1.7. |
What branch should I target the PR to? |
|
backported: #43092 is the backport |
* fixed Float16 from Float64 and BigFloat. Many thanks to Jamison.
* fixed Float16 from Float64 and BigFloat. Many thanks to Jamison.
Also needs a manual backport towards |
previously
Float16(::Float64)
andFloat16(::BigFloat)
had double rounding issues.