exp and log are much slower than they should be.
#135
Comments
Yes .. the Double32 type was an early artifact of hoping to redouble Double64s to compute Double128 accurately. At the time, Julia did not provide the mechanism for that sort of involutive constructor to just work [or to work at all]. A next large reorganization of DoubleFloats just drops Double32 and, just as you say, recommends the use of Float64 as the doubled Float32 precision. I had not worked with |
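The point about Float64 standing in for doubled Float32 precision can be sketched roughly as follows. This is a Python illustration, not DoubleFloats.jl code: `to_f32` and `exp_double32` are hypothetical helpers, Float32 rounding is emulated with `struct`, and the input pair is assumed to be normalized so that `hi + lo` fits in one Float64.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (a Float64) to the nearest Float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def exp_double32(hi, lo):
    """exp of a Float32 pair (hi, lo), evaluated once in Float64."""
    y = math.exp(hi + lo)        # a single Float64 exp does all the work
    out_hi = to_f32(y)           # leading 24 bits of the result
    out_lo = to_f32(y - out_hi)  # next 24 bits; this subtraction is exact
    return out_hi, out_lo

hi32, lo32 = exp_double32(0.5, 0.0)
```

The resulting pair carries about 48 significant bits, which a correctly rounded Float64 result already covers.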
There are at least a few approaches to implementing |
How much error are you willing to accept for an |
If you sample deeply through all of the arithmetic and each elementary function, and then do the same with other implementations of double-double, we get at least as many and often more accurate bits in our results. |
How would you feel about 4 ulps (or so) in the low part of the output? I think that will be roughly the threshold that is achievable without significant usage of triple compensated arithmetic. |
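One possible way to make "ulps in the low part" measurable is sketched below in Python. The unit chosen here, one ulp of the high word scaled by 2^-53, is an assumption and not a definition taken from DoubleFloats.jl; `dd_ulp` and `error_in_dd_ulps` are hypothetical helper names. Exact rational arithmetic is used so that the measurement itself adds no rounding error.

```python
import math
from fractions import Fraction

def dd_ulp(hi):
    """Assumed unit: one ulp of the high word scaled by 2**-53."""
    return Fraction(math.ulp(hi)) * Fraction(1, 2**53)

def error_in_dd_ulps(hi, lo, exact):
    """Distance between the pair hi + lo and an exact Fraction, in dd_ulp units."""
    approx = Fraction(hi) + Fraction(lo)  # exact rational value of the pair
    return abs(approx - exact) / dd_ulp(hi)

exact = Fraction(1, 3)
hi = float(exact)                  # nearest Float64 to 1/3
lo = float(exact - Fraction(hi))   # rounded remainder
good = error_in_dd_ulps(hi, lo, exact)
# Perturb the low word by four assumed units to see what "4 ulps" looks like.
off_by_four = error_in_dd_ulps(hi, lo + 4 * float(dd_ulp(hi)), exact)
```

Under this convention a well-formed pair stays below one unit, and a threshold of about 4 corresponds to the perturbed case.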
I need to find the version of that number I had decided upon some time ago. I do recall there being some threshold of cooperation throughout the functional realizations. OTOH, rather than go full triple-double inside |
How does this look? It's still missing the overflow/underflow checks, but I'm seeing this as benching roughly 10x faster than the current implementation, and I think the error should be acceptable (I haven't fully tested it though)
Edit: This still has a bug somewhere that is causing it to not give nearly as many sig figs as it should. |
It looks good, much more useful than was available an hour ago! |
I just updated the code. I'm seeing about half the error of the current implementation with 10x speed improvement. Now I just need to make a PR and add back the overflow/underflow checks. |
super good, smiling bits
|
let me know when you want to merge something |
Can you merge https://github.com/oscardssmith/DoubleFloats.jl/tree/oscardssmith-faster-square? it's a smaller change, but should be ready to merge. this PR needs a small amount of extra work to avoid an accuracy regression (unless you are OK with a minor accuracy regression). |
merged the faster-square |
I just ran some rand * scale vectors of Double64 and Float128 through @oscardssmith is there some corner of the domain that you found troublesome, or did merging your earlier improvement work quite well? |
what do you mean? |
The title on this issue seems outdated. Before changing it, I am confirming with you that the performance of exp and log have improved over these past months. It appears they have. |
how have they changed? the source looks identical |
Limited benchmarking does not report slow performance relative to Float128, I used 1024 rands, multiplied them by 2, subtracted 1 to have both positive and negative values.
|
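The input construction described above (1024 uniform samples, doubled, minus one) could look like this in Python. It is only a sketch of the setup, not the Julia benchmark that was run: `make_inputs` and `time_per_call_ns` are hypothetical names, the seed is arbitrary, and Python's interpreter overhead means the absolute timings are not comparable to Julia's.

```python
import math
import random
import timeit

def make_inputs(n=1024, seed=0):
    """n uniform samples in [0, 1), mapped to [-1, 1) via 2*x - 1."""
    rng = random.Random(seed)
    return [2.0 * rng.random() - 1.0 for _ in range(n)]

def time_per_call_ns(f, xs, repeats=5, loops=100):
    """Best-of-`repeats` average time of f(x) over xs, in nanoseconds."""
    best = min(timeit.repeat(lambda: [f(x) for x in xs],
                             number=loops, repeat=repeats))
    return best / (loops * len(xs)) * 1e9

xs = make_inputs()
float64_exp_ns = time_per_call_ns(math.exp, xs)
```

Timing the same input vector with each number type, and reporting the ratio to the Float64 baseline, keeps the comparison on equal footing.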
yeah, it's not slow compared to float128, but that still is roughly 100x slower than Float64. |
Any progress since then? |
no. I haven't had much time to play with this. I still think that ~5-10x faster than the current algorithm should be achievable. but it is one where you need to be careful to get all the errors to line up properly. it would help to know how many ulps is considered acceptable. I would be inclined to go for relatively loose accuracy requirements since if the last couple bits matter, you are likely better off with multifloats float64x3. |
Thanks, I was not aware of the multifloats package before. But its web pages say it doesn't support functions like exp and log? Another question: can multifloats run on a GPU? DoubleFloats can have impressive performance with matrix ops on GPU, though I failed to make it work for exp. |
This is not an easy fix; the time taken is a result of optimizing for accuracy over speed.
There is a less accurate implementation (as I recall) that is faster. I could add that as `exp_unsafe`. Let me see about tracking it down and benchmarking that.
|
I just spent time on this -- the current exp benchmarks 10x slower than Float64 -- no longer 100x slower.
Julia Version 1.10.2 |
What computer are you on? 35ns for Float64
|
Thanks for that. It seems that using Chairmarks for this messed up my results. Back to the mill. |
Any chance to make it work on GPU?
|
Both are around 100x slower than Float64. `exp` in particular should be able to be done relatively quickly, since `exp(hi+lo) = exp(hi)exp(lo)`, and since `hi` can easily be reduced to [1,2), `exp(lo)` only needs 2 terms.
I have fewer ideas for `log`, but there should be something better than the current behavior.
Also, it's worth focusing on Double64 versions of these functions because the Double32 versions can be done really quickly using Float64.