hypotenuse function for f32, f64 #15
base: main
Conversation
Yup, will give this a look over and test tomorrow 👍
Hey, out of curiosity, why not use libm for the non-nightly no_std math implementations? Also, a bit out of scope for the PR, but I changed the fast math implementation to use the sqrt intrinsic.
Mostly to control the behaviour of the API and to keep the core lib zero-dependency and easy to audit.
This change doesn't make sense, the intrinsic is already stable as
Btw, I have approved CI runs for you :D That should let you check that ARM & NEON work. I will test the AVX512 stuff, but haven't got around to doing it yet, sorry!
Ah, I didn't realize any of the compiler intrinsics had a dependency on the standard library. I reverted it; I'm just using direct calculation with no_std for now, and I'll circle back around once I get the safe API implemented.
A couple of things I wanted to get some feedback on:

I think I might have figured out how to fix the 1 ulp difference. I need to change the scaling to only be applied if values fall outside a certain range.
So, with the 1 ulp difference, it turned out not to be a problem with the scaling; it's how the floating point error is being corrected for. I haven't had much success porting the error correction process mentioned in the paper, but in either case I think being within 1 ulp is probably an acceptable level of accuracy in most cases. I can maybe try to fix it in a later PR. Let me know if there is anything I should correct; I think the PR is ready for review, though the clippy CI check will likely throw an error due to the lifetime on the memory loader, and a few unused type parameters in core_routine_boilerplate (both unrelated to changes from this PR).
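To make "within 1 ulp" concrete, here is a small, hypothetical helper (not part of the PR) that measures the distance in ulps between two finite f64 values by comparing their bit patterns after mapping them to a monotonically ordered integer:

```rust
// Hypothetical helper, for illustration only: counts how many representable
// f64 values lie between `a` and `b`.
fn ulp_distance(a: f64, b: f64) -> u64 {
    // Map the IEEE 754 bit pattern to an integer that is monotonically
    // ordered across negative and positive floats, so adjacent floats
    // differ by exactly 1.
    fn ordered_bits(x: f64) -> i64 {
        let bits = x.to_bits() as i64;
        if bits < 0 {
            i64::MIN - bits
        } else {
            bits
        }
    }
    (ordered_bits(a) - ordered_bits(b)).unsigned_abs()
}

fn main() {
    // 1.0 and the very next representable f64 are exactly 1 ulp apart.
    let next = f64::from_bits(1.0f64.to_bits() + 1);
    assert_eq!(ulp_distance(1.0, next), 1);
    assert_eq!(ulp_distance(2.0, 2.0), 0);
    println!("ok");
}
```

A tolerance test could then assert `ulp_distance(ours, std_result) <= 1` over the problematic input ranges.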
```rust
/// and a point (`x`, `y`) on the Euclidean plane.
unsafe fn hypot(a: Self::Register, b: Self::Register) -> Self::Register;

#[inline(always)]
unsafe fn hypot_dense(
```
Nit, please add docs for this function as well, and just add a note under `# Safety` that it depends on the safety requirements of `SimdRegister<T>`.
> just add a note under # safety that it depends on the safety requirements of SimdRegister

You mean for the docs of `op_hypot`, correct?
Yeah, technically it depends on both, but I guess if you satisfy `op_hypot` you satisfy the register requirements anyway.
cfavml/src/danger/op_hypot.rs
```rust
use crate::mem_loader::{IntoMemLoader, MemLoader};

#[inline(always)]
/// A generic vector hypot over one vector and single value.
```
Is this correct? Technically mem loaders will accept:

- single & single
- single & vec
- vec & single
- vec & vec
```
result = [0; dims]

for i in range(dims):
    result[i] = a[i].hypot(b[i])
```
Might be useful to have this be a bit more in-depth about what `hypot` is actually doing, as just calling the function doesn't give much insight into the actual math and operation ordering going on.
Would this be better?

```
result = [0; dims]
// assume |a| >= |b| for all a, b
for i in range(dims):
    let hi = a[i].abs()
    let lo = b[i].abs()
    let scale = 1/hi
    result[i] = hi * sqrt((lo*scale)^2 + 1)
return result
```
Looks good. I'm tempted to have it be `abs(a[i])` just so it is consistent, but other than that it looks good, much clearer.
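For reference, the scaling idea from that pseudocode can be sketched as a plain scalar function. This is illustrative only (the name `hypot_scaled` and the exact structure are assumptions, not the PR's SIMD implementation); dividing through by the larger magnitude keeps the squared term in [0, 1], so `hi*hi + lo*lo` never overflows or underflows:

```rust
// Scalar sketch of the scaled-hypot approach discussed above.
// Name and structure are illustrative, not the PR's actual code.
fn hypot_scaled(a: f64, b: f64) -> f64 {
    // Order by magnitude so `hi` is the larger absolute value.
    let (hi, lo) = if a.abs() >= b.abs() {
        (a.abs(), b.abs())
    } else {
        (b.abs(), a.abs())
    };
    if hi == 0.0 {
        return 0.0;
    }
    // ratio is in [0, 1], so ratio*ratio cannot overflow even when
    // a and b are near f64::MAX.
    let ratio = lo / hi;
    hi * (ratio * ratio + 1.0).sqrt()
}

fn main() {
    assert_eq!(hypot_scaled(3.0, 4.0), 5.0);
    // Naive sqrt(a*a + b*b) would overflow to infinity here;
    // the scaled form stays finite.
    assert!(hypot_scaled(1e300, 1e300).is_finite());
    println!("ok");
}
```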
cfavml/src/safe_trait_numeric_ops.rs
```rust
use crate::mem_loader::{IntoMemLoader, MemLoader};

/// Various arithmetic operations over vectors.
pub trait NumericOps: Sized + Copy {
```
A little bit of bike shedding: if this is called `NumericOps`, does this open us up to a bit of pain later if we want to add a function that falls under this category, but for i32, i8, etc.? As it stands, `NumericOps` is probably a bit more like `FloatOps` or similar.
No worries on the bike shedding.

What functions would fall into this category that would be valid for integers? I can change the name of the trait/file to hypot, but I imagine there will be other float-specific ops that would be useful to add at some point.

What naming scheme do you think we should go with?
Not sure. I suspect we are more likely to have float-only ops than int-only ops for this lib. A trait for just the hypot safe op is probably going to become a bit verbose pretty quickly; we could put it under `MiscFloatOps` as a bit of a catch-all, and then come up with a better-fitting name once we collect a few more functions in this category.
It looks pretty great overall. The comments on the arithmetic ops themselves in the SIMD code were very useful for digesting and understanding. There are a few notes around code style and a little bit of naming debate, but nothing insane. I think the one thing missing is a dedicated high level function like
So, I've run into a bit of a roadblock. It looks like for the macOS test, the tests are failing specifically for subnormal values in the NEON implementation. The difference seems to be pretty significant (the exponents don't match), and I've hit the limit of what I can figure out without being able to test directly on hardware.
I'll try to have a look tomorrow and see if I can spot what's up with it.
Shoot, I forgot about this. I'll try and look at this when I get the chance, sorry!
Oh thank god. Who in their right mind would think the natural way to order the args for a fused multiply-add function would be add first?
AH, you ran into that issue 😅 Yeah, I ran into the same thing when first doing the NEON stuff; the accumulator sits in a different argument position, which is... unhelpful.
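The ordering mismatch being discussed can be shown with plain scalar ops. In Rust, `x.mul_add(y, acc)` computes `x * y + acc` (accumulator last), while the AArch64 NEON intrinsic `vfmaq_f32(acc, x, y)` computes `acc + x * y` (accumulator first). A minimal sketch of how passing arguments in "NEON order" to the scalar API silently computes the wrong thing:

```rust
// Scalar Rust FMA: x.mul_add(y, acc) == x * y + acc  -- accumulator LAST.
// AArch64 NEON:    vfmaq_f32(acc, x, y) == acc + x*y -- accumulator FIRST.
fn main() {
    let (x, y, acc) = (2.0f64, 3.0f64, 10.0f64);

    // Intended: multiplicands first, accumulator last.
    assert_eq!(x.mul_add(y, acc), 16.0); // 2*3 + 10

    // Passing the arguments in "NEON order" to mul_add silently computes
    // acc * x + y instead -- the kind of bug described above.
    assert_eq!(acc.mul_add(x, y), 23.0); // 10*2 + 3

    println!("ok");
}
```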
I created a Hypot trait with a single function. It's been tested for avx2 and avx2fma; I don't have the hardware to test the avx512 implementation. I noticed a small diff from std hypot for some normal number pairs for f64 (`hypot(0.1e-38, 1e-38)` had a diff around `e-54`). I wasn't sure whether to add hypot to automath or keep it separate; it's currently separate. Neon might take a little longer.

Let me know if I need to change anything. Also, can you test the avx512 implementation?

EDIT: a bit more information on the difference from std hypot. I've found the difference is easy to trigger: if you provide two f64 values that are normal but either really small or really large (it kicks in around 1e-24 and 1e25) and exactly one order of magnitude apart (e.g. `1e-24` and `1e-25`), the result will differ from std hypot by exactly 1 ulp. I should probably add a test to ensure that results stay within 1 ulp of the std function. This isn't really a rigorous analysis or anything, just eyeball testing; it might vary for other inputs.