MulAdd and MulAddAssign #387
I've been thinking for a while that Rust should have a function exactly like LLVM's `llvm.fmuladd` intrinsic. We could use LLVM's naming, but Rust already uses the `mul_add` name for the always-fused operation.
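For context on why fusion matters here, this is a small stable-Rust sketch (not part of the proposal) showing that `f64::mul_add` rounds once while `a * b + c` rounds twice, so the two can produce different results:

```rust
fn main() {
    // x = 1 + 2^-52, so x*x = 1 + 2^-51 + 2^-104 exactly.
    let x = 1.0_f64 + f64::EPSILON;
    let c = -(1.0 + 2.0 * f64::EPSILON);
    // Unfused: the product rounds to 1 + 2^-51 first, losing the 2^-104 term.
    let unfused = x * x + c;
    // Fused: a single rounding at the end preserves the 2^-104 term.
    let fused = x.mul_add(x, c);
    assert_eq!(unfused, 0.0);
    assert_eq!(fused, f64::EPSILON * f64::EPSILON);
    println!("unfused = {unfused:e}, fused = {fused:e}");
}
```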
I fully agree that something like this is needed.

As far as naming is concerned, I would advise finding a naming convention that can also work for #235 (reductions). From both of these issues I get the same general feeling of needing to manage a tradeoff between floating-point output reproducibility and efficient hardware implementations, where the perspectives of people who prioritize each concern seem so irreconcilable that providing two APIs, each prioritizing one of the concerns, sounds like the pragmatic choice.

In both cases, it seems to me that providing a "relaxed" variant of the operation, which does the computation as efficiently as possible for the target architecture at the expense of non-reproducible output across targets, would be desirable in addition to the existing "reproducible" variant. Of course, we may also want to improve the performance/precision compromise of the "reproducible" variant before stabilizing it, as discussed in #235.

In the C/C++ world, the tradition is to call "relaxed" operations "fast" operations, which would give us "fast"-prefixed names. Alternatively, we could use a different adjective that is not familiar to C/C++ devs but tries to express the underlying design compromise better.
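A purely hypothetical sketch of the "relaxed" idea (`relaxed_mul_add` is an invented name, not a real portable-simd API): pick the fastest form the target supports at compile time, giving up cross-target reproducibility of the last bit.

```rust
// Hypothetical: fuse only when hardware FMA is known to exist.
#[cfg(target_feature = "fma")]
fn relaxed_mul_add(a: f64, b: f64, c: f64) -> f64 {
    // Hardware FMA available: the fused form is a single instruction.
    a.mul_add(b, c)
}

#[cfg(not(target_feature = "fma"))]
fn relaxed_mul_add(a: f64, b: f64, c: f64) -> f64 {
    // No guaranteed hardware FMA: avoid a libm `fma` call, accept two roundings.
    a * b + c
}

fn main() {
    // Both variants agree here because 2*3 + 4 is exact in f64.
    assert_eq!(relaxed_mul_add(2.0, 3.0, 4.0), 10.0);
}
```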
I didn't realize at first that …
Be a little careful of …
note that …
Java has `Math.fma`.
Indeed, if LLVM sees an "unconditional" FMA instruction and cannot prove that the target always has hardware FMA, then it (1) scalarizes everything and (2) introduces a layer of libm call indirection.
Rollup merge of rust-lang#133395 - calebzulawski:simd_relaxed_fma, r=workingjubilee

Add simd_relaxed_fma intrinsic

Adds compiler support for rust-lang/portable-simd#387 (comment)

r? `@workingjubilee`
cc `@RalfJung`: is this kind of nondeterminism a problem for miri/opsem?
AVX2 and ARM NEON have fused multiply-add (FMA) instructions, so it would be useful to be able to emit them explicitly through implementations of MulAdd and MulAddAssign. FMA throughput is the basis of peak FLOP/s figures of merit, so this will likely improve performance on matrix multiplication benchmarks.
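As an illustration (a sketch on stable Rust, not the proposed trait API), a scalar dot-product kernel written with `f64::mul_add` gives the backend a chance to emit those FMA instructions:

```rust
// Dot-product kernel: each step is one fused multiply-add, which LLVM can
// lower to hardware FMA (AVX2 vfmadd*, NEON fmla) when the target has it.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    // 1*4 + 2*5 + 3*6 = 32, exact in f64.
    assert_eq!(dot(&a, &b), 32.0);
}
```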