-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use simd_* intrinsics instead of llvm intrinsics where possible #788
Comments
We used to do that, but only for arm and x86 we needed to add ~10.000 intrinsics. This meant modifying rustc every time a new intrinsic should be added, and doing CI of the intrinsics in rustc, which would add a dozen build jobs to run the tests on less common targets, and increase CI time of rustc build jobs by quite a while (I think the largest tests here take ~20 min). It would be simpler if other backends could just either hijack the llvm intrinsics, or add their own |
I meant the simple ones which already have a |
This PR was the last one removing support for that: rust-lang/rust#57416 in case you are considering adding it again. If that happens, we could incrementally remove the |
Your list appeared unbounded, it would then maybe be helpful if you could write a complete list of operations that you want to propose. The |
Expanding on that, it implements the two operations you mention, and they currently produce optimal code generation with LLVM. For doing a |
I understand that it is practically impossible to do this for all intrinsics, but at least the following stdarch/crates/core_arch/src/simd_llvm.rs Lines 5 to 58 in 65c9fa5
|
Hijacking the llvm intrinsics is what I currently do, but every combination of |
This makes me wonder, why are you trying to support If there is a way to implement some of the |
Because everybody uses |
+1 |
Also, at least for the two intrinsics mentioned:
The specific intrinsics only work for the particular vector length, type, and one specific A generic intrinsic would need to support both f32 and f64 and all supported vector lengths for those types (4, 8, and 16 for 32-bit), for all combinations of target features available. For the LLVM backend, exposing a generic
I think that work is worth doing, and it would improve the current situation a bit, where we handle all of this in a library (e.g. by enabling some optimizations that are not possible today).
The main reason I point you to packed_simd, is because it has already solved most of the problems you are trying to solve: https://github.com/rust-lang-nursery/packed_simd/blob/master/src/api/ops/vector_float_min_max.rs#L13 The intrinsic you are requesting, already exist, and it is precisely called |
I was actually requesting it to be used in
This is already done by cg_llvm for |
Doesn't llvm do that itself? |
I understand, but until |
Ahhh! Duh, I thought you were proposing adding new generic intrinsics for all the exposed operations. If you just want to use the generic ones when they are appropriate and they do work, you can do just that, and our test suite should make sure that the exact same instruction is generated. Just keep in mind that sometimes the target-specific intrinsics might mean something different than the generic ones (e.g. for floating-point, things like the precision might be different).
No, not at all. The
|
By the way, it turns out that cg_llvm implements more https://github.com/rust-lang/rust/blob/master/src/librustc_codegen_llvm/intrinsic.rs#L1340-L1381 |
Yes, most (or all?) of those are used by |
🎉 |
I'm closing this as the issue has been "resolved". Feel free to submit PRs to change whichever parts of |
_mm_sqrt_ps
:llvm.x86.sse.sqrt.ps
->simd_fsqrt
_mm_min_ps
:llvm.x86.sse.min.ps
->simd_fmin
This makes it easier for other codegen backends to support stuff, as all variants of the same operation are nicely bundled together. (eg single and double precision sqrt)
The text was updated successfully, but these errors were encountered: