-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Critical issues before stabilization #364
Comments
w00t! Very excited to see this. :D :D :D
Is there high level docs anywhere describing how the API is laid out? Like, a guided tour of everything? I realize that's a tall order to ask for before the API is set in stone because it takes work to stabilize. But I'm asking because as of right now, I find the API to be very overwhelming. There are a lot of traits and upon seeing them, I immediately wonder how they all fit together. And in particular, why they exist. That is, if the API were less trait heavy, what would we lose? Looking at the names of the traits, I can make some guesses as to what some of them are for. And it looks plausible that they could all mostly be explained (at a high level) without too much exposition. I have opinions on the names of the traits too, but I think that fun can be saved for later. :P I find the number of concrete types to be large but not overwhelming. At a quick glance, I can immediately understand what they mean and what connects them. While I do have some light SIMD background, I'm confident that the large number of concrete types could be explained by a ~paragraph in the crate docs.
Do you have a 50,000-foot view of what the partitioning might look like? I don't mean to put you on the spot and ask for a detailed proposal, but just a general feeling of what you think might make sense.
I agree that missing swizzles is a rather large limitation. For me personally, unless there is a language feature that you're pretty confident is right at the horizon, or if there's a small language feature that you can get the lang team on board with quickly, I would go to war with the army you have when everything else is ready to stabilize. But this is just a general sense of things personally, and I'm sure there is a deeper analysis that could be done here. e.g., "If Rust had language feature Foo, then the swizzle API could be written like this which would make things simpler and/or unlock new use cases." |
that would be: if LLVM had dynamic swizzles (using the |
Nope, that's definitely something we should have. Personally, I think we could use more submodules to make the division a little clearer. Each of these could be documented independently, with a bit of guidance in the main
The original implementation had no traits. This results in two related issues. To implement
Offhand, I would say something like:
One option might be to implement a slightly less powerful set of |
I hope it will be released soon. I've tried using it in one of my crates (linebender/tiny-skia#88) and it mostly works. There are some glitches which I haven't debugged yet, but I don't think they are critical. I personally do no use |
Realistically we'd probably gain things. As it is, there are a number of APIs that have been hard to add due to the need to make them fit into the trait-heavy design. That said, my stance here (that we should not have gone with the traits as it's lead to a cumbersome and inflexible design) is not a popular one, so YMMV. |
As far as restricting the number of lanes goes, I tried combining the bound into something like:
In theory, this seemed like a good compromise between the current verbose bounds and using a non-local error. In practice, it wasn't really possible to implement this. It turns out we use the separation between the element bound ( After this experiment, I think the best option is to add two new compiler attributes like |
could we just use: impl<T: ..., const N: usize> Simd<T, N> {
const VALID: () = {
let valid = N > 0 && N < 64;
if !cfg!(all_lane_counts_iirc) {
valid &= N.is_power_of_two();
}
assert!(valid, "invalid lane count");
};
pub fn each_method(self, ...) -> ... {
Self::VALID; // use so it triggers a post-mono error
...
}
} actually, no, we can't, since just copying a |
Also, it would be easy to accidentally forget |
Any chance we'd be willing to say that we're ok allowing the type to exist for non-PoT lane counts, but then just not offering anything other than (But I guess that exposes layout questions that we don't want to decide just yet, even if we left them explicitly unspecified.)
We could maybe have a |
we already have a feature flag to enable non-pot lane counts, the main issue imo is that someone could try to use |
Also, removing the bounds from
There is precedent with |
the other issue I forgot is that imo non-pot lane counts have the wrong layout and that should change: #319 |
This one I feel strongly isn't a problem, since So long as using
I would love to have even a hacky solution to this. lcnr had some thoughts on doing something like this in https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/How.20essential.20is.20the.20compile-time.20check.20for.20empty.20arrays.3F/near/248387965. Maybe someone could pick that up to deal with the zero problem for a while? If it was easier,
I suppose that's another possibility here -- explicitly decide to have v1 of stabilizing this not allow writing generic code over them, but still let people use specific versions when implementing stuff. That seems like it'd still be pretty useful, since using |
My proposal of using |
I love the current state of this project and am grateful for the hard work being done and the general friendliness/helpfulness of the team. However, I will be a bit of a contrarian and say that swizzles are important to me, particularly dynamic ones. I've also used the APIs with the I'm not saying swizzles should stop stabilization per se, I'm a very happy nightly user. I'll also be contrary and say that it's okay to abstract more functionality for very common use cases. I think some of this is already done, but I wouldn't be sad, for example, to see a function to map bytes using dynamic swizzles. The newbie user (like me) may not realize the hardware limitations (the usual size of look-up tables, for example) of certain operations and so creating a well-designed function for a common use case is helpful to people like me. |
I agree that a satisfying story for swizzles is important. Regarding traits, I initially proposed it as a way to reduce the "multiplication of entities" problem so that it's easier to follow things. I have begun to think a different approach may be appropriate (increased experience in API design changes your opinions about API design, who would've thought?), but I believe we should probably fork further discussion about that into its own issues/threads. |
My 2c is that if Rust's primitive types cannot currently be abstracted over with traits (other than |
Personally, I don't see a way around using traits. This was one of the first issues we hashed out, I'm sure there is more discussion available in zulip. SIMD types are not just numbers and not just arrays, they act like both simultaneously. Like arrays, SIMD vectors have container-like operations that are only concerned with their memory layout and are indifferent to their element type. This includes splatting, indexing, swizzles, scatter, gather, conversions to and from arrays and slices. To undo this means that you will have 14 vector types that each implement these functions separately, and there will be no way in the future to ever treat a vector like an element-agnostic container. A comparison would be like if Using traits allowed us to treat vectors as containers (which they are) while also giving them element-specific behavior. The element-specific functions can't be implemented directly |
Can we add "documentation and review of the intrinsics' safety requirements" to the list? Currently every time I want to implement one of these in Miri I have to dig through PRs and then usually still go ask people. Intrinsics are language extensions so they should be reviewed by t-opsem before stabilization (and ideally, a draft documentation is created when the intrinsic is implemented, so that one has some starting point for "intended behavior" vs "actual behavior" and checking that they are the same). I also think the intrinsics' declarations should be moved into libcore so that they can be equipped with doc comments that live in the same repository as the codegen and Miri implementations that turn those doc comments into reality. |
Yes, I intend on doing just that. |
I opened #381 to track that |
I strongly support checking lane count post-monomorphization. It's quite a hassle to carry around bounds proving that not only |
if you want to use a function |
related conversation about |
Why not support any number of lanes, by using arrays of native SIMD vectors large as needed, and masking for operations that need it? More in detail:
|
The simple answer is that the size limitations are not related to native SIMD register sizes, but codegen backend encoding limitations. We further restrict to even smaller sizes that we are confident operate correctly and optimize well. Our design allows for increasing the maximum in the future. |
The codegen backend encoding limitations can be easily solved by having the std::simd implementation code-generate Simd<T, N> as an array of either T (with higher alignment) or a backend SIMD type and implement operations as loops that operate on parts of it with appropriate size for current SIMD ISA. Given that this transformation is very simple and that not doing it is very burdensome for the API user (due to the SupportedLaneCount bounds and potentially having to use an extra crate that does that transformation for some applications), I don't see any reason to provide an API that doesn't perform it. If the user uses a proper SIMD size, then the transformation will be a no-op and performance will be optimal; otherwise, the code will still work albeit less efficiently in some cases. |
More precisely, doing something vaguely like this:
Or something like this:
Or alternatively the compiler itself could do all this, i.e. you could fix the codegen backend to support arbitrary sizes. |
|
The codegen limitations are something like 2^16 max elements, which is large enough that I doubt it's necessary to support anything larger. Something like this would make it possible to exceed that limit but I'm not sure it's worth the added complexity. We will likely be able to lift the current limit to something much larger if we can get reliable non-power-of-two-length codegen, but that's proving problematic for now. |
e.g. terrible code LLVM generates for a |
I guess the downside of my proposal is that it allows the user to write "a + b + c" which will result in two loops, and unless the backend optimizer manages to fuse the loops, that code will be much worse than a single loop since it unnecessarily writes and reads the intermediate result to memory. So maybe another option to solve the The iterator based design could still implement arithmetic on iterators ( Ideally, the compiler would gain support for generic closures, which would allow to specify a polymorphic closure to the iterator combinator, which would allow to choose at runtime the vector size and instruction set to use. This would suggest a different change to the design though, which is to have a Simd<T, N, ISA> which would allow for example to have different types that use either AVX-2 or AVX10/256, since otherwise the polymorphic closure scheme can't differentiate between them since vector length is the same, and would also allow to use either instruction in ISA extensions or a replacement sequence. |
As always, this depends on your choice of CPU. If Rust used target-cpu=native by default... https://llvm.godbolt.org/z/xjsndxc1b I agree that we are constantly fighting with dreadful LLVM codegen, regardless of the target. |
all that does is increase the upper limit for somewhat reasonable code, if using a 4000 element vector you go back to terrible code: https://llvm.godbolt.org/z/Wf9o5jT67 |
I'm using a personal arbitrary-length wrapper layer. A couple notes
|
Considering that shuffles/swizzles with compile-time-constant lane indices have worked in since at least 2015 without substantial change to the API shape, I think it's well past time to stabilize the API shape that exists. So I think it makes sense to stabilize the |
Portable SIMD has taken a substantially different approach to swizzles than e.g. the x86 shuffle intrinsics. We really wanted to avoid a magic It should be safe to stabilize |
Why is this even using a trait? If that's just to work around making the final argument const, then I think this can be expressed much better with a However, that would still expose the underlying |
We can't expose the intrinsic directly because we do some manipulation of the const indices before passing them into the intrinsic--passing it via an associated const allows it to be used in regular rust since it isn't bypassing the usual type system. |
You should be able to do the same manipulation inside the |
I think I'm following... but do we want more const args? I think I even saw an issue somewhere encouraging const args to be removed in an edition change or something, I'm not sure where they stand right now. |
Oh I see, the problem is that the macro shouldn't call |
I think it's possible to write a macro something like the following (I have a prototype): fn some_swizzle<T, const LEN: usize, const PARAM1: usize, const PARAM2: usize>(x: Simd<T, LEN>) -> Simd<T, LEN> {
simd_swizzle!(x,
const { /* some expression using LEN, PARAM1, PARAM2 */ },
len = LEN,
const_parameters = (PARAM1, PARAM2),
)
} But is something like this any better than implementing the Swizzle trait? I'm having trouble imagining an idiomatic interface for this macro, but it takes away the need to understand some const generics type inference magic you need to understand to implement it. Notably, |
I wanted to put together a more technical list of issues to be solved before we can stabilize (see rust-lang/rust#86656).
There are many more important issues like performance on some targets, or important missing functions, but IMO those don't prevent stabilization. In this list I've trimmed down the issues to major API-related issues that can't be changed, or are difficult to change, after stabilization.
Issues to be solved
LaneCount<N>: SupportedLaneCount
bound makes the API exceptionally cumbersome. I've also had some trouble writing generics around it, particularly when making functions that change the number of lanes (the trait solver sometimes simply fails). Adding the bound often feels like boilerplate, and I found myself looking for ways to launder the bound, like adding unnecessary const parameters. Making this a post-monomorphization error (I found Design meeting: Non-local errors (aka post-monomorphization errors) lang-team#195 helpful) might be the way to go, or perhaps there's a way to make a const fn panic. Cons: the trait bound is very explicit and hiding the error states could possibly do more harm than good when identifying the source of a build failure.Simd<f32, N>
should beSimd<i32, N>
(see Change Mask element to match Simd element #322 for discussion). I think it would be much more straightforward if the types simply matched. Cons: this would require an extra cast when usingi32
masks forf32
vectors or similar, and makes implementingFrom<Mask> for Mask
impossible (since pointer masks must be genericMask<*const T, N>
, maybe all pointers could use the same mask element?).Non-issues, but things that should be done
SimdPartialEq
,SimdInt
.Updates (after filing this issue)
Lane count
LaneCount
anSimdElement
into a single boundSimd<T, N>: Supported
. This doesn't work well for a variety of reasons. One example: scatter/gather useSimd<T, N>
,Simd<*const T, N>
, andSimd<usize, N>
. Each of these would need their own bound, rather than using oneLaneCount
bound since they all shareN
.LaneCount
andSimdElement
) or turningLaneCount
into a non-local error.Swizzles
I tried the following swizzle code. It requires the incomplete
adt_const_params
feature. Even with this feature enabled, it's impossible to implement functions likereverse
, because const generics can't access the generic const parameters.The text was updated successfully, but these errors were encountered: