Implement batched query support #6161
Conversation
First round of feedback! Mostly docs-related.
In addition to my nits, I think this deserves a full example (ideally with easy mode-toggling for self-benchmarking) in the root-level examples/bevy_ecs. This is a cool feature, but it's going to be tricky to discover, understand and use for intermediate users.
crates/bevy_ecs/src/query/state.rs
@@ -778,6 +811,157 @@ impl<Q: WorldQuery, F: ReadOnlyWorldQuery> QueryState<Q, F> {
        );
    }
}
/// This is a batched version of `for_each_mut` that accepts a batch size, `N`, and a desired alignment for the batches, `ALIGN`.
/// Runs `f` and `f_batch` on each query result for the given [`World`]. This can be used to improve the performance of your queries using SIMD.
These docs need to explain:
- what is alignment
- what is SIMD
- when this might improve (or worsen) performance
Would it be okay if I just wrote a short blurb about these things, referring to e.g. Wikipedia or maybe some Intel blog posts for a more in-depth treatment?
Yep, that's totally fine. I just want this to be a bit less impenetrable to users who are new to low-level programming :)
Hey @alice-i-cecile, I added a short introduction to Query::for_each_mut_batched covering these concepts -- would you mind giving it a read when you get a chance? I don't want to go overboard with info, but discoverability is a good thing. Thank you!
Wow, you're fast!! I will see about getting these finished in the coming days, thanks for your patience :)
Happy Thanksgiving! Hah, I'm glad I wrote tests -- I found a few soundness issues thanks to the Miri CI we have here. You bet I'll be running that locally now too :) Fortunately the issues weren't fundamental. I'm currently working on the remaining documentation-related problems now that the soundness issues have been resolved. There are a few ways of using the API, and the current example only shows the more "advanced" way with repr(transparent). It's now possible to more easily use batches with a non-"transparent" API using the …
I haven't reviewed all of this. I don't think we should merge this with the current changes to fetch.rs, but I also think there is not much we can do until feature(generic_associated_types) is stabilized (other than removing the genericness over ALIGN and having 3 separate fns, or just having one). The next Rust release should have stable GATs, which is in ~20 days, so I don't think this is too much of an ask to wait (and it's extraordinarily unlikely imo that this PR actually gets merged in such a short timeframe anyway).
//Note: NEEDED for batching. This is the layout of the array itself, not the layout of its elements.
let array_layout = array_layout.align_to(batch::MAX_SIMD_ALIGNMENT).unwrap();
This change seems pretty sus to me: this is no longer giving the layout of an array, and it is also no longer a copy of the layout.rs fn from std. The docs and fn name ought to be changed to explain what this fn does, making it super clear that it can return a layout with stride != size, which is pretty unusual in Rust -- or alternatively (and my preference), call .pad_to_align() so we aren't doing anything "weird".
edit: apparently there is no array_layout in std..? I am wildly confused as to why this doc comment says it's taken from std when I cannot see this code there. Regardless, the point still remains that this is no longer creating the layout of an array and needs to be updated.
Sorry, I can see where I wasn't clear with this. stride == size definitely still holds -- this statement just aligns the start of the array, if that makes sense (the chunk of memory for the array itself is aligned to MAX_SIMD_ALIGNMENT). The element layout remains exactly the same as it was before. This was needed to guarantee that a batch starting at index 0 would be aligned.

For interested readers: typically, SIMD code operates in three "phases": a prologue, a "vector region", and an epilogue. The prologue and epilogue are just "scalar code paths", i.e., your typical for loops. The prologue's purpose is to process elements up until it is possible to execute vector instructions, which generally require a certain alignment to work. The vector region can only execute in batches of size N, which means that any remaining items that can't fill a batch of size N have to be processed serially by the epilogue.

As an example, consider an array of [f32; 128] aligned to a boundary of 4 (say, at address 100). Suppose you had a special vector instruction set that could operate on batches of 8 f32s at a time, the catch being that the loads have to be aligned to 32 as well (sizeof f32 being 4). Then, because of the alignment of the array, you would only be able to batch from index 7 onwards (since index 7 would be at address 128), and you would of course need a prologue to handle that "slow path". However, if the array started at address 96, you can see that no prologue would be necessary.

This line here eliminates the prologue case from Bevy by guaranteeing that batch 0 is aligned to the correct SIMD alignment in all cases, because MAX_SIMD_ALIGNMENT is the greatest alignment required of any instruction on the given system (including vector instructions). Does this make sense? Thank you!
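To make the three phases concrete, here is a minimal scalar sketch (not this PR's implementation; the batch width of 8 and the doubling operation are purely illustrative):

```rust
fn process(data: &mut [f32]) {
    const N: usize = 8; // hypothetical vector width, in f32 lanes

    // "Vector region": whole batches of N elements. A real implementation
    // would use SIMD intrinsics here; aligning batch 0 to MAX_SIMD_ALIGNMENT
    // is what makes a separate prologue unnecessary.
    let mut chunks = data.chunks_exact_mut(N);
    for batch in &mut chunks {
        for x in batch {
            *x *= 2.0;
        }
    }

    // Epilogue: the fewer-than-N leftover elements, processed serially.
    for x in chunks.into_remainder() {
        *x *= 2.0;
    }
}
```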
stride == size does not hold for the layout returned by this fn: align(MAX_SIMD_ALIGNMENT), size(2), for example, isn't stride == size but could potentially be returned from this fn. If you look at the docs for Layout::align_to, you can see that calling align_to(64) on a size-32 layout doesn't give you size 64.
Yes, that's exactly what I was after -- this array_layout is what is ultimately used to allocate memory for the entire array, after the stride has already been taken into account. It is derived from the item_layout using the repeat_layout function. The flow (as I understand it) works as follows:

1 - The item_layout of T is passed in via the BlobVec constructor and held. (Stride matters here, as you rightly point out!)

2 - The array_layout is generated from repeat_layout(layout, n). (Stride is taken into account by repeat_layout -- this gives us a Layout that can hold N item_layouts [size == stride], aligned to the alignment of item_layout.)

3 - The "SIMD aligned" array_layout is obtained by taking that array_layout and calling align_to on it with MAX_SIMD_ALIGNMENT. (Stride has already been taken into account -- we are simply telling the allocator "take this Layout and ensure that the alignment requirement is MAX_SIMD_ALIGNMENT". I.e., this is still a Layout that can hold N item_layouts [size == stride], except the chunk of memory will now be aligned to MAX_SIMD_ALIGNMENT.)

4 - It is ultimately (3) that is passed into the allocator.

Note that the item_layout is left pristine through all of this, to ensure we don't add any extra padding between elements. It is strictly just the array_layout that gets align_to(MAX_SIMD_ALIGNMENT). The actual elements of the array are still laid out like repeat_layout would do it.

This is the best of my understanding at the moment. If the code called align_to on the item_layout itself, then stride != size for sure. But since it's being called on the array_layout after repeat_layout has already done its job, the stride should already have been taken into account. Is my understanding right here?

(Apologies for the ASCII art, I'm a visual thinker :D)

1 - item_layout provided to constructor
2 - align(item_layout):        | item_layout item_layout item_layout item_layout ... (N times) |  (repeat_layout)
3 - align(MAX_SIMD_ALIGNMENT): | item_layout item_layout item_layout item_layout ... (N times) |  (array_layout)
A rust playground link illustrating it a little further: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=099b1168135ed7ed189dd3bad76477c1
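In the same spirit as the playground link, a minimal local sketch using the stable Layout::array in place of Bevy's internal repeat_layout (the 64 stands in for MAX_SIMD_ALIGNMENT; values assume f32's usual size and alignment of 4):

```rust
use std::alloc::Layout;

fn main() {
    // Layout of [f32; 8]: size 32, align 4; element stride == size.
    let array_layout = Layout::array::<f32>(8).unwrap();
    assert_eq!((array_layout.size(), array_layout.align()), (32, 4));

    // Raising the alignment of the *whole allocation* leaves the size,
    // and therefore the element stride, untouched.
    let simd_aligned = array_layout.align_to(64).unwrap();
    assert_eq!((simd_aligned.size(), simd_aligned.align()), (32, 64));
}
```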
crates/bevy_ptr/src/batch.rs
//A "batch" of ZSTs is itself a ZST. | ||
//This sounds dangerous, but won't cause harm as nothing | ||
//will actually access anything "in the slice". | ||
//ZSTs are trivially well aligned, so there is nothing to be checked. |
ZSTs are not trivially well aligned; they are the same as every other type and should also have a debug check here. For (), align is 1, so every non-null pointer is aligned, which is pretty trivial; there are, however, types that have size 0 and align >= 2.
align_offset also says that it might be implemented to unconditionally return usize::MAX, and that it is only to be used for perf, not correctness, so we also shouldn't use it here.
Definitely an oversight on my part, and I agree this needs to be done better. Thank you!
Yeah, the comment about ZSTs being trivially well aligned is wrong. I've removed that. Fortunately it didn't have much to do with the rest of the code :)
Do you have any suggestions for how to check that the pointer is aligned idiomatically? This StackOverflow answer seems to suggest that align_offset is the way to go until pointer_is_aligned is stable. I've rewritten this as a pointer-to-integer cast modulo ALIGN (with a TODO to change this when pointer_is_aligned is stable), but I'm not 100% convinced this is the best way to go either.
The latest commit I've pushed up has these changes.
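For reference, a minimal sketch of the check described above (the helper name is hypothetical, not the PR's actual code):

```rust
/// Debug-checks that `ptr` is aligned to `align` via a pointer-to-integer
/// cast. TODO per the discussion above: switch to the standard library's
/// pointer alignment check once `pointer_is_aligned` is stable.
fn debug_assert_aligned<T>(ptr: *const T, align: usize) {
    debug_assert!(align.is_power_of_two());
    debug_assert!(ptr as usize % align == 0);
}
```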
crates/bevy_ptr/src/lib.rs
if core::mem::size_of::<T>() > 0 {
    //NOTE: ZSTs may cause this "slice" to point into nothingness.
    //This sounds dangerous, but won't cause harm as nothing
    //will actually access anything "in the slice"
    debug_assert!(ptr.align_offset(batch::MAX_SIMD_ALIGNMENT) == 0);
}
Same here: we shouldn't use align_offset or skip checking align for ZSTs.
(See comment above.)
Thanks very much for your comments @BoxyUwU -- this is a learning experience in Rust for me, and I'll work at addressing yours (and the rest of @alice-i-cecile's) comments over the next few weeks. I think delaying until GATs are stable makes sense, and I can actually start readying the code for that in the meantime. Thank you for your patience :)
Next up on my list is to get one of the Bevy examples going with batched queries (in a very limited way) to demonstrate batched queries a bit better. Thanks for your comments and your patience, everyone.
Not a problem! It's been great to see this slowly progress.
@InBetweenNames can you please mark in the PR description that this …
Not a problem! I have some more changes coming down the pipeline when I get the energy to make the API a little friendlier. With the stabilization of GATs being imminent, I'd like to clean up the code with those (not a big change), and also restructure:

```rust
for_each_mut_batched::<4>(|((mut a, Align16), (b, Align32))| {
    println!("a: {:?} b: {:?}", a, b);
});
```

That is, the …

For the above example, such a variant would be equivalent to:

```rust
for_each_mut_batched::<4>(|((mut a, Unaligned), (b, Unaligned))| {
    println!("a: {:?} b: {:?}", a, b);
});
```

Simple, safe, effective :) Can't promise any timelines for that (hopefully this month), but I am working on it in the evenings here. Bevy is in a very unique position to provide HPC primitives like this :) Down the pipeline, we can look at avoiding false sharing in a type-safe and compile-time-checked manner when doing batching across multiple threads, too. That's my dream toolkit for simulations right there :)
Alright, managed to get this ported over to the new GAT way of doing things and simplified a lot in the process. However, the per-component alignment feature isn't implemented yet. I'm going to try seeing if I can use …
I took a little detour to try out what this would look like with generic_const_exprs:

```rust
q.for_each_mut_batched::<4>(
    |(mut position, velocity)| {
        //Scalar path
    },
    |(mut position, velocity)| {
        //Batched path
    });
```

Code provided here: https://github.com/inbetweennames/bevy/tree/generic_const_exprs

All alignments are automatically computed, absolutely zero work needs to be done by the user to do anything with alignments, and the autovectorizer loves it. The type is very simple to understand too: … (One can dream!)

Obviously I can't propose that for merging, but it does serve as an important design validation point: you really can make some trivial changes to the code once it is possible to evaluate const expressions in generics and just have this for free. I suppose this could be hacked in with …
Bad news :( ... even with the …
@alice-i-cecile @james7132 @BoxyUwU @TheRawMeatball I think this is now fully ready for review! For example usage, see this benchmark 😄 -- it has been included in the docs too.
Awesome :D I'll aim to do a full review by the end of next Monday: there's a ton of ECS work in the air right now.
Sounds good, no rush! :) EDIT: Hmm, weird, I guess I don't know the review request interface that well... didn't mean to remove the requests for reviews.
PR bevyengine#6161, Issue bevyengine#1990

* Only Dense queries are accelerated currently
* Code refactored to use GATs
* Batch type does not encode alignment as per discussion
* Simplified calling for_each_{mut_}batched (no longer need _ arguments)
* Documentation about SIMD and batching
* Added an example and an AVX-specific benchmark
No changes in the latest commit; just rebased. I see some new verbiage in main about "BatchedStrategies" for parallel queries. That's cool -- I'm just wondering if maybe a different name should be given to the distinct concept introduced in this PR, to avoid confusion. While similar, the intent is different: the batching introduced in this PR is about processing batches of N components at a time using SIMD, whereas parallel batch sizes appear to be more about managing overhead and synchronization costs when executing a query across multiple threads. I'm open to suggestions here :)
When #6773 is merged, I'll port this over to using the new style of iteration.
FYI: I did a little reading into the naming convention behind this concept. EnTT refers to it as "chunked iteration": skypjack/entt#462. How do we feel about using the name "chunk" instead of "batch" for this PR, to set it apart from other uses of "batch" in Bevy? Is there a better name we could use instead?
> [7:27 AM] james7132: On closer inspection, I don't think the batched queries is going to be ready for 0.11 (or even 0.13 at this rate), it's reliance on unstable rust features is probably a non-starter for it.

From @james7132 on Discord. If you can find a way to make it work without this, or convince us that the assessment is wrong / it's worth making an exception, let us know. Similarly, if you need us to make changes to the internals to allow 3rd-party crates that need unstable Rust features to implement this externally, we're also generally very open to that!
Hey Alice, this was an old comment that was addressed already. This code does not rely on any unstable rust features. Thank you!
…On Tue, Feb 28, 2023, 8:22 a.m. Alice Cecile ***@***.***> wrote:
[7:27 AM] james7132: On closer inspection, I don't think the batched
queries is going to be ready for 0.11 (or even 0.13 at this rate), it's
reliance on unstable rust features is probably a non-starter for it.
From @james7132 <https://github.com/james7132> on Discord
<https://discord.com/channels/691052431525675048/749335865876021248/1080103960943267911>.
If you can find a way to make it work without this, or convince us that the
assessment is wrong / it's worth making an exception, let us know.
Similarly, if you need us to make changes to the internals to allow 3rd
party crates that need unstable Rust features to implement this externally,
we're also generally very open to that!
Sorry about that, must have been reading an old version of the PR. Really shouldn't be doing this stuff at 4 AM, haha. A quick pass seems to suggest we can aim for a 0.11 timeline for this. I really want something like this to help round out the last bits of CPU performance we may be lacking in the ECS.
//# SAFETY: Vec3 is repr(C) for x86_64
let mx0 = _mm_loadu_ps((aos_inner as *const Vec3 as *const f32).offset(0));
let mx1 = _mm_loadu_ps((aos_inner as *const Vec3 as *const f32).offset(4));
let mx2 = _mm_loadu_ps((aos_inner as *const Vec3 as *const f32).offset(8));
let mx3 = _mm_loadu_ps((aos_inner as *const Vec3 as *const f32).offset(12));
let mx4 = _mm_loadu_ps((aos_inner as *const Vec3 as *const f32).offset(16));
let mx5 = _mm_loadu_ps((aos_inner as *const Vec3 as *const f32).offset(20));
If this is glam::Vec3, this is unsound, and repr(C) is not enough. Vec3, as opposed to Vec3A, has padding bytes. That is, its layout is:

| f32 | f32 | f32 | ? |
|---|---|---|---|
| init | init | init | poison |

so this is filling the 4th lane with data that should not be read. Vec3A is defined as __m128 already. You must use a full 16-byte type with zero padding, consistently.
Thanks for the review! Indeed, you're right: the C (and C++) standards don't provide any guarantee against arbitrary padding being added between elements or at the end of a struct, even when it wouldn't be necessary to guarantee stride alignment in sequential array accesses. It's perfectly legitimate for a conforming C compiler to lay glam::Vec3 out in exactly the way you have suggested, including using a trap representation for the padding at the end of the struct. Even if glam::Vec3 were defined in terms of a single array [f32; 3], you still can't escape padding at the end as you have indicated, unless you made it repr(transparent) anyway.

In C++, the way we'd normally deal with this is with a static assert for sizeof(Vec3) == 3*sizeof(float). It's a hack, but it works to avoid this problem. Indeed, on all major x86 ABIs, this holds. Since static asserts don't really exist in Rust, I'm not sure how to accomplish something equivalent in this example.

I'd like to mention, though, that real x86 C (and C++) compilers would never, ever add padding for the sake of adding it. Padding is something done on x86 to ensure alignment, and the alignment of Vec3 is not required to be anything higher than the alignment of f32. It could be, and that would be legal in a theoretic sense for C, but no implementation out there -- on x86 -- would ever do it. For one thing, that would contradict the psABI and break compatibility with other code on x86 systems. However, yes, one could construct a toy C compiler that would do just that, say to anticipate vectorization later on -- although I expect such a transform would be a pessimization a lot more often than not.

So, laying out Vec3 with 12 bytes of space and aligning it to the alignment of f32 meets the alignment requirements just fine -- and this is what real implementations -- on x86 -- do. Not that this excuses anything, mind you, because I agree: the code should be correct in a theoretic sense, or guarded by a compile-time guard to ward off the UB parts (as one would do in C++).

With all this said, as someone who isn't super in the know about the current Rust landscape for dealing with these things, what's the standard practice here? In C++ I'd stick a static assert on it and be done with it. How do we make this work in Rust?
Hmm, it was very late at night when I scanned through things here, and I both mislaid some mental math while reading this and flubbed what I meant to say. My apologies. Let me correct myself: the safety documentation here needs to explicate the actual safety precondition, which is the layout match of 8 * 3 / (6 * 4) == 1 and the fact that these are unaligned loads, instead of a handwave about repr(C): if (n * x) / (m * y) == 1 doesn't hold, then you have the possibility of reading the poisoned value.

You needn't particularly worry about toy C compilers, and I wasn't trying to language-lawyer you on those. Rust picks specific C compilers to be compatible with (though it gets more vague if e.g. two compilers emit "the same" ABI but actually disagree on fine details). Thus when targeting x86_64-pc-windows-msvc, it obviously emits code compatible with the Windows x64 ABI as emitted by MSVC, etc. etc. down the line. There are problems that arise when e.g. gcc and clang disagree on the ABI.

You can generate a "static assert" in Rust using the pattern:

```rust
const _: () = {
    let precondition = {
        // some code that can run in const eval
    };
    if precondition == false {
        panic!("a nicely-written message")
    };
};
```
No problem, and thanks for the feedback. I wrote a good amount of this PR myself during some late nights 😅 It sounds like we're converging on a solution. Would it be acceptable to document this expectation about the layout of Vec3 on x86 in the example, and use that pattern to statically assert it for completeness (see the sketch below)? With maybe an additional comment about how a repr(transparent) type would be preferable for code that needs to be portable?
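A minimal sketch of what that guard could look like under this thread's assumptions (glam::Vec3 on x86-64; 8 Vec3s consumed by 6 four-float unaligned loads) -- not the PR's actual code:

```rust
// Compile-time guard: 8 Vec3s must span exactly the 24 f32s that the
// 6 _mm_loadu_ps calls consume, i.e. 8 * sizeof(Vec3) == 6 * 4 * sizeof(f32).
const _: () = {
    if 8 * core::mem::size_of::<glam::Vec3>() != 6 * 4 * core::mem::size_of::<f32>() {
        panic!("layout precondition violated: glam::Vec3 must be three packed f32s");
    }
};
```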
Discussing repr(transparent) isn't needed (it's a bit of a red herring), but otherwise yes, that sounds fine. My real concern is just about someone retouching this code later, without the deep and horrifying knowledge we have of the details of the x86-64 ABI, what __m128 even is, and C layout, in a way that changes the numbers and causes the last _mm_loadu_ps to read 1 more float than actually exists.
Sorry, just for my own understanding, can you clarify your statement about repr(transparent)? My understanding is that if Vec3 were defined as a repr(transparent) struct with a single [f32; 3] member, all padding issues go away. Indeed, this is similar advice to what you'd see in the C++ space. Is there something I'm not seeing there? (A sketch of what I mean follows below.)

Unfortunately I can't provide any input on concerns about maintainability -- I think this kind of stuff just comes with the gamedev (and simulation) territory. At least this is user-provided code, not library code in Bevy, so it's not like we have ECS internals doing this kind of thing.
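For concreteness, the alternative described above might look like this (a hypothetical type, not glam's actual definition):

```rust
/// A hypothetical Vec3 alternative with no tail padding anywhere:
/// repr(transparent) guarantees the exact layout of the single field,
/// and [f32; 3] is 12 bytes with align 4 and no padding by definition.
#[repr(transparent)]
pub struct PackedVec3(pub [f32; 3]);
```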
Backlog cleanup: this one has a long history of attempts to get it across the finish line! Closing for now due to inactivity, leaving in place the original tracking issue (#1990); perhaps it could be adopted from there if still needed.
Objective
Fixes #1990
Solution
Changelog
* Query results can now be processed in batches of size N using the for_each_batched/for_each_mut_batched interfaces
* A MAX_SIMD_ALIGNMENT constant that table columns are aligned to (at minimum) (cache related)
* BlobVecs are now aligned to MAX_SIMD_ALIGNMENT to support batching use cases and better performance (aligned loads)
* A SimdAlignedVec type was added to provide MAX_SIMD_ALIGNMENT on the inner buffer, providing Vec-like semantics (aligned loads)
* An AVX-specific example and benchmark (add -Ctarget-feature=+avx to your RUSTFLAGS on x86)

A detailed discussion can be found in #1990, including benchmarks justifying why batched queries are worth implementing.
Older benchmarks can be found here: https://github.com/inbetweennames/bevy-simd-bench
Migration Guide
No breaking changes.