Add `EncasedBufferVec`, a higher-performance alternative to `StorageBuffer`, and make `GpuArrayBuffer` use it. #12670
Conversation
`EncasedBufferVec` is like `BufferVec`, but it doesn't require that the type be `Pod`. Alternately, it's like `StorageBuffer<Vec<T>>`, except it doesn't allow CPU access to the data after it's been pushed. `GpuArrayBuffer` already doesn't allow CPU access to the data, so switching it to use `EncasedBufferVec` doesn't regress any functionality and offers higher performance.

Shutting off CPU access eliminates the need to copy to a scratch buffer, which results in significantly higher performance. *Note that this needs teoxoy/encase#65 from @james7132 to achieve end-to-end performance benefits*, because `encase` is rather slow at encoding data without that patch, swamping the benefits of avoiding the copy. With that patch applied, and `#[inline]` added to `encase`'s `derive` implementation of `write_into` on structs, this results in a *16% overall speedup on `many_cubes --no-frustum-culling`*.

I've verified that the generated code is now close to optimal. The only reasonable potential improvement that I see is to eliminate the zeroing in `push`. This requires unsafe code, however, so I'd prefer to leave that to a followup.
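To illustrate the core idea, here is a minimal, standalone sketch (not Bevy's actual implementation; all names are hypothetical): a push-only byte buffer that serializes each element into a growing `Vec<u8>` and hands back only an index, so the CPU never reads the data again and no scratch-buffer copy is needed.

```rust
// Simplified sketch of a push-only buffer in the spirit of
// `EncasedBufferVec` (hypothetical type, not Bevy code).
struct PushOnlyBytes {
    data: Vec<u8>,
    element_size: usize, // padded size of one element as laid out for the GPU
    len: usize,
}

impl PushOnlyBytes {
    fn new(element_size: usize) -> Self {
        Self { data: Vec::new(), element_size, len: 0 }
    }

    /// Appends the encoded bytes of one element and returns its index.
    /// No reference to the stored data is handed back: CPU access ends here.
    fn push(&mut self, bytes: &[u8]) -> usize {
        assert!(bytes.len() <= self.element_size);
        let offset = self.len * self.element_size;
        // Zero-extend, then overwrite. This is the zeroing the PR notes
        // could be eliminated with unsafe code in a followup.
        self.data.resize(offset + self.element_size, 0);
        self.data[offset..offset + bytes.len()].copy_from_slice(bytes);
        self.len += 1;
        self.len - 1
    }
}
```

In the real type, the serialization step is `encase`'s `write_into`, and `self.data` is what eventually gets uploaded to the GPU buffer in one shot.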
I'm not wedded to the name.
Nice performance improvements in conjunction with James' encase changes. I don't know what the name should be, but I don't think it's quite right. I also want us to check thoroughly that this will never produce misaligned data. I think it will be fine, but I'm not certain. For a WGSL binding containing only an array, the alignment of each element is supposed to be the alignment of `T`, and if `T` is a struct then its alignment is the max of the alignments of the members of the struct type. And the size of a struct type is the offset of the last member plus the size of the last member, rounded up to the alignment. So I think it will be fine… There is one case where it might break: WebGL2, where 16-byte-aligned sizes are always needed, or something like that.
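The round-up rule being discussed can be sketched as a tiny helper (hypothetical, not Bevy or encase code): an array element's stride is its size rounded up to its alignment, so back-to-back elements stay aligned. The WebGL2 concern corresponds to forcing the alignment up to 16.

```rust
// Sketch of the WGSL sizing rule mentioned above: round `size` up to the
// next multiple of `align` (which must be a power of two).
fn round_up(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}
```

For example, a struct whose largest member has alignment 16 and whose last member ends at byte offset 20 would get size `round_up(20, 16) == 32`, so the next array element starts 16-byte aligned.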
How about renaming …
I did that renaming since nobody objected.
It looks like your PR is a breaking change, but you didn't provide a migration guide. Could you add some context on what users should update when this change gets released in a new version of Bevy?
Looks great, but I'm marking this as blocked until the …
Encase released a new version a few days ago. I think this just needs #12757 now?
Assuming CI passes, LGTM. There are just a few issues left with the renaming, I think.
I haven't done any performance testing but I trust you on that part. I've tested that everything still works as expected though.
```diff
@@ -28,7 +34,7 @@ use wgpu::BufferUsages;
 /// * [`GpuArrayBuffer`](crate::render_resource::GpuArrayBuffer)
 /// * [`BufferVec`]
 /// * [`Texture`](crate::render_resource::Texture)
-pub struct BufferVec<T: Pod> {
+pub struct RawBufferVec<T: Pod> {
```
I can't add a suggestion, but the doc comment has a few mentions of `BufferVec` that should be renamed.
```rust
// Take a slice of the new data for `write_into` to use. This is
// important: it hoists the bounds check up here so that the compiler
// can eliminate all the bounds checks that `write_into` will emit.
let mut dest = &mut self.data[offset..(offset + element_size)];
```
Oh, nice, I never thought about doing that but it's a neat trick to know.
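The trick can be shown in isolation with a toy function (assumed names, not the PR's code): taking one subslice up front performs a single bounds check, after which the compiler knows the slice length and can drop the per-element checks inside the loop.

```rust
// Toy illustration of hoisting a bounds check via a single subslice.
fn write_four(data: &mut [u8], offset: usize, values: [u8; 4]) {
    // One bounds check here; panics if the range doesn't fit.
    let dest = &mut data[offset..offset + 4];
    for (d, v) in dest.iter_mut().zip(values) {
        *d = v; // provably in-bounds: no further checks needed
    }
}
```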
```rust
/// the `BufferVec` was created, the buffer on the [`RenderDevice`]
/// is marked as [`BufferUsages::COPY_DST`](BufferUsages).
pub fn reserve(&mut self, capacity: usize, device: &RenderDevice) {
    if capacity <= self.capacity && !self.label_changed {
```
This is just for my own curiosity, nothing needs to change. Why shouldn't we try to allocate a smaller buffer if the capacity is smaller?
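One plausible reading (my interpretation, not an authoritative answer from the PR): a grow-only capacity check avoids reallocation churn when the element count fluctuates from frame to frame. A minimal sketch with hypothetical field names:

```rust
// Grow-only reservation sketch (hypothetical, simplified from the snippet
// above): reallocate only when more capacity is needed, so a buffer that
// shrinks and regrows across frames reuses its existing allocation.
struct GrowOnly {
    capacity: usize,
    reallocations: usize, // counts how often we would recreate the GPU buffer
}

impl GrowOnly {
    fn reserve(&mut self, capacity: usize) {
        if capacity <= self.capacity {
            return; // keep the existing (larger) allocation
        }
        self.capacity = capacity; // a real impl recreates the GPU buffer here
        self.reallocations += 1;
    }
}
```

With this policy, a workload that alternates between 50 and 100 elements reallocates once, not every frame.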
Co-authored-by: IceSentry <[email protected]>
Broadly comfortable with the changes here, and the performance gains are compelling evidence that it actually does what it's trying to do.
The math looks correct, and I agree with your TODO to consider using `unsafe`. Code quality and docs are, as always, high: much appreciated!
Once the mistaken doc references to the wrong `BufferVec` in `RawBufferVec` are cleaned up, you'll have my approval.
LGTM, I tested it with some examples on wasm just to check.
However, there are still a bunch of places where docs on functions that actually take `RawBufferVec` refer to `BufferVec`. It might be a good idea to grep for `BufferVec` and look at all of them.
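A quick way to run that audit (illustrative only; in the repo you would point `grep` at the `crates/` tree rather than a scratch file):

```shell
# Demo of the suggested grep audit on a scratch file. Note that
# "RawBufferVec" also contains the substring "BufferVec", so each
# match still needs a manual look to see whether it's stale.
printf '/// A BufferVec of items.\npub struct RawBufferVec;\n' > /tmp/demo.rs
grep -n "BufferVec" /tmp/demo.rs
```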
Closed as I am completely burned out and have no motivation to continue working on this.
Add `EncasedBufferVec`, a higher-performance alternative to `StorageBuffer`, and make `GpuArrayBuffer` use it. (#13199)

This is an adoption of #12670 plus some documentation fixes. See that PR for more details.

---

## Changelog

* Renamed `BufferVec` to `RawBufferVec` and added a new `BufferVec` type.

## Migration Guide

`BufferVec` has been renamed to `RawBufferVec` and a new similar type has taken the `BufferVec` name.

---------

Co-authored-by: Patrick Walton <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Here's `write_batched_instance_buffer` before-and-after (yellow = after, red = before) for `many_cubes --no-frustum-culling`: [profiler screenshot]