Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EncasedBufferVec, an higher-performance alternative to StorageBuffer, and make GpuArrayBuffer use it. #12670

Closed
wants to merge 6 commits into from

Conversation

pcwalton
Copy link
Contributor

EncasedBufferVec is like BufferVec, but it doesn't require that the type be Pod. Alternately, it's like StorageBuffer<Vec<T>>, except it doesn't allow CPU access to the data after it's been pushed. GpuArrayBuffer already doesn't allow CPU access to the data, so switching it to use EncasedBufferVec doesn't regress any functionality and offers higher performance.

Shutting off CPU access eliminates the need to copy to a scratch buffer, which results in significantly higher performance. Note that this needs teoxoy/encase#65 from @james7132 to achieve end-to-end performance benefits, because encase is rather slow at encoding data without that patch, swamping the benefits of avoiding the copy. With that patch applied, and #[inline] added to encase's derive implementation of write_into on structs, this results in a 16% overall speedup on many_cubes --no-frustum-culling.

I've verified that the generated code is now close to optimal. The only reasonable potential improvement that I see is to eliminate the zeroing in push. This requires unsafe code, however, so I'd prefer to leave that to a followup.

Here's write_batched_instance_buffer before-and-after (yellow = after, red = before) for many_cubes --no-frustum-culling:

Screenshot 2024-03-23 130623

`StorageBuffer`, and make `GpuArrayBuffer` use it.

`EncasedBufferVec` is like `BufferVec`, but it doesn't require that the
type be `Pod`. Alternately, it's like `StorageBuffer<Vec<T>>`, except it
doesn't allow CPU access to the data after it's been pushed.
`GpuArrayBuffer` already doesn't allow CPU access to the data, so
switching it to use `EncasedBufferVec` doesn't regress any functionality
and offers higher performance.

Shutting off CPU access eliminates the need to copy to a scratch buffer,
which results in significantly higher performance. *Note that this needs
teoxoy/encase#65 from @james7132 to achieve
end-to-end performance benefits*, because `encase` is rather slow at
encoding data without that patch, swamping the benefits of avoiding the
copy. With that patch applied, and `#[inline]` added to `encase`'s
`derive` implementation of `write_into` on structs, this results in a
*16% overall speedup on `many_cubes --no-frustum-culling`*.

I've verified that the generated code is now close to optimal. The only
reasonable potential improvement that I see is to eliminate the zeroing
in `push`. This requires unsafe code, however, so I'd prefer to leave
that to a followup.
@pcwalton pcwalton requested a review from james7132 March 23, 2024 20:37
@pcwalton
Copy link
Contributor Author

I'm not wedded to the name EncasedBufferVec. Alternative suggestions are welcome.

@superdump
Copy link
Contributor

Nice performance improvements in conjunction with James’ encase changes. I don’t know what the name should be but I don’t think it’s quite right.

I also want us to check thoroughly that this will never produce misaligned data. I think it will be fine but I’m not certain. For a wgsl binding containing only an array the alignment of each element is supposed to be the alignment of T and if T is a struct then it is the max of the alignments of the members of the struct type. And the size of a struct type is the offset of the last member plus the size of the last member rounded up to the alignment. So I think it will be fine… There is one case where it might break - WebGL2 where 16-byte aligned sizes are always needed or something like that.

@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times M-Needs-Release-Note Work that should be called out in the blog due to impact labels Mar 23, 2024
@pcwalton
Copy link
Contributor Author

How about renaming BufferVec to RawBufferVec and EncasedBufferVec to BufferVec?

@pcwalton
Copy link
Contributor Author

I did that renaming since nobody objected.

@james7132 james7132 added the M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide label Mar 25, 2024
Copy link
Contributor

It looks like your PR is a breaking change, but you didn't provide a migration guide.

Could you add some context on what users should update when this change get released in a new version of Bevy?
It will be used to help writing the migration guide for the version. Putting it after a ## Migration Guide will help it get automatically picked up by our tooling.

@NthTensor
Copy link
Contributor

Look's great, but I'm marking this as blocked until the encase patch goes through.

@NthTensor NthTensor added the S-Blocked This cannot move forward until something else changes label Apr 1, 2024
@JMS55 JMS55 added this to the 0.14 milestone Apr 17, 2024
@JMS55 JMS55 removed the S-Blocked This cannot move forward until something else changes label Apr 25, 2024
@Elabajaba
Copy link
Contributor

Encase released a new version a few days ago.

I think this just needs #12757 now?

Copy link
Contributor

@IceSentry IceSentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming CI passes, LGTM. There's just a few issues left with the renaming I think.

I haven't done any performance testing but I trust you on that part. I've tested that everything still works as expected though.

@@ -28,7 +34,7 @@ use wgpu::BufferUsages;
/// * [`GpuArrayBuffer`](crate::render_resource::GpuArrayBuffer)
/// * [`BufferVec`]
/// * [`Texture`](crate::render_resource::Texture)
pub struct BufferVec<T: Pod> {
pub struct RawBufferVec<T: Pod> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't add a suggestion but the doc comment has a few mentions to BufferVec that should be renamed.

// Take a slice of the new data for `write_into` to use. This is
// important: it hoists the bounds check up here so that the compiler
// can eliminate all the bounds checks that `write_into` will emit.
let mut dest = &mut self.data[offset..(offset + element_size)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice, I never thought about doing that but it's a neat trick to know.

crates/bevy_render/src/render_resource/buffer_vec.rs Outdated Show resolved Hide resolved
/// the `BufferVec` was created, the buffer on the [`RenderDevice`]
/// is marked as [`BufferUsages::COPY_DST`](BufferUsages).
pub fn reserve(&mut self, capacity: usize, device: &RenderDevice) {
if capacity <= self.capacity && !self.label_changed {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for my own curiosity, nothing needs to change. Why shouldn't we try to allocate a smaller buffer if the capacity is smaller?

crates/bevy_render/src/render_resource/storage_buffer.rs Outdated Show resolved Hide resolved
crates/bevy_render/src/render_resource/storage_buffer.rs Outdated Show resolved Hide resolved
crates/bevy_render/src/render_resource/uniform_buffer.rs Outdated Show resolved Hide resolved
crates/bevy_render/src/render_resource/uniform_buffer.rs Outdated Show resolved Hide resolved
alice-i-cecile and others added 3 commits May 2, 2024 15:06
Co-authored-by: IceSentry <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly comfortable with the changes here, and the performance gains are compelling evidence that it actually does what it's trying to do.

The math looks correct, I agree with your TODO to consider using UNSAFE. Code quality and docs are, like always, high: much appreciated!

Once the mistaken doc references to the wrong BufferVec in RawBufferVec are cleaned up, you'll have my approval.

Copy link
Contributor

@kristoff3r kristoff3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I tested it with some examples on wasm just to check.

However, there are a bunch of places still where it refers to BufferVec in docs on functions that actually take RawBufferVec. It might be a good idea to grep for BufferVec and look at all of them.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels May 2, 2024
@pcwalton pcwalton closed this May 3, 2024
@pcwalton
Copy link
Contributor Author

pcwalton commented May 3, 2024

Closed as I am completely burned out and have no motivation to continue working on this.

github-merge-queue bot pushed a commit that referenced this pull request May 3, 2024
…d make GpuArrayBuffer use it. (#13199)

This is an adoption of #12670 plus some documentation fixes. See that PR
for more details.

---

## Changelog

* Renamed `BufferVec` to `RawBufferVec` and added a new `BufferVec`
type.

## Migration Guide
`BufferVec` has been renamed to `RawBufferVec` and a new similar type
has taken the `BufferVec` name.

---------

Co-authored-by: Patrick Walton <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide M-Needs-Release-Note Work that should be called out in the blog due to impact S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

10 participants