Add `EncasedBufferVec`, a higher-performance alternative to
`StorageBuffer`, and make `GpuArrayBuffer` use it.

`EncasedBufferVec` is like `BufferVec`, but it doesn't require that the
type be `Pod`. Alternately, it's like `StorageBuffer<Vec<T>>`, except it
doesn't allow CPU access to the data after it's been pushed.
`GpuArrayBuffer` already doesn't allow CPU access to the data, so
switching it to use `EncasedBufferVec` doesn't regress any functionality
and offers higher performance.

Shutting off CPU access eliminates the need to copy to a scratch buffer,
which results in significantly higher performance. *Note that this needs
teoxoy/encase#65 from @james7132 to achieve
end-to-end performance benefits*, because `encase` is rather slow at
encoding data without that patch, swamping the benefits of avoiding the
copy. With that patch applied, and `#[inline]` added to `encase`'s
`derive` implementation of `write_into` on structs, this results in a
*16% overall speedup on `many_cubes --no-frustum-culling`*.
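
To illustrate the copy being removed, here is a minimal std-only sketch of the two paths. Everything in it (`Encode`, `Instance`, `ScratchBacked`, `DirectBacked`) is a hypothetical stand-in invented for this illustration, not Bevy or `encase` API; `Encode` only plays the role that `ShaderType + WriteInto` plays in the real code.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `encase`'s `ShaderType + WriteInto`.
trait Encode {
    /// Encoded size of one element, in bytes.
    const SIZE: usize;
    /// Writes exactly `SIZE` bytes into `dest`.
    fn encode(&self, dest: &mut [u8]);
}

/// Hypothetical per-instance payload.
#[derive(Clone, Copy)]
struct Instance {
    id: u32,
    scale: f32,
}

impl Encode for Instance {
    const SIZE: usize = 8;

    fn encode(&self, dest: &mut [u8]) {
        dest[0..4].copy_from_slice(&self.id.to_le_bytes());
        dest[4..8].copy_from_slice(&self.scale.to_le_bytes());
    }
}

/// Models the old path: typed values stay readable on the CPU and are
/// re-encoded into a scratch buffer on every upload.
struct ScratchBacked<T: Encode> {
    values: Vec<T>,
    scratch: Vec<u8>,
}

impl<T: Encode> ScratchBacked<T> {
    fn new() -> Self {
        Self { values: Vec::new(), scratch: Vec::new() }
    }

    fn push(&mut self, value: T) {
        self.values.push(value);
    }

    /// Second pass over the data: this is the copy the commit avoids.
    fn encode_all(&mut self) -> &[u8] {
        self.scratch.clear();
        self.scratch.resize(self.values.len() * T::SIZE, 0);
        for (value, chunk) in self.values.iter().zip(self.scratch.chunks_exact_mut(T::SIZE)) {
            value.encode(chunk);
        }
        &self.scratch
    }
}

/// Models the new path: each value is encoded once, at push time, and the
/// typed value is not retained.
struct DirectBacked<T: Encode> {
    data: Vec<u8>,
    phantom: PhantomData<T>,
}

impl<T: Encode> DirectBacked<T> {
    fn new() -> Self {
        Self { data: Vec::new(), phantom: PhantomData }
    }

    /// Encodes `value` in place and returns its element index.
    fn push(&mut self, value: T) -> usize {
        let offset = self.data.len();
        self.data.resize(offset + T::SIZE, 0);
        value.encode(&mut self.data[offset..offset + T::SIZE]);
        offset / T::SIZE
    }

    /// The bytes that would be handed to the GPU queue as-is.
    fn bytes(&self) -> &[u8] {
        &self.data
    }
}
```

Both paths yield the same bytes; the difference is that `ScratchBacked` walks the data a second time at upload, while `DirectBacked` gives up typed CPU access in exchange for skipping that pass.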

I've verified that the generated code is now close to optimal. The only
reasonable potential improvement that I see is to eliminate the zeroing
in `push`. This requires unsafe code, however, so I'd prefer to leave
that to a followup.
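
For reference, a rough sketch of what that followup could look like, next to the safe zero-then-overwrite shape used in this commit. `ELEMENT_SIZE`, `push_zeroed`, and `push_uninit` are hypothetical names for this illustration, and the element is assumed to be already encoded into a fixed-size byte array. The real `push` goes through `encase`'s `Writer`, so an actual followup would look different and would also need to account for any padding bytes the encoder may leave unwritten.

```rust
/// Hypothetical fixed element size, in bytes.
const ELEMENT_SIZE: usize = 8;

/// Safe variant with the same shape as `push` in this commit: zero-fill the
/// new region, then overwrite it through a single slice.
fn push_zeroed(data: &mut Vec<u8>, encoded: &[u8; ELEMENT_SIZE]) -> usize {
    let offset = data.len();
    data.extend(std::iter::repeat(0u8).take(ELEMENT_SIZE));

    // Slicing once up front leaves one bounds check here instead of one per
    // field write.
    let dest = &mut data[offset..offset + ELEMENT_SIZE];
    dest.copy_from_slice(encoded);

    offset / ELEMENT_SIZE
}

/// Unsafe variant that skips the zeroing by writing into spare capacity.
fn push_uninit(data: &mut Vec<u8>, encoded: &[u8; ELEMENT_SIZE]) -> usize {
    let offset = data.len();
    data.reserve(ELEMENT_SIZE);

    // SAFETY: `reserve` guarantees capacity for `ELEMENT_SIZE` more bytes,
    // `encoded` cannot overlap the vector's spare capacity, and all
    // `ELEMENT_SIZE` bytes are initialized before the length is extended.
    unsafe {
        std::ptr::copy_nonoverlapping(
            encoded.as_ptr(),
            data.as_mut_ptr().add(offset),
            ELEMENT_SIZE,
        );
        data.set_len(offset + ELEMENT_SIZE);
    }

    offset / ELEMENT_SIZE
}
```

The two functions produce identical buffers; the second only removes the redundant zero write, at the cost of an `unsafe` block whose invariants have to be maintained by hand.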
pcwalton committed Mar 23, 2024
1 parent e33b93e commit 47ae76f
Showing 4 changed files with 182 additions and 14 deletions.
166 changes: 164 additions & 2 deletions crates/bevy_render/src/render_resource/buffer_vec.rs
@@ -1,9 +1,15 @@
+use std::{iter, marker::PhantomData};
+
 use crate::{
     render_resource::Buffer,
     renderer::{RenderDevice, RenderQueue},
 };
 use bytemuck::{cast_slice, Pod};
-use wgpu::BufferUsages;
+use encase::{
+    internal::{WriteInto, Writer},
+    ShaderType,
+};
+use wgpu::{BufferAddress, BufferUsages};

 /// A structure for storing raw bytes that have already been properly formatted
 /// for use by the GPU.
@@ -112,7 +118,7 @@ impl<T: Pod> BufferVec<T> {
         let size = self.item_size * capacity;
         self.buffer = Some(device.create_buffer(&wgpu::BufferDescriptor {
             label: self.label.as_deref(),
-            size: size as wgpu::BufferAddress,
+            size: size as BufferAddress,
             usage: BufferUsages::COPY_DST | self.buffer_usage,
             mapped_at_creation: false,
         }));
@@ -160,3 +166,159 @@ impl<T: Pod> Extend<T> for BufferVec<T> {
         self.values.extend(iter);
     }
 }
+
+/// Like [`BufferVec`], but doesn't require that the data type `T` be [`Pod`].
+///
+/// This is a high-performance data structure that you should use whenever
+/// possible if your data is more complex than is suitable for [`BufferVec`].
+/// The [`ShaderType`] trait from the `encase` library is used to ensure that
+/// the data is correctly aligned for use by the GPU.
+///
+/// For performance reasons, unlike [`BufferVec`], this type doesn't allow CPU
+/// access to the data after it's been added via [`EncasedBufferVec::push`]. If
+/// you need CPU access to the data, consider another type, such as
+/// [`StorageBuffer`].
+pub struct EncasedBufferVec<T>
+where
+    T: ShaderType + WriteInto,
+{
+    data: Vec<u8>,
+    buffer: Option<Buffer>,
+    capacity: usize,
+    buffer_usage: BufferUsages,
+    label: Option<String>,
+    label_changed: bool,
+    phantom: PhantomData<T>,
+}
+
+impl<T> EncasedBufferVec<T>
+where
+    T: ShaderType + WriteInto,
+{
+    /// Creates a new [`EncasedBufferVec`] with the given [`BufferUsages`].
+    pub const fn new(buffer_usage: BufferUsages) -> Self {
+        Self {
+            data: vec![],
+            buffer: None,
+            capacity: 0,
+            buffer_usage,
+            label: None,
+            label_changed: false,
+            phantom: PhantomData,
+        }
+    }
+
+    /// Returns a handle to the buffer, if the data has been uploaded.
+    #[inline]
+    pub fn buffer(&self) -> Option<&Buffer> {
+        self.buffer.as_ref()
+    }
+
+    /// Returns the amount of space that the GPU will use before reallocating.
+    #[inline]
+    pub fn capacity(&self) -> usize {
+        self.capacity
+    }
+
+    /// Returns the number of items that have been pushed to this buffer.
+    #[inline]
+    pub fn len(&self) -> usize {
+        self.data.len() / u64::from(T::min_size()) as usize
+    }
+
+    /// Returns true if the buffer is empty.
+    #[inline]
+    pub fn is_empty(&self) -> bool {
+        self.data.is_empty()
+    }
+
+    /// Adds a new value and returns its index.
+    pub fn push(&mut self, value: T) -> usize {
+        let element_size = u64::from(T::min_size()) as usize;
+        let offset = self.data.len();
+
+        // TODO: Consider using unsafe code to push uninitialized, to prevent
+        // the zeroing. It shows up in profiles.
+        self.data.extend(iter::repeat(0).take(element_size));
+
+        // Take a slice of the new data for `write_into` to use. This is
+        // important: it hoists the bounds check up here so that the compiler
+        // can eliminate all the bounds checks that `write_into` will emit.
+        let mut dest = &mut self.data[offset..(offset + element_size)];
+        value.write_into(&mut Writer::new(&value, &mut dest, 0).unwrap());
+
+        offset / u64::from(T::min_size()) as usize
+    }
+
+    /// Changes the debugging label of the buffer.
+    ///
+    /// The next time the buffer is updated (via
+    /// [`reserve`](EncasedBufferVec::reserve)), Bevy will inform the driver
+    /// of the new label.
+    pub fn set_label(&mut self, label: Option<&str>) {
+        let label = label.map(str::to_string);
+
+        if label != self.label {
+            self.label_changed = true;
+        }
+
+        self.label = label;
+    }
+
+    /// Returns the label.
+    pub fn get_label(&self) -> Option<&str> {
+        self.label.as_deref()
+    }
+
+    /// Creates a [`Buffer`] on the [`RenderDevice`] with size
+    /// at least `T::min_size() * capacity`, unless such a buffer already exists.
+    ///
+    /// If a [`Buffer`] exists, but is too small, references to it will be discarded,
+    /// and a new [`Buffer`] will be created. Any previously created [`Buffer`]s
+    /// that are no longer referenced will be deleted by the [`RenderDevice`]
+    /// once it is done using them (typically 1-2 frames).
+    ///
+    /// In addition to any [`BufferUsages`] provided when
+    /// the `EncasedBufferVec` was created, the buffer on the [`RenderDevice`]
+    /// is marked as [`BufferUsages::COPY_DST`](BufferUsages).
+    pub fn reserve(&mut self, capacity: usize, device: &RenderDevice) {
+        if capacity <= self.capacity && !self.label_changed {
+            return;
+        }
+
+        self.capacity = capacity;
+        let size = u64::from(T::min_size()) as usize * capacity;
+        self.buffer = Some(device.create_buffer(&wgpu::BufferDescriptor {
+            label: self.label.as_deref(),
+            size: size as BufferAddress,
+            usage: BufferUsages::COPY_DST | self.buffer_usage,
+            mapped_at_creation: false,
+        }));
+        self.label_changed = false;
+    }
+
+    /// Queues writing of data from system RAM to VRAM using the [`RenderDevice`]
+    /// and the provided [`RenderQueue`].
+    ///
+    /// Before queuing the write, a [`reserve`](EncasedBufferVec::reserve)
+    /// operation is executed.
+    pub fn write_buffer(&mut self, device: &RenderDevice, queue: &RenderQueue) {
+        if self.data.is_empty() {
+            return;
+        }
+
+        self.reserve(self.data.len() / u64::from(T::min_size()) as usize, device);
+
+        let Some(buffer) = &self.buffer else { return };
+        queue.write_buffer(buffer, 0, &self.data);
+    }
+
+    /// Reduces the length of the buffer.
+    pub fn truncate(&mut self, len: usize) {
+        self.data.truncate(u64::from(T::min_size()) as usize * len);
+    }
+
+    /// Removes all elements from the buffer.
+    pub fn clear(&mut self) {
+        self.data.clear();
+    }
+}
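
The reallocation rule in `reserve` above can be modeled without a GPU. The following is a minimal sketch under stated assumptions: `ReserveModel` is a hypothetical type for this illustration, and its `allocations` counter stands in for calls to `create_buffer`.

```rust
/// Hypothetical model of the bookkeeping in `EncasedBufferVec::reserve`.
/// `allocations` counts how often a real implementation would create a buffer.
#[derive(Default)]
struct ReserveModel {
    capacity: usize,
    label: Option<String>,
    label_changed: bool,
    allocations: usize,
}

impl ReserveModel {
    /// Records a new label and flags it only if it actually differs.
    fn set_label(&mut self, label: Option<&str>) {
        let label = label.map(str::to_string);
        if label != self.label {
            self.label_changed = true;
        }
        self.label = label;
    }

    /// Reallocates only when the request exceeds the current capacity or the
    /// label changed since the last allocation.
    fn reserve(&mut self, capacity: usize) {
        if capacity <= self.capacity && !self.label_changed {
            return;
        }
        self.capacity = capacity;
        self.allocations += 1;
        self.label_changed = false;
    }
}
```

In this model, as in the code above, a label change forces a fresh allocation at exactly the requested capacity, even when that is smaller than the previous one.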
26 changes: 14 additions & 12 deletions crates/bevy_render/src/render_resource/gpu_array_buffer.rs
@@ -1,6 +1,6 @@
 use super::{
     binding_types::{storage_buffer_read_only, uniform_buffer_sized},
-    BindGroupLayoutEntryBuilder, StorageBuffer,
+    BindGroupLayoutEntryBuilder, EncasedBufferVec,
 };
 use crate::{
     render_resource::batched_uniform_buffer::BatchedUniformBuffer,
@@ -10,29 +10,31 @@ use bevy_ecs::{prelude::Component, system::Resource};
 use encase::{private::WriteInto, ShaderSize, ShaderType};
 use nonmax::NonMaxU32;
 use std::marker::PhantomData;
-use wgpu::BindingResource;
+use wgpu::{BindingResource, BufferUsages};

 /// Trait for types able to go in a [`GpuArrayBuffer`].
 pub trait GpuArrayBufferable: ShaderType + ShaderSize + WriteInto + Clone {}
 impl<T: ShaderType + ShaderSize + WriteInto + Clone> GpuArrayBufferable for T {}

 /// Stores an array of elements to be transferred to the GPU and made accessible to shaders as a read-only array.
 ///
-/// On platforms that support storage buffers, this is equivalent to [`StorageBuffer<Vec<T>>`].
-/// Otherwise, this falls back to a dynamic offset uniform buffer with the largest
-/// array of T that fits within a uniform buffer binding (within reasonable limits).
+/// On platforms that support storage buffers, this is equivalent to
+/// [`EncasedBufferVec<T>`]. Otherwise, this falls back to a dynamic offset
+/// uniform buffer with the largest array of T that fits within a uniform buffer
+/// binding (within reasonable limits).
 ///
 /// Other options for storing GPU-accessible data are:
 /// * [`StorageBuffer`]
 /// * [`DynamicStorageBuffer`](crate::render_resource::DynamicStorageBuffer)
 /// * [`UniformBuffer`](crate::render_resource::UniformBuffer)
 /// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
 /// * [`BufferVec`](crate::render_resource::BufferVec)
+/// * [`EncasedBufferVec`](crate::render_resource::EncasedBufferVec)
 /// * [`Texture`](crate::render_resource::Texture)
 #[derive(Resource)]
 pub enum GpuArrayBuffer<T: GpuArrayBufferable> {
     Uniform(BatchedUniformBuffer<T>),
-    Storage(StorageBuffer<Vec<T>>),
+    Storage(EncasedBufferVec<T>),
 }

 impl<T: GpuArrayBufferable> GpuArrayBuffer<T> {
@@ -41,24 +43,22 @@ impl<T: GpuArrayBufferable> GpuArrayBuffer<T> {
         if limits.max_storage_buffers_per_shader_stage == 0 {
             GpuArrayBuffer::Uniform(BatchedUniformBuffer::new(&limits))
         } else {
-            GpuArrayBuffer::Storage(StorageBuffer::default())
+            GpuArrayBuffer::Storage(EncasedBufferVec::new(BufferUsages::STORAGE))
         }
     }

     pub fn clear(&mut self) {
         match self {
             GpuArrayBuffer::Uniform(buffer) => buffer.clear(),
-            GpuArrayBuffer::Storage(buffer) => buffer.get_mut().clear(),
+            GpuArrayBuffer::Storage(buffer) => buffer.clear(),
         }
     }

     pub fn push(&mut self, value: T) -> GpuArrayBufferIndex<T> {
         match self {
             GpuArrayBuffer::Uniform(buffer) => buffer.push(value),
             GpuArrayBuffer::Storage(buffer) => {
-                let buffer = buffer.get_mut();
-                let index = buffer.len() as u32;
-                buffer.push(value);
+                let index = buffer.push(value) as u32;
                 GpuArrayBufferIndex {
                     index,
                     dynamic_offset: None,
@@ -91,7 +91,9 @@ impl<T: GpuArrayBufferable> GpuArrayBuffer<T> {
     pub fn binding(&self) -> Option<BindingResource> {
         match self {
             GpuArrayBuffer::Uniform(buffer) => buffer.binding(),
-            GpuArrayBuffer::Storage(buffer) => buffer.binding(),
+            GpuArrayBuffer::Storage(buffer) => {
+                buffer.buffer().map(|buffer| buffer.as_entire_binding())
+            }
         }
     }

2 changes: 2 additions & 0 deletions crates/bevy_render/src/render_resource/storage_buffer.rs
@@ -25,6 +25,7 @@ use wgpu::{util::BufferInitDescriptor, BindingResource, BufferBinding, BufferUsa
 /// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
 /// * [`GpuArrayBuffer`](crate::render_resource::GpuArrayBuffer)
 /// * [`BufferVec`](crate::render_resource::BufferVec)
+/// * [`EncasedBufferVec`](crate::render_resource::EncasedBufferVec)
 /// * [`Texture`](crate::render_resource::Texture)
 ///
 /// [std430 alignment/padding requirements]: https://www.w3.org/TR/WGSL/#address-spaces-storage
@@ -155,6 +156,7 @@ impl<T: ShaderType + WriteInto> StorageBuffer<T> {
 /// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
 /// * [`GpuArrayBuffer`](crate::render_resource::GpuArrayBuffer)
 /// * [`BufferVec`](crate::render_resource::BufferVec)
+/// * [`EncasedBufferVec`](crate::render_resource::EncasedBufferVec)
 /// * [`Texture`](crate::render_resource::Texture)
 ///
 /// [std430 alignment/padding requirements]: https://www.w3.org/TR/WGSL/#address-spaces-storage
2 changes: 2 additions & 0 deletions crates/bevy_render/src/render_resource/uniform_buffer.rs
@@ -32,6 +32,7 @@ use super::IntoBinding;
 /// * [`DynamicUniformBuffer`]
 /// * [`GpuArrayBuffer`](crate::render_resource::GpuArrayBuffer)
 /// * [`BufferVec`](crate::render_resource::BufferVec)
+/// * [`EncasedBufferVec`](crate::render_resource::EncasedBufferVec)
 /// * [`Texture`](crate::render_resource::Texture)
 ///
 /// [std140 alignment/padding requirements]: https://www.w3.org/TR/WGSL/#address-spaces-uniform
@@ -169,6 +170,7 @@ impl<'a, T: ShaderType + WriteInto> IntoBinding<'a> for &'a UniformBuffer<T> {
 /// * [`DynamicUniformBuffer`]
 /// * [`GpuArrayBuffer`](crate::render_resource::GpuArrayBuffer)
 /// * [`BufferVec`](crate::render_resource::BufferVec)
+/// * [`EncasedBufferVec`](crate::render_resource::EncasedBufferVec)
 /// * [`Texture`](crate::render_resource::Texture)
 ///
 /// [std140 alignment/padding requirements]: https://www.w3.org/TR/WGSL/#address-spaces-uniform
