Split Vertex Buffers Into Position and Interleaved Attribute Streams #2350

The-O-King · 2021-02-25T18:46:04Z

Describe the project you are working on

Godot Renderers

Describe the problem or limitation you are having in your project

Applications with large amounts of vertex data are bottlenecked by memory bandwidth, especially on mobile devices; and applications with lower amounts of vertex data still throw away a lot of power/time/cache performance

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Vertex stream splitting is a cache performance optimization which makes a difference on tile-based GPUs such as those on all Android Devices - in particular during the binning phase of the rendering process as this only requires vertex positions. It can also help improve vertex memory bandwidth usage in other situations such as shadows, where only vertex position data is needed

An initial implementation saw a 50% reduction in vertex memory bandwidth usage on Android devices

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Vertex Stream splitting will take the existing single buffer used for interleaving all vertex attributes and modify it so that this single buffer is now split into two components - a contiguous block for vertex position data appended to a contiguous block to interleave the remaining vertex attribute data

Before:
|Position1/Normal1/Tangent1/UV1/Position2/Normal2/Tangent2/UV2......|

After:
|Position1/Position2...|Normal1/Tangent1/UV1/Normal2/Tangent2/UV2...|

A test implementation of this has already been written for Godot 3.2 and both it's GLES2/GLES3 renderer. Changes that need to be made occur in

rasterizer_storage_gles[2|3].cpp - the mesh_add_surface() function needs to take into account the new offsets and strides in the buffer for specifying the location of vertex positions and attributes
visual_server.cpp - the surface_set_data() function will need to write the mesh's vertex data into the buffer properly, and the get_array_from_surface() function will need to be updated to read vertex positions/attributes properly with the correct buffer offsets

These changes should also be portable to Godot 4.0 and the Vulkan rendering backend.

If this enhancement will not be used often, can it be worked around with a few lines of script?

N/A

Is there a reason why this should be core and not an add-on in the asset library?

This is a low level change to the way that vertex data management is done in the renderer

Android Games DevTech Team @ Google

The text was updated successfully, but these errors were encountered:

clayjohn · 2021-02-25T19:58:30Z

I think this is already implemented in 4.0. if not, I know reduz has discussed adding it.

reduz · 2021-02-25T22:03:31Z

I did a similar split in 4.0 but not entirely sure if its the same thing.

In 4.0 I have vertex/normal/tangent (20 bytes)
Uv/color/etc.

This split is useful for depth prepass (not so useful on mobile), and skeleton/blendshape processing (done on compute in 4.0). For shadow rendering I use a separate array of only welded vertices in order.to minimize bandwidth and maximize cache usage.

reduz · 2021-02-25T22:12:51Z

Maybe on mobile vertex/nornal/tangent could be deinterlaced, but I would probably want to check performamce gain given its less data anyway.

I also still don't entirely understand the optimization, I thought the binning happens at the rop stage in TBDR, after the vertex program.

For 3.2 I guess it makes sense, though over there I can only imagine it to be an import option.

BastiaanOlij · 2021-02-25T23:11:49Z

@reduz sounds like they already have the work done for 3.2 so if it works I'd say hell yez let's get it added.

For 4.0 I think this is worth some speed tests. I can imagine that if we change it to vertex1/vertex2/vertex.../normal1/normal2/normal.../tangent1/tangent2/tangent.../uv1_1/uv2_1/color1/others1/uv1_2/uv2_2/color2/others2 it is possible we get best of both worlds and do away with the extra buffer. Or maybe make it a setting between the layouts and make it possible to vary depending on platform (so using this layout on mobile, the current layout on desktop, etc).

And indeed, I wonder what the magic is that happens on the GPU that allows it to run enough of the vertex shader to only read position without reading all the other properties before deciding to discard fragments. I don't know enough about how this works internally to judge at all but eager to learn more about this :)

Oh and welcome Android Games DevTech Team, awesome to have you guys chiming in!

The-O-King · 2021-02-26T17:09:07Z

Thanks for the feedback and welcome :)

Mobile devices have an optimization where binning occurs before the full vertex shader is run; the driver takes the vertex shader and compiles a gl_Position shader (using glsl parlance) which only does the calculations necessary to compute gl_Position. This then gets run on all vertices to do binning, culling any vertices that won't contribute fragments to the image onscreen. Once the binning is complete, then the rest of the vertex shader is run on the active vertices!

So having the vertex positions being contiguous would help improve cache performance for example (especially with static meshes). One thing I have not tested is with skeletal meshes which would require bone weights along with vertex positions to compute the final gl_Position, so I'm not sure how much performance will improve in that scenario

I'm not sure how prevalent this optimization is on older devices but on my pixel 3 and pixel 4 it has made a big (50%) difference in vertex memory bandwidth usage for the test scene I created (basically just a single ~100k vertex torus haha)

And yes right now I have an initial implementation for 3.2 that I can clean up and share soon!

clayjohn · 2021-02-26T17:12:32Z

What is the impact on non-mobile devices? Should we be creating a separate path for mobile to take advantage of this optimization?

reduz · 2021-02-26T19:08:59Z

@clayjohn I think on desktop it does not really matter a lot, so we can simply uninterleave the vertices always.

The-O-King · 2021-02-26T19:58:34Z

I tried to get some performance data (look at overall framerate in the Godot editor) on my laptop with intel integrated and GTX 2070 Max-Q and did not see an appreciable drop in performance (I modified the scene to have 50 of these static torus meshes, each with the 100k vertices mentioned before so that it's more of a challenge on desktop haha)

I'm not as familiar with the GPU profiling tools on desktop, for future reference do you all have any preferred tools for looking at performance data on desktop platforms?

clayjohn · 2021-02-26T21:27:02Z

@The-O-King The best tools are usually the ones maintained by your GPU hardware vendor. So for NVidia, I think NSight is likely the best tool for profiling.

The-O-King · 2021-02-26T22:03:25Z

@clayjohn gotcha thanks for the pointer haha, I'll familiarize myself more with it!

The-O-King · 2021-02-26T22:03:32Z

Haven't created a full PR yet but I have a feature branch on my fork of godot up for people to take a look at :)

reduz mentioned making it an import option (right now it just assumes everything wants this new buffer format so you'll have to reimport any existing assets in) I can try to get that working next week!

The-O-King/godot@47be5f1

The-O-King · 2021-03-01T23:24:55Z

Hello everyone! Just wanted to update this with the PR that I submitted with the implementation

I added an import option for the stream splitting, however I also found that the easiest way to pass down the flag for stream splitting was to also pass down the rest of the mesh flags from the Import UI so right now I made it so that you could also select which individual vertex attributes you want to compress on import (this was actually going to be another proposal but I thought this might kill two birds with one stone haha)

This is what it currently looks like UI-wise, let me know what you think or if there's any feedback/improvements on what I've done so far

thanks :)

akien-mga · 2021-07-20T08:49:07Z

Implemented in 3.4 by godotengine/godot#46574.

YuriSizov added topic:core topic:rendering labels Feb 25, 2021

Calinou added platform:android topic:porting labels Feb 25, 2021

The-O-King mentioned this issue Mar 1, 2021

[3.4] Split Vertex Buffer Stream in Positions and Attributes godotengine/godot#46574

Merged

Calinou mentioned this issue Mar 3, 2021

Compress Vertex Normal/Tangent Vectors Using Octahedral Mapping #2395

Closed

Calinou added the topic:3d label Mar 3, 2021

akien-mga added this to the 3.x milestone Jul 20, 2021

akien-mga closed this as completed Jul 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split Vertex Buffers Into Position and Interleaved Attribute Streams #2350

Split Vertex Buffers Into Position and Interleaved Attribute Streams #2350

The-O-King commented Feb 25, 2021 •

edited

Loading

clayjohn commented Feb 25, 2021

reduz commented Feb 25, 2021

reduz commented Feb 25, 2021 •

edited

Loading

BastiaanOlij commented Feb 25, 2021

The-O-King commented Feb 26, 2021

clayjohn commented Feb 26, 2021

reduz commented Feb 26, 2021

The-O-King commented Feb 26, 2021

clayjohn commented Feb 26, 2021

The-O-King commented Feb 26, 2021

The-O-King commented Feb 26, 2021

The-O-King commented Mar 1, 2021

akien-mga commented Jul 20, 2021 •

edited

Loading

Split Vertex Buffers Into Position and Interleaved Attribute Streams #2350

Split Vertex Buffers Into Position and Interleaved Attribute Streams #2350

Comments

The-O-King commented Feb 25, 2021 • edited Loading

Describe the project you are working on

Describe the problem or limitation you are having in your project

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

If this enhancement will not be used often, can it be worked around with a few lines of script?

Is there a reason why this should be core and not an add-on in the asset library?

clayjohn commented Feb 25, 2021

reduz commented Feb 25, 2021

reduz commented Feb 25, 2021 • edited Loading

BastiaanOlij commented Feb 25, 2021

The-O-King commented Feb 26, 2021

clayjohn commented Feb 26, 2021

reduz commented Feb 26, 2021

The-O-King commented Feb 26, 2021

clayjohn commented Feb 26, 2021

The-O-King commented Feb 26, 2021

The-O-King commented Feb 26, 2021

The-O-King commented Mar 1, 2021

akien-mga commented Jul 20, 2021 • edited Loading

The-O-King commented Feb 25, 2021 •

edited

Loading

reduz commented Feb 25, 2021 •

edited

Loading

akien-mga commented Jul 20, 2021 •

edited

Loading