Optimize PointLight2D shadow rendering by reducing draw calls and RD state changes #100302

clayjohn · 2024-12-12T07:58:11Z

This dramatically reduces the CPU time spent on rendering shadows for PointLight2Ds.

I think this fixes the remaining regression in #99420.
Fixes: #73805

Basically the problem is that certain changes in 4.4 have increased the cost of:

Most RD API calls (this is the Thread Guard change)
Ending a draw list (this is the command intersection checks)

Rendering PointLight2D Shadows creates 4 draw lists per light on screen and 4 * lights_on_screen * occluders_on_screen draw calls. And each draw call comes with 4 other API calls.

Therefore, we are checking the thread guard 5 * 4 * lights_on_screen * occluders_on_screen times per frame. In #99420 this means we check the thread guard over 100,000 times. So, despite it being a very cheap operation, it ends up reducing performance in a measurable way.
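As a rough illustration of the arithmetic above, the sketch below counts guarded API calls under the old scheme. The function name and the light and occluder counts are hypothetical, chosen only to show the scale; they are not the actual numbers from #99420.

```python
def thread_guard_checks(lights_on_screen: int, occluders_on_screen: int) -> int:
    """Guarded RD API calls per frame under the old shadow loop (illustrative)."""
    directions = 4          # one draw list, and one pass over the occluders, per direction
    calls_per_draw = 1 + 4  # the draw call itself plus the 4 other API calls it comes with
    return calls_per_draw * directions * lights_on_screen * occluders_on_screen


# Hypothetical scene: 64 lights and 100 occluders on screen.
print(thread_guard_checks(64, 100))  # 128000
```

Because the count is a product of lights and occluders, halving either factor halves the number of checks, which is why culling occluders per light pays off so directly.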

Eventually we will reduce the cost of thread guards by disabling them in release builds and maybe by disabling them when RD functions are called internally. But for now, the best option is just to drastically reduce the algorithmic complexity and other costs of rendering shadows. I did that through a number of changes:

Cull occluders against each light, so we only render occluders that matter
Move the projection creation to the GPU so we transfer less data (binding the push constant is one of the more expensive operations simply because of the memory copies)
Use viewport culling instead of creating a render pass for each direction
Save the occluder transforms in an SSBO and reuse for all lights so we only pay the upload cost once
Move culling into the fragment shader so we don't have to constantly switch pipelines
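The first item in the list above, culling occluders against each light, can be sketched as a bounding-rectangle overlap test. This is only an illustration under assumed names (`Rect`, `cull_occluders`) and an assumed axis-aligned representation; it is not the code from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: top-left corner plus size (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap only if they overlap on both axes.
        return (
            self.x < other.x + other.w
            and other.x < self.x + self.w
            and self.y < other.y + other.h
            and other.y < self.y + self.h
        )


def cull_occluders(light_rect: Rect, occluder_rects: list) -> list:
    """Return indices of occluders whose bounds overlap the light's bounds."""
    return [i for i, r in enumerate(occluder_rects) if light_rect.intersects(r)]


light = Rect(0, 0, 100, 100)
occluders = [Rect(10, 10, 5, 5), Rect(200, 200, 5, 5), Rect(95, 50, 20, 20)]
print(cull_occluders(light, occluders))  # [0, 2]
```

Returning indices rather than copies fits the idea of uploading occluder data once and reusing it for every light: each light only needs to know which shared entries to draw.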

Most of these changes increase the cost on the GPU. However, this shader is still so simple that the GPU spends way more time waiting for commands than it does actually drawing things. So these changes have no measurable impact on GPU time.

Finally, I left one optimization on the table. We can reduce the entire draw loop to 4 draw calls per light by using one giant shared vertex buffer and using vertex pulling in the shader to read the vertex positions. I didn't implement this since:

It would add a significant amount of complexity and make the whole process harder to understand
It would require a lot of bookkeeping
It would be much riskier than the current changes
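To give a sense of the bookkeeping involved, here is a minimal CPU-side sketch of the shared-buffer idea: all occluder vertices are packed into one array, and a lookup emulates what vertex pulling in a shader would do. The names `pack_vertices` and `pull_vertex` and the data layout are assumptions made for illustration, not anything from this PR or from Godot.

```python
def pack_vertices(occluder_polygons):
    """Concatenate per-occluder vertices into one shared buffer.

    Returns the shared buffer and, per occluder, a (first_vertex, vertex_count) range.
    """
    shared = []
    ranges = []
    for poly in occluder_polygons:
        ranges.append((len(shared), len(poly)))
        shared.extend(poly)
    return shared, ranges


def pull_vertex(shared, ranges, occluder_index, local_vertex):
    """Emulate vertex pulling: fetch a position by occluder and local vertex index."""
    first, count = ranges[occluder_index]
    if not 0 <= local_vertex < count:
        raise IndexError("local_vertex out of range for this occluder")
    return shared[first + local_vertex]


polys = [[(0, 0), (1, 0), (1, 1)], [(5, 5), (6, 5)]]
shared, ranges = pack_vertices(polys)
print(ranges)                             # [(0, 3), (3, 2)]
print(pull_vertex(shared, ranges, 1, 0))  # (5, 5)
```

Even in this toy form, the ranges must be kept in sync whenever an occluder is added, removed, or resized, which hints at where the extra complexity and risk would come from.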

Overall, since I got the performance gain I needed and the current code is not much more complex than it was before, I decided to leave it here.

Performance

In my test scene, performance goes from 330 FPS in master to 430 FPS (Windows, RX 3600, release builds).
For comparison, 4.4 dev3 was about 380 FPS. So I am confident that this already fully restores the performance from 4.3 and then some.

On an M2 MBP it goes from 160 FPS (dev3) to 400 FPS (debug build) (which makes sense since it is a tiling architecture)

On a Pixel 4 it goes from 17 FPS to 60 FPS (vsync locked)
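Since FPS is not linear in cost, it can help to read the figures above as frame times. The snippet below is just that conversion applied to the numbers quoted here; the helper name is made up for illustration. Note that the Pixel 4 result is vsync locked, so its "after" frame time is an upper bound on the real cost.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# (label, FPS before, FPS after), taken from the figures quoted above.
results = [
    ("Windows", 330, 430),
    ("M2 MBP", 160, 400),
    ("Pixel 4", 17, 60),
]

for label, before, after in results:
    print(f"{label}: {frame_time_ms(before):.2f} ms -> {frame_time_ms(after):.2f} ms")
# Windows: 3.03 ms -> 2.33 ms
# M2 MBP: 6.25 ms -> 2.50 ms
# Pixel 4: 58.82 ms -> 16.67 ms
```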

This test project is intended to be a worst-case scenario, since it has so many lights and occluders on screen at once:
light2dopt.zip

stuartcarnie

Those are some really nice improvements.


akien-mga · 2024-12-17T22:07:30Z

Thanks!

clayjohn added bug topic:rendering performance labels Dec 12, 2024

clayjohn added this to the 4.4 milestone Dec 12, 2024

clayjohn mentioned this pull request Dec 12, 2024

FPS almost halved going from 4.4 dev3 to 4.4 dev 4 (regression from #98652) #99420

Closed

clayjohn force-pushed the light2d-optimize branch from f40f196 to 25fd8fb Compare December 12, 2024 08:21

clayjohn marked this pull request as ready for review December 12, 2024 22:04

clayjohn requested a review from a team as a code owner December 12, 2024 22:04

clayjohn mentioned this pull request Dec 12, 2024

2D optimizations issue on Adreno ("SnapDragon") GPUs #73805

Closed

Calinou added the topic:2d label Dec 13, 2024

stuartcarnie approved these changes Dec 17, 2024

clayjohn mentioned this pull request Dec 17, 2024

Optimize 2D lights using specialization constants in RD renderer #100501

Draft

AThousandShips reviewed Dec 17, 2024

servers/rendering/renderer_rd/renderer_canvas_render_rd.cpp (two review comments, outdated and resolved)

Optimize PointLight2D shadow rendering by reducing draw calls and RD …

7c61252

…state changes. This dramatically reduces the CPU time spent on rendering shadows for PointLight2Ds

clayjohn force-pushed the light2d-optimize branch from 25fd8fb to 7c61252 Compare December 17, 2024 15:41

akien-mga merged commit 190ae9f into godotengine:master Dec 17, 2024
20 checks passed

pmoosi mentioned this pull request Dec 19, 2024

PointLight2D shadow buffer error #100609

Closed

clayjohn mentioned this pull request Dec 21, 2024

Properly transform light rect and occluder rect to perform Light2D culling in canvas space #100677

Merged

clayjohn deleted the light2d-optimize branch January 28, 2025 21:27
