[3.x] Shadow volume culling and tighter shadow caster culling #82584
If we mark light shadows as static or dynamic for each light, this could be decided based on whether the light is declared to be static or dynamic.
Could we cherry-pick this for 4.x?
Rendering meeting today:
UPDATE: The lookup generation prints the LUT to standard output, and this can be copied directly into the C++ source.
```
LIGHT VOLUME TABLE BEGIN
Copy this to LUT_entry_sizes:
{0, 4, 4, 0, 4, 6, 6, 8, 4, 6, 6, 8, 6, 6, 6, 6, 4, 6, 6, 8, 0, 8, 8, 0, 6, 6, 6, 6, 8, 6, 6, 4, 4, 6, 6, 8, 6, 6, 6, 6, 0, 8, 8, 0, 8, 6, 6, 4, 6, 6, 6, 6, 8, 6, 6, 4, 8, 6, 6, 4, 0, 4, 4, 0, }
Copy this to LUT_entries:
{0, 0, 0, 0, 0, 0, 0, },
LIGHT VOLUME TABLE END
```
Tested locally, it works as expected. Visuals look correct too from my testing in various demo projects. Great work, this likely resolves one of Godot's largest rendering bottlenecks in complex scenes 🙂

Benchmark on tps-demo

OS: Fedora 38

The project is modified to disable V-Sync. The FPS reported is the highest FPS attained over a period of 10 seconds after loading the level, although I can confirm the average values are always increased in a similar proportion. When CPU-limited, the FPS varies a fair bit over time due to the flying forklift moving in and out of view.
Existing shadow caster culling using the BVH takes no account of the camera. This PR adds the highly encapsulated class VisualServerLightCuller which can cut down the casters in the shadow volume to only those which can cast shadows on the camera frustum.

This is used to:
* More accurately defer dirty updates to shadows when the shadow volume does not intersect the camera frustum.
* Tighter cull shadow casters to the view frustum.

Lights dirty state is now automatically managed:
* Continuous (tighter caster culling)
* Static (all casters are rendered)
I pushed some small improvements, but it turns out the bug in the master version is because it's being used from multiple threads there, and it isn't thread safe. So the 3.x version should be fine in that respect, and I'll see if I can fix up the master version. 👍
Looks good to me. I trust the testing that has already been done.
The performance benefits speak for themselves. Let's get this in to 3.6
Thanks!
Will there be equivalent improvements to Vulkan or is this GLES only?
This is the 3.x PR; I'm just testing the master PR #84745. There are improvements to all backends, as the culling takes place before the backend.
Existing shadow caster culling using the BVH takes no account of the camera. This PR adds the highly encapsulated class VisualServerLightCuller which can cut down the casters in the shadow volume to only those which can cast shadows on the camera frustum.
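This is used to:
* More accurately defer dirty updates to shadows when the shadow volume does not intersect the camera frustum.
* Tighter cull shadow casters to the view frustum.

Lights dirty state is now automatically managed:
* Continuous (tighter caster culling)
* Static (all casters are rendered)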
Explanation
You can see roughly how it works in this old video of mine (ignore the rooms and portals, that is a separate system):
https://www.youtube.com/watch?v=1WT5AXZlsDc
The blue lines from the light sources to the camera frustum show the extra culling planes.
How does it work?
At runtime, the routine checks each plane of the camera frustum, and finds whether it is facing towards or away from the light (0 or 1). These bits for the 6 planes form a 6-bit number, which is the lookup.
The lookup tells us a list of corner points from the camera frustum which form a silhouette, which can be used to generate culling planes together with the light origin (3 points form a culling plane).
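The two steps above can be sketched roughly as follows. This is a minimal illustration, not the `VisualServerLightCuller` code: the `Vec3` and `Plane` types, the function names `compute_lookup` and `make_culling_plane`, the plane ordering, and the convention that a set bit means "the light is in front of the plane" are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Minimal stand-in math types (hypothetical, not Godot's Vector3 / Plane).
struct Vec3 {
    float x, y, z;
};

static Vec3 sub(const Vec3 &a, const Vec3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Plane {
    Vec3 normal; // unit length
    float d;     // plane equation: dot(normal, p) - d == 0
    float distance_to(const Vec3 &p) const { return dot(normal, p) - d; }
};

// Build the 6-bit lookup: bit i is set when frustum plane i faces the light,
// i.e. the light position is on the positive side of that plane (assumed convention).
uint32_t compute_lookup(const std::array<Plane, 6> &frustum_planes, const Vec3 &light_pos) {
    uint32_t lookup = 0;
    for (uint32_t i = 0; i < 6; i++) {
        if (frustum_planes[i].distance_to(light_pos) > 0.0f) {
            lookup |= 1u << i;
        }
    }
    return lookup;
}

// Build one culling plane from the light origin and two silhouette corner points.
// Which way the normal points depends on the winding of (a, b).
Plane make_culling_plane(const Vec3 &light_pos, const Vec3 &a, const Vec3 &b) {
    Vec3 n = cross(sub(a, light_pos), sub(b, light_pos));
    float len = std::sqrt(dot(n, n));
    assert(len > 0.0f); // the three points must not be collinear
    n = {n.x / len, n.y / len, n.z / len};
    return {n, dot(n, light_pos)};
}
```

The resulting value in the range 0 to 63 would then index a precomputed table (such as the `LUT_entry_sizes` / `LUT_entries` pair printed by the generator) to get the silhouette corners, and each consecutive pair of corners plus the light origin would give one culling plane.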
References:
http://lspiroengine.com/?p=153
http://www.terathon.com/gdc06_lengyel.pdf
Performance
In tests in the TPS demo, without GI and just using shadows, this halves the number of drawcalls / vertex count in many areas, and in some cases reduces drawcalls by a factor of 10. This can lead to a 10-300% increase in FPS (the increase depends on the settings used: if fill rate is the bottleneck then improving shadows has a less dramatic effect, and vice versa).
In WroughtFlesh, which uses a directional light only, I get a more modest improvement of 10% or so in FPS, due to the tighter caster culling with the directional light. So it seems the benefits are higher the more omnis / spots are used.
Notes
Tighter caster culling and Multiple Cameras
There is one more situation in which tighter caster culling is problematic: when multiple viewports are in use, and the shadow volume intersects multiple cameras.
In this situation tighter caster culling will work - it will do a tight cull on the first camera, and a full render for the second camera. The problem is that it will do 2 shadow renders per frame instead of one.
The answer used here is to detect this situation (in `detect_light_intersects_multiple_cameras()`) and switch to a different mode, `light_intersects_multiple_cameras`. This reverts to the legacy approach of doing a full render on the first update. However, we still want to detect the situation where it changes back to a single camera. This is done by means of a timeout after a certain number of frames without a double update.
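As a rough illustration of the mode switch and timeout described above, here is a simplified sketch. Only the flag name `light_intersects_multiple_cameras` comes from the PR; the struct, its methods, the per-frame counting scheme, and the default timeout of 60 frames are assumptions for the example and will differ from the real implementation.

```cpp
#include <cstdint>

// Hypothetical per-light tracker (not the actual Godot data structure).
struct MultiCameraDetector {
    uint32_t timeout_frames = 60; // assumed value, purely illustrative
    bool light_intersects_multiple_cameras = false;
    uint32_t frames_left = 0;
    uint32_t updates_this_frame = 0;

    // Call once at the start of every frame, before any camera is processed.
    void begin_frame() {
        // The previous frame had no double update: count down towards
        // returning to single camera mode.
        if (light_intersects_multiple_cameras && updates_this_frame < 2) {
            if (frames_left > 0) {
                frames_left--;
            }
            if (frames_left == 0) {
                light_intersects_multiple_cameras = false;
            }
        }
        updates_this_frame = 0;
    }

    // Call for each camera that requests a shadow update for this light.
    // Returns true if the tight caster cull may be used for this update,
    // false if a full (legacy) render of all casters should be done.
    bool on_camera_update() {
        updates_this_frame++;
        if (updates_this_frame >= 2) {
            // Double update detected: switch mode and restart the timeout.
            light_intersects_multiple_cameras = true;
            frames_left = timeout_frames;
        }
        return !light_intersects_multiple_cameras;
    }
};
```

In this sketch the frame in which the double update is first seen still does a tight cull for the first camera and a full render for the second; every following frame does a full render until the timeout expires.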
Directional Lights
Directional lights are handled separately in 3.x: they are always updated, and they use different shadow maps if multiple cameras are used with viewports. Therefore they can always do the tighter caster cull.
Further work
There is one important further optimization which I have not used yet here. A shadowmap update is triggered by either an object that is paired with a light moving, or the light itself moving. However, if the object / objects moving that trigger the update are culled by tighter shadow casting, there is actually no need to update the shadow map at all, unless it is a full update. This could be significant in some cases, if there is e.g. a moving object that doesn't cast on the frustum that is triggering the whole process.
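The condition described above could look something like the sketch below. This optimization is not part of the PR, so everything here (the `MovedCaster` struct, the `needs_shadow_update` function and its parameters) is hypothetical and only illustrates the decision logic.

```cpp
#include <vector>

// Hypothetical record for a caster paired with the light that moved this frame.
struct MovedCaster {
    bool passes_tight_cull; // true if the caster survives the tighter caster cull
};

// Decide whether the shadow map needs re-rendering.
// light_moved: the light itself moved, so the map is always stale.
// full_update: all casters are rendered, so any moved caster matters.
bool needs_shadow_update(bool light_moved, bool full_update, const std::vector<MovedCaster> &moved_casters) {
    if (light_moved) {
        return true;
    }
    for (const MovedCaster &caster : moved_casters) {
        if (full_update || caster.passes_tight_cull) {
            return true;
        }
    }
    // Every moved caster was culled by the tighter cull: nothing visible changed.
    return false;
}
```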