Improvements to performance on directional shadow maps #6948
Comments
This is essentially how the "Perspective shadow map" family of techniques works: They have superior results to Orthogonal shadows (referred to as "SSM" below), but are behind even PSSM2: Note, however, that the classic PSM only involves a single split. Given how PSM has been shown to be a direct upgrade over the single-split Orthogonal shadows, it may be worth looking into perspective warping all of PSSM's splits:
@myaaaaaaaaa the problem with perspective shadow maps is that as you rotate the view or move around you can see the shadows deform.
Static & Dynamic Lights make a Ton of sense for Directional Lights, when paired with Stepped Rendering for Static Directional Shadows. Separation of the two could yield additional benefits, such as having a different Configuration for Dynamic Object Shadows, which Typically have vastly different needs to the rest of the scene, such as much higher resolution requirements but much smaller distances. This could yield Two types of Performance Benefits:
The Stepped Rendering PR has been shown to nearly cut the Shadow Map Rendering time in Half in most of my Tests, and that's rendering 2 Cascades per frame. Rendering 1 Cascade per frame, plus 1-4 for a substantially smaller subset of geometry, would likely cut the rendering of scenes down further (based on the results, with some overhead, possibly down to 1/3 of the current rendering time), while not producing any observable artifacts at 60 Hz.
LiSPSMs and TSMs improve on perspective shadow maps by using alternative frustum projection matrices, which provide results that some may find more acceptable. See the video below (truncated due to file size) for a quick comparison between Orthogonal (SSM), PSM, and TSM: final2.webm
See the link below for the TSM paper and full videos:
@mrjustaguy the problem is that the camera still moves, and therefore so does the center point of the shadow map. That means that unless the player stands still, we will need to re-render all static geometry on each shadowmap update, negating much of the benefit of splitting static and dynamic rendering of the shadowmap. That said, if we increase the distance over which we snap the center of the shadow map so the static map can be re-used longer, we could mitigate this.
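To illustrate the snapping idea from the comment above, here is a minimal sketch. It assumes a 2D camera position and a hypothetical `snap_distance` parameter; none of these names come from Godot's renderer. The static shadow map only needs re-rendering when the snapped center changes.

```python
import math

def snapped_center(camera_pos, snap_distance):
    """Snap each coordinate down to a multiple of snap_distance (hypothetical helper)."""
    return tuple(math.floor(c / snap_distance) * snap_distance for c in camera_pos)

def count_static_rerenders(camera_path, snap_distance):
    """Count how often the snapped center changes along a camera path.

    Each change stands for one full re-render of the static shadow map.
    """
    rerenders = 0
    last_center = None
    for pos in camera_path:
        center = snapped_center(pos, snap_distance)
        if center != last_center:
            rerenders += 1
            last_center = center
    return rerenders

# Hypothetical path: the camera walks 20 units along X in 0.5-unit steps.
path = [(i * 0.5, 0.0) for i in range(41)]
fine = count_static_rerenders(path, 0.5)    # snap distance equal to the step size
coarse = count_static_rerenders(path, 8.0)  # much coarser snap distance
print(fine, coarse)
```

In this toy path the coarse snap re-renders the static map far less often. The trade-off, which the sketch does not model, is that the static map then has to cover extra margin around the view so it stays valid between snaps.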
@myaaaaaaaaa looks like that video was made with settings chosen to prove a point. I've not seen SSM look that bad in a small scene like that. Looking at my own test project, it's a far larger environment and just the first cascade covers a larger area while keeping good quality. That said, adding PSM and/or TSM support should definitely be considered at some point. The big issue IMHO with shadows is that the various techniques all have scenarios where their pros outweigh their cons and they are obviously the better choice. But then you use them in another scenario and suddenly the weakness in a technique becomes apparent.
@BastiaanOlij In theory yes, in Practice no, as seen in godotengine/godot#76291: even when moving the camera real hard (rotation and translation), if the objects are static the thing is issue free. The only issue with that PR and its proposal is the fact that Dynamic objects aren't taken into account properly, which by splitting the two isn't a problem anymore, and if we split the two, there's no need for updating 2 cascades at a time: 1 will probably suffice for statics, while all have to be updated for dynamic. In fact, splitting the two and going with 1 static and up to 4 dynamic cascades per frame is going to be an even bigger boon to complex scenes compared to the OG plan of 2 normal cascades per frame, as a good rule of thumb is that over 90% of scene geometry is static in 99% of games. The only downside is more memory usage as you've got the shadow map x2, but eh, even with a 16k shadow map that's like only a GB for the DS so..
@mrjustaguy it's hard to predict what will count more, doubling the passes every other frame, or just doing everything in a single pass every frame. It would be a clear benefit if we didn't have to constantly rerender the static shadow maps because the player position moves. Also it will be 2 static and 4 dynamic per frame.
Not really hard to predict. Most of the Cost is Triangle Processing, which the OG PR is all about reducing; this would just be able to reduce it further if you go with 1 static split and 4 dynamic per frame. I mean, take an example scene with 1M shadow Triangles.
With the more Aggressive setup, you're running 32.5% of the triangles that you would be in the reference setup. Now, yes, this does ignore the Doubling of VRAM usage and running essentially 2 shadow maps at the same time, but given how little a Shadow map costs when it's not processing many triangles (even if it is fully covering stuff), it'd just be doubling that tiny amount of base cost. A nice real world way to test this would be to have 2 Directional Lights, one casting shadows for Dynamic objects only (4 split) and one for Static objects only (but set to ortho/2 split); however, I think the culling ain't working for such a test to be set up right now, as afaik it's just ignoring the cull masks rn.
Edit: Do Note that This is for Desktop. On Mobile the scenes are simpler, and I know that mobile devices may have issues handling stuff that is basically free for Desktop users, and that the overhead of rendering more passes and higher VRAM requirements could indeed make this unbeneficial for Mobile.
Edit 2: I just re-read the proposal, and it's unclear to me how Multiview would change the math above as I don't know how it works.
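The 32.5% figure can be reproduced with a back-of-the-envelope calculation. The sketch below makes two assumptions: 90% of the triangles are static (the rule of thumb mentioned earlier in the thread), and every cascade pass processes every triangle in its category, which ignores per-cascade culling. The function name and parameters are made up for illustration.

```python
def shadow_triangles_per_frame(total, static_fraction, static_cascades, dynamic_cascades):
    """Triangles processed per frame if each cascade pass draws every triangle
    of its category (a deliberate simplification: no per-cascade culling)."""
    static_tris = total * static_fraction
    dynamic_tris = total - static_tris
    return static_tris * static_cascades + dynamic_tris * dynamic_cascades

TOTAL = 1_000_000        # example scene with 1M shadow triangles
STATIC_FRACTION = 0.9    # assumed share of static geometry

# Reference setup: all 4 cascades redraw everything each frame.
reference = shadow_triangles_per_frame(TOTAL, STATIC_FRACTION, 4, 4)
# Aggressive setup: 1 static cascade plus 4 dynamic cascades per frame.
aggressive = shadow_triangles_per_frame(TOTAL, STATIC_FRACTION, 1, 4)

ratio = aggressive / reference
print(f"{ratio:.1%}")
```

Under these assumptions the aggressive setup processes 1.3M triangles per frame against 4M for the reference, which is the 32.5% quoted above. Real numbers would depend on how much geometry each cascade actually culls.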
One other possible optimization is the use of occlusion queries to perform "occlusion soft-culling", or reducing the LOD of mostly-occluded objects. See the following PR that implements this for the main render pass, which should be able to serve as a foundation for a hypothetical shadow pass implementation: godotengine/godot#76297
comparison.mp4 (Left: "Hard-culling")
Occlusion queries have the well-known downside where newly unoccluded objects tend to spontaneously pop into existence due to receiving the occlusion results several frames late. However, with soft-culling, the occlusion query artifacts instead manifest as slightly delayed LOD changes, which should be much less noticeable, especially when done in the shadow pass.
Additionally, there exists the See a relevant tweet from the developer of Wicked Engine regarding the combination of
Support for the extension is quite low: It's not clear though which vendors actually support it, since when I click on the details, I see reports for all of AMD, NVIDIA and Intel. The extension has existed since 2018, but I don't know since when graphics drivers have implemented it, though I don't think that many people are running outdated drivers.
This seems to be primarily caused by Android: Note that these adoption rates are higher than Variable Rate Shading, which has already been integrated into Godot, and is also an extension that can be seamlessly disabled on unsupported devices:
Did anyone here try implementing https://web.archive.org/web/20101208212121/http://visual-computing.intel-research.net/art/publications/sdsm/ and can explain the results they got? Trying to see if this is a rabbit hole we've gone down.
This was already proposed in #599, but I don't think the benefits outweigh the downsides. Alternative shadow rendering techniques from rendering papers often have poorly documented downsides that you only encounter once you start using them in production 🙂 ESM was also supported in Godot 2.x, and it had a notoriously "washed out" appearance. This was particularly obvious for small shadow casters that are close to the surface receiving the shadow (something VSMs also struggle with). If you want better directional shadow map quality, #3908 is likely the way to go as it's a battle-tested solution.
i don't get the downsides, like they seem like pretty robust methods to me, but i understand the point on the poor documentation. i haven't heard of that godot 2.x shadow issue, and searching for it hasn't really shown me much. honestly godot 2 was pretty raggedy anyways, right? kinda to be expected. i'm sure with the resources godot has nowadays it would be possible to have a good ESM system (in my opinion at least). i'm looking at that pr, looks promising to me, how would i be able to use it? thanks for the response, truly helpful
#3908 isn't implemented yet; it's only a proposal. There is a branch linked in the proposal, but it's not in a working state.
Start here: Internal rendering architecture. However, expect this to be nontrivial. If you plan on submitting this work upstream, you should open a proposal first before working on a pull request.
Describe the project you are working on
For the past few weeks I've been pulling the directional shadow map implementation apart for the Vulkan renderer to try and see where we can make improvements. This proposal attempts to bring some of the ideas we've already tried, some existing suggestions and some new suggestions together so we can further discuss where we should put our efforts.
Describe the problem or limitation you are having in your project
There are a number of issues with directional shadow maps, both on a level of quality and performance.
On the subject of quality, it is important to note that there is nothing wrong with the approach Godot currently takes for directional lights. This is mostly a matter of perfecting settings and understanding that the settings for good-looking shadows differ widely depending on scene composition. There is no magical default that works.
The focus here will then be on performance and how we can minimize the overhead of updating directional shadow maps as these often require frequent updates.
To understand the approach Godot takes to rendering directional shadow maps, Jonathan Blow's blog posts on stable cascaded shadow maps are a good read:
http://the-witness.net/news/2010/03/graphics-tech-shadow-maps-part-1/
In order to further investigate and visualise the use of cascaded shadow maps two PRs were implemented:
For our test scene we can see that our cascade distances are nicely set up:
But looking at our frustums we can see that at this view angle we not only have limited coverage in the shadow maps, we're also rendering a lot of geometry that is never sampled within the view frustums:
Moving the camera around we can see that at certain angles the coverage does increase, but we are still left with large areas of the shadow maps that are never used in the end.
Now we could "solve" this by changing the projection we use to render the shadow maps to have better coverage, but nearly all techniques will lead either to further visual artifacts or to other visual side effects as the player moves around the level.
In fact, the current approach is about as optimal as we can get it.
The challenge will be to reduce what we render.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
Stepped update of the cascades
This has been detailed in a previous proposal and has been implemented in godotengine/godot#76291
This is a good idea in theory, but in practice it has a number of drawbacks, which resulted in this PR being put on hold.
We think we will be able to resolve these drawbacks by rendering static and dynamic objects separately; however, for directional lights this may be a dead end.
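As a rough illustration of what a stepped update could look like, the sketch below cycles through the cascades round-robin, refreshing two of four per frame (the count mentioned in the discussion). This is an assumed schedule for illustration only, not the scheduling used in godotengine/godot#76291, and the function name is hypothetical.

```python
def cascades_to_update(frame_index, cascade_count=4, per_frame=2):
    """Return the cascade indices refreshed on this frame (simple round-robin)."""
    start = (frame_index * per_frame) % cascade_count
    return [(start + i) % cascade_count for i in range(per_frame)]

# Which cascades get re-rendered over four consecutive frames.
schedule = [cascades_to_update(frame) for frame in range(4)]
print(schedule)
```

With this schedule every cascade is refreshed every second frame, so each frame renders half as many cascade passes. A cascade that is not refreshed keeps showing dynamic objects at their old positions, which is where the drawbacks mentioned above come from.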
Splitting static and dynamic objects
So, as mentioned above, let's look at this technique. In a nutshell, this change will result in two shadow map textures being maintained.
One contains only static objects (objects that do not move). Generally speaking, the majority of a scene will be static and this allows us, provided our light is also stationary, to render all the static geometry to this shadow map only once (or at least at low frequency).
The other shadow map is updated every frame in which dynamic objects have moved within the clipping volume of the light. We start by copying the static shadow map into the dynamic shadow map and then render all the dynamic objects into this. This is then used when applying shadow to our render result. The obvious gain is that much less geometry is rendered each frame.
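The copy-then-render step can be sketched with a toy depth buffer. Here a shadow map is a flat list of depths and each "object" is a hypothetical dict mapping texel index to depth. A real renderer would rasterize triangles on the GPU, so this only shows the order of operations.

```python
FAR = 1.0  # depth value meaning "nothing rendered here"

def draw_into(depth_map, objects):
    """Depth-test each object's texels into depth_map (nearest depth wins)."""
    for texels in objects:
        for texel, depth in texels.items():
            depth_map[texel] = min(depth_map[texel], depth)

def render_static_map(static_objects, size):
    """Render all static geometry once into its own shadow map."""
    depth_map = [FAR] * size
    draw_into(depth_map, static_objects)
    return depth_map

def compose_shadow_map(static_map, dynamic_objects):
    """Copy the static map, then render only the dynamic objects on top."""
    combined = list(static_map)  # the copy leaves the static map untouched
    draw_into(combined, dynamic_objects)
    return combined

static_map = render_static_map([{0: 0.8, 1: 0.8}, {3: 0.4}], size=4)
frame_a = compose_shadow_map(static_map, [{1: 0.5, 2: 0.6}])
frame_b = compose_shadow_map(static_map, [{3: 0.7}])
print(static_map, frame_a, frame_b)
```

Because every frame starts from a fresh copy of the static map, dynamic objects from the previous frame never leave stale depth behind, and the static geometry is only rasterized once.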
This is already on the roadmap; however, the operative word is "provided our light is also stationary". Now, while a directional light is stationary, its shadowmaps are dependent on the position of the camera. As the player moves, even the static shadowmaps need frequent updating, and the overhead of performing two passes may outweigh the gains.
There may be a gain when using this in combination with the aforementioned stepping approach; however, this would require adding a reprojection of the static shadow map if we didn't update that shadow map in the current frame but are
Limiting the light shadowmap frustum
This is potentially the easiest win. Without changing the dimensions of the shadow maps, we can limit our render area and adjust the light's projection matrix (and thus clipping volume) to only render the general area the view frustum covers.
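A minimal sketch of how such a reduced render area might be derived, assuming the view frustum corners have already been transformed into the light's 2D space and that the cascade covers a square region there. All names and numbers are illustrative assumptions, not Godot internals.

```python
import math

def bounds_2d(points):
    """Axis-aligned bounds (min_x, min_y, max_x, max_y) of 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def limited_render_rect(cascade_bounds, frustum_corners_light_space, map_size):
    """Texel rectangle of the shadow map that the view frustum actually covers.

    Returns (x0, y0, x1, y1) in texels, or None if the frustum misses the cascade.
    """
    cx0, cy0, cx1, cy1 = cascade_bounds
    fx0, fy0, fx1, fy1 = bounds_2d(frustum_corners_light_space)
    # Clamp the frustum bounds to the area the cascade covers.
    x0, y0 = max(cx0, fx0), max(cy0, fy0)
    x1, y1 = min(cx1, fx1), min(cy1, fy1)
    if x0 >= x1 or y0 >= y1:
        return None
    # Convert light-space units to texels, rounding outwards to whole texels.
    sx = map_size / (cx1 - cx0)
    sy = map_size / (cy1 - cy0)
    return (math.floor((x0 - cx0) * sx), math.floor((y0 - cy0) * sy),
            math.ceil((x1 - cx0) * sx), math.ceil((y1 - cy0) * sy))

# Hypothetical cascade covering 128x128 light-space units in a 2048x2048 map.
rect = limited_render_rect((-64, -64, 64, 64),
                           [(-10, -5), (30, -5), (40, 45), (-20, 45)], 2048)
covered = (rect[2] - rect[0]) * (rect[3] - rect[1]) / (2048 * 2048)
print(rect, covered)
```

In this made-up case less than a fifth of the shadow map would need rendering. Rounding the rectangle outwards to whole texels keeps the texel grid of the full shadow map unchanged, so sampling stays consistent with the unrestricted version.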
So we could end up rendering just:
Multiview shadowmaps
Just for illustration I enhanced the frustum drawing logic to draw all frustums in the last cascade (might actually update the PR with this):
No matter how we turn the camera or angle our light, our complete view frustum will always fit within our 4th cascade.
Edit: this is not entirely true for view frustums with a FOV less than 65 degrees; however, by being smart with culling we can create a single draw list with all culled objects and remove duplicates.
While this will put a requirement on hardware supporting multiview, and will require us to change the shadowmap logic to use layers, this opens the door to use the full frustum to cull what is rendered and only do a single pass instead of 4 passes, especially considering the last pass would have hit all objects to begin with.
The overhead saved by not processing 4 passes will likely outweigh the overhead introduced by multiview discarding triangles in the lower cascades.
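The single draw list with duplicates removed could be built along these lines. The per-cascade visibility lists and object names are hypothetical inputs. In a real multiview setup each object in the merged list would be submitted once and rasterized to all cascade layers.

```python
def build_single_draw_list(per_cascade_visible):
    """Merge per-cascade culling results into one draw list without duplicates,
    keeping the order in which objects were first seen."""
    seen = set()
    draw_list = []
    for visible in per_cascade_visible:
        for obj in visible:
            if obj not in seen:
                seen.add(obj)
                draw_list.append(obj)
    return draw_list

# Hypothetical culling results: each larger cascade also contains the smaller ones.
per_cascade = [
    ["rock"],
    ["rock", "tree"],
    ["rock", "tree", "house"],
    ["rock", "tree", "house", "hill"],
]
separate_pass_draws = sum(len(visible) for visible in per_cascade)
draw_list = build_single_draw_list(per_cascade)
print(separate_pass_draws, len(draw_list))
```

In this toy case four separate passes submit ten draws while the merged list submits four. The cost that remains, as noted above, is that triangles falling outside the smaller cascades are still processed and then discarded for those layers.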
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
n/a
If this enhancement will not be used often, can it be worked around with a few lines of script?
This is core to the rendering pipeline
Is there a reason why this should be core and not an add-on in the asset library?
This is core to the rendering pipeline