-
-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3.x] Implement Octahedral Map Normal/Tangent Attribute Compression #46800
Conversation
d381791
to
034e36f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a neat way to add this in as a feature. My only question is if it would be more efficient to just leave one axis out and doing a z = sqrt(1.0 - x*x - y*y)
approach. Don't know which is faster on modern GPUs, I can imagine cos/sin lookups are faster then sqrt.
Also this probably needs an implementation on master as well.
I feel @reduz should provide an opinion on this as well.
Two things: First to mirror Bastiaan above, I'm not sure spherical coordinates are the best approach. The spheremap transform described here has better quality and speed. And the simple reconstruction has better speed as well. Spherical coordinates are going to use a lot of cycles. Have you profiled how this PR affects performance, in theory there should be a nice gain by reducing bandwidth pressure, but for some render tasks bandwidth isn't the bottleneck. Second, this PR is only necessary for 3.x so don't worry about 4.0. in 4.0 reduz removed the "compress" option. |
I thought about the leave-the-z-element-out approach but I don't think that works for raw mesh data because that approach needs to assume the sign of the z element, which may work for G-Buffer deferred rendering (you can assume your vertices are facing towards the screen as they would be in view space) but using it for general mesh storage requires us to store the sign somewhere so it's more tricky I did do some profiling and I did not see any performance degradation using spherical coordinates (although there was more ALU usage), only reductions in vertex memory bandwidth usage and reductions in vertex fetch stalls. There is also a third option I have used before - QTangents, which store the entire tangent space, but might take a little more effort to get working within the current setup that Godot has which treats the normal/tangent as separate vectors entirely (although not impossible) Let me know what you all think :) |
@The-O-King Subject to your testing, I think it makes sense to just use the spheremap transformation rather than use spherical coordinates (no need for an option). If https://aras-p.info/texts/CompactNormalStorage.html is still accurate on modern hardware, then spheremap should result in better quality and less ALU usage. |
ec7a18d
to
432a017
Compare
432a017
to
4f3d483
Compare
Got it, yup I got a chance to do the implementation and it was cheaper to run while providing better visual quality :) definitely worth it! I've updated the PR and the description for this change, should I also go back and update the original Proposal as well? Also you mentioned that 4.0 doesn't need this change implemented - does it do mesh import compression differently (or have an even better technique?) |
Feel free to update the proposal, but don't feel obliged to do it if you don't have time 🙂 |
In 4.0 instead of choosing between compressed and uncompressed, it just uses uses a single format that should be close to optimal. See commit here: 70f5972#diff-dc12bc981c6b42b8579800ffb22b4f82e4f1153acb40db6dfc98a6655bcad0b8 See the datatypes here: godot/servers/rendering_server.h Lines 216 to 229 in a7fb5f8
Not sure if your PR is actually relevant or not for 4.0. If you can further compress these values, then a PR would be welcome. Regarding the shader permutations, we have been avoiding making the keys 64 bits and instead of been using I am not opposed to adding a 64 bit key, but I know that it has been rejected in the past. |
For 4.0 - 32 bits are used to store normals and tangent each, but with this technique only 16 bits are needed (along with some decompression in the vertex shader), so memory wise it would be saving 2 bytes for each normal/tangent For shader permutations - since this change requires that the actual inputs types of the vertex attributes change (from vec3/vec4 to vec2) wouldn't this necessitate additional permutations? I guess I'm also not sure what the difference between a |
8f90994
to
aa5fb73
Compare
Implement Octahedral Compression for normal/tangent vectors *Oct32 for uncompressed vectors *Oct16 for compressed vectors Reduces vertex size for each attribute by *Uncompressed: 12 bytes, vec4<float32> -> vec2<unorm16> *Compressed: 2 bytes, vec4<unorm8> -> vec2<unorm8> Binormal sign is encoded in the y coordinate of the encoded tangent Added conversion functions to go from octahedral mapping to cartesian for normal and tangent vectors sprite_3d and soft_body meshes write to their vertex buffer memory directly and need to convert their normals and tangents to the new oct format before writing Created a new mesh flag to specify whether a mesh is using octahedral compression or not Updated documentation to discuss new flag/defaults Created shader flags to specify whether octahedral or cartesian vectors are being used Updated importers to use octahedral representation as the default format for importing meshes Updated ShaderGLES2 to support 64 bit version codes as we hit the limit of the 32-bit integer that was previously used as a bitset to store enabled/disabled flags
aa5fb73
to
d274284
Compare
Thanks for pointing out the |
Thanks! |
This PR causes a regression that locks up my entire desktop environment on Ubuntu with Nvidia. I also tested with the follow-up PR to this PR (#51258) and I still get the same freezing. |
@aaronfranke Have you tried turning on the split stream as suggested in #51258 (comment)? |
@clayjohn Adding
|
@aaronfranke @clayjohn Updated that PR (#51258) that should fix the issues you are reporting here, it was a single line fix in RasterizerSceneGLES3 that didn't make it in this PR (RasterizerSceneGLES2 should be working fine) |
I can confirm that the bug is fixed with both the updated #51258 and the latest 3.x as of writing. Thanks! |
@The-O-King why was this only implemented in 3.4 and not in master (4.x)? |
@nikitalita it will be implemented in master eventually. |
Yup I will be looking into the 4.x implementation probably at the start of the new year :) for now we have been working on 3.x and getting an idea for all the potential problem spots/considerations, that way we will have a bit more context on what we may encounter moving it over to 4.x! |
Bugsquad edit:
3.x
version of #60309.Using spherical coordinates, normal and tangent vectors can be compressed moreefficiently than the current compression technique used
Using sphere mapping techniques described in this link and this linknormal and tangent vectors can be compressed more efficiently than the current
compression technique we have implemented
Using an octahedral mapping technique described in this paper normal and tangent
vectors can be compressed more efficiently than the current compression technique
we have implemented
Rather than each normal/tangent being a vec4 they can be stored as
vec2 saving 2 bytes each
This will help reduce vertex memory bandwidth usage, which is particularly useful
on mobile devices
Needed to update the GLES2 shader class since I hit the limit of the 32 bit mask
being used to set the state of conditionals in the shaders themselves - let me
know if the updated implementation needs changing
Currently, this change would require a reimport of all assets since the underlying
data representation has changed
Also it is more important now to use the "Ensure Correct Normals" flag when
using non-uniform scaling as decompression can result in slightly skewed normals
that will skew further with scaling
Implementation of godotengine/godot-proposals#2395