Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Implement Octahedral Map Normal/Tangent Attribute Compression #46800

Merged
merged 1 commit into from
Jul 30, 2021

Conversation

The-O-King
Copy link
Contributor

@The-O-King The-O-King commented Mar 8, 2021

Bugsquad edit: 3.x version of #60309.

Using spherical coordinates, normal and tangent vectors can be compressed more
efficiently than the current compression technique used

Using sphere mapping techniques described in this link and this link
normal and tangent vectors can be compressed more efficiently than the current
compression technique we have implemented

Using an octahedral mapping technique described in this paper normal and tangent
vectors can be compressed more efficiently than the current compression technique
we have implemented
Rather than each normal/tangent being a vec4 they can be stored as
vec2 saving 2 bytes each

This will help reduce vertex memory bandwidth usage, which is particularly useful
on mobile devices

Needed to update the GLES2 shader class since I hit the limit of the 32 bit mask
being used to set the state of conditionals in the shaders themselves - let me
know if the updated implementation needs changing

Currently, this change would require a reimport of all assets since the underlying
data representation has changed
Also it is more important now to use the "Ensure Correct Normals" flag when
using non-uniform scaling as decompression can result in slightly skewed normals
that will skew further with scaling

Implementation of godotengine/godot-proposals#2395

@The-O-King The-O-King requested review from a team as code owners March 8, 2021 18:28
@Calinou Calinou added this to the 3.2 milestone Mar 8, 2021
@The-O-King The-O-King force-pushed the normal_compression branch 2 times, most recently from d381791 to 034e36f Compare March 8, 2021 19:24
Base automatically changed from 3.2 to 3.x March 16, 2021 11:11
@aaronfranke aaronfranke modified the milestones: 3.2, 3.3 Mar 16, 2021
Copy link
Contributor

@BastiaanOlij BastiaanOlij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a neat way to add this in as a feature. My only question is if it would be more efficient to just leave one axis out and doing a z = sqrt(1.0 - x*x - y*y) approach. Don't know which is faster on modern GPUs, I can imagine cos/sin lookups are faster then sqrt.

Also this probably needs an implementation on master as well.

I feel @reduz should provide an opinion on this as well.

@BastiaanOlij BastiaanOlij requested a review from reduz March 18, 2021 01:43
@clayjohn
Copy link
Member

Two things:

First to mirror Bastiaan above, I'm not sure spherical coordinates are the best approach. The spheremap transform described here has better quality and speed. And the simple reconstruction has better speed as well. Spherical coordinates are going to use a lot of cycles. Have you profiled how this PR affects performance, in theory there should be a nice gain by reducing bandwidth pressure, but for some render tasks bandwidth isn't the bottleneck.

Second, this PR is only necessary for 3.x so don't worry about 4.0. in 4.0 reduz removed the "compress" option.

@The-O-King
Copy link
Contributor Author

The-O-King commented Mar 18, 2021

I thought about the leave-the-z-element-out approach but I don't think that works for raw mesh data because that approach needs to assume the sign of the z element, which may work for G-Buffer deferred rendering (you can assume your vertices are facing towards the screen as they would be in view space) but using it for general mesh storage requires us to store the sign somewhere so it's more tricky

I did do some profiling and I did not see any performance degradation using spherical coordinates (although there was more ALU usage), only reductions in vertex memory bandwidth usage and reductions in vertex fetch stalls.
The spheremap transformation is quite interesting though, and doesn't seem difficult at all to implement! I'll try to implement it as well and see how it looks/performs (seems more promising than spherical not gonna lie!) let me know if you all would like that as an alternative

There is also a third option I have used before - QTangents, which store the entire tangent space, but might take a little more effort to get working within the current setup that Godot has which treats the normal/tangent as separate vectors entirely (although not impossible)

Let me know what you all think :)

@clayjohn
Copy link
Member

@The-O-King Subject to your testing, I think it makes sense to just use the spheremap transformation rather than use spherical coordinates (no need for an option). If https://aras-p.info/texts/CompactNormalStorage.html is still accurate on modern hardware, then spheremap should result in better quality and less ALU usage.

@The-O-King The-O-King force-pushed the normal_compression branch 2 times, most recently from ec7a18d to 432a017 Compare March 22, 2021 19:36
@The-O-King The-O-King changed the title [3.2] Implement Spherical Coordinate Normal/Tangent Attribute Compression [3.2] Implement Sphere Map Normal/Tangent Attribute Compression Mar 22, 2021
@The-O-King
Copy link
Contributor Author

Got it, yup I got a chance to do the implementation and it was cheaper to run while providing better visual quality :) definitely worth it!

I've updated the PR and the description for this change, should I also go back and update the original Proposal as well?

Also you mentioned that 4.0 doesn't need this change implemented - does it do mesh import compression differently (or have an even better technique?)

@Calinou
Copy link
Member

Calinou commented Mar 22, 2021

I've updated the PR and the description for this change, should I also go back and update the original Proposal as well?

Feel free to update the proposal, but don't feel obliged to do it if you don't have time 🙂

@clayjohn
Copy link
Member

In 4.0 instead of choosing between compressed and uncompressed, it just uses uses a single format that should be close to optimal. See commit here: 70f5972#diff-dc12bc981c6b42b8579800ffb22b4f82e4f1153acb40db6dfc98a6655bcad0b8

See the datatypes here:

ARRAY_VERTEX = 0, // RG32F or RGB32F (depending on 2D bit)
ARRAY_NORMAL = 1, // A2B10G10R10
ARRAY_TANGENT = 2, // A2B10G10R10, A flips sign of binormal
ARRAY_COLOR = 3, // RGBA16F
ARRAY_TEX_UV = 4, // RG32F
ARRAY_TEX_UV2 = 5, // RG32F
ARRAY_CUSTOM0 = 6, // depends on ArrayCustomFormat
ARRAY_CUSTOM1 = 7,
ARRAY_CUSTOM2 = 8,
ARRAY_CUSTOM3 = 9,
ARRAY_BONES = 10, // RGBA16UI (x2 if 8 weights)
ARRAY_WEIGHTS = 11, // RGBA16UNORM (x2 if 8 weights)
ARRAY_INDEX = 12, // 16 or 32 bits depending on length > 0xFFFF
ARRAY_MAX = 13

Not sure if your PR is actually relevant or not for 4.0. If you can further compress these values, then a PR would be welcome.

Regarding the shader permutations, we have been avoiding making the keys 64 bits and instead of been using #if defined(DEF_NAME) instead. This results in fewer permutations. We discussed this in adding #31062

I am not opposed to adding a 64 bit key, but I know that it has been rejected in the past.

@The-O-King
Copy link
Contributor Author

For 4.0 - 32 bits are used to store normals and tangent each, but with this technique only 16 bits are needed (along with some decompression in the vertex shader), so memory wise it would be saving 2 bytes for each normal/tangent
I've seen A2B10G10R10 be used pretty commonly for normals/tangents definitely, but I'm not super sure how big a quality difference it and Sphere Mapping would have, that could be something researched though if that savings seems worth it!

For shader permutations - since this change requires that the actual inputs types of the vertex attributes change (from vec3/vec4 to vec2) wouldn't this necessitate additional permutations? I guess I'm also not sure what the difference between a #ifdef DEF_NAME and #if defined(DEF_NAME) is in terms of reducing permutations

@akien-mga akien-mga changed the title [3.2] Implement Sphere Map Normal/Tangent Attribute Compression [3.3] Implement Sphere Map Normal/Tangent Attribute Compression Mar 22, 2021
@akien-mga akien-mga modified the milestones: 3.3, 3.4 Mar 22, 2021
@akien-mga akien-mga changed the title [3.3] Implement Sphere Map Normal/Tangent Attribute Compression [3.x] Implement Sphere Map Normal/Tangent Attribute Compression Mar 22, 2021
servers/visual_server.cpp Outdated Show resolved Hide resolved
doc/classes/Mesh.xml Outdated Show resolved Hide resolved
@The-O-King The-O-King force-pushed the normal_compression branch from 8f90994 to aa5fb73 Compare July 30, 2021 14:25
Implement Octahedral Compression for normal/tangent vectors
*Oct32 for uncompressed vectors
*Oct16 for compressed vectors

Reduces vertex size for each attribute by
*Uncompressed: 12 bytes, vec4<float32> -> vec2<unorm16>
*Compressed: 2 bytes, vec4<unorm8> -> vec2<unorm8>

Binormal sign is encoded in the y coordinate of the encoded tangent

Added conversion functions to go from octahedral mapping to cartesian
for normal and tangent vectors

sprite_3d and soft_body meshes write to their vertex buffer memory
directly and need to convert their normals and tangents to the new oct
format before writing

Created a new mesh flag to specify whether a mesh is using octahedral
compression or not
Updated documentation to discuss new flag/defaults

Created shader flags to specify whether octahedral or cartesian vectors
are being used

Updated importers to use octahedral representation as the default format
for importing meshes

Updated ShaderGLES2 to support 64 bit version codes as we hit the limit
of the 32-bit integer that was previously used as a bitset to store
enabled/disabled flags
@The-O-King The-O-King force-pushed the normal_compression branch from aa5fb73 to d274284 Compare July 30, 2021 14:29
@The-O-King
Copy link
Contributor Author

Thanks for pointing out the _bind_method() @akien-mga! Hopefully that takes care of the documentation

@akien-mga akien-mga merged commit 9735f28 into godotengine:3.x Jul 30, 2021
@akien-mga
Copy link
Member

Thanks!

@aaronfranke
Copy link
Member

aaronfranke commented Aug 5, 2021

This PR causes a regression that locks up my entire desktop environment on Ubuntu with Nvidia. I also tested with the follow-up PR to this PR (#51258) and I still get the same freezing.

@clayjohn
Copy link
Member

clayjohn commented Aug 5, 2021

@aaronfranke Have you tried turning on the split stream as suggested in #51258 (comment)?

@aaronfranke
Copy link
Member

aaronfranke commented Aug 5, 2021

@clayjohn Adding mesh_storage/split_stream=true to project.godot and opening the project with #51258 did not help much, my system still freezes and locks up. However, it did give me this backtrace:

Dumping the backtrace. Please include this when reporting the bug on https://github.com/godotengine/godot/issues
[1] /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7f10cbf17210] (??:0)
[2] /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03(+0xeeed71) [0x7f10c9444d71] (??:0)
[3] /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03(+0xeef12c) [0x7f10c944512c] (??:0)
[4] /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03(+0xa2fcc6) [0x7f10c8f85cc6] (??:0)
[5] /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03(+0xa2e5fb) [0x7f10c8f845fb] (??:0)
[6] /lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03(+0xa377ef) [0x7f10c8f8d7ef] (??:0)
[7] RasterizerSceneGLES3::_render_list(RasterizerSceneGLES3::RenderList::Element**, int, Transform const&, CameraMatrix const&, RasterizerStorageGLES3::Sky*, bool, bool, bool, bool, bool) (/home/aaronfranke/workspace/godot-3/drivers/gles3/rasterizer_scene_gles3.cpp:1894)
[8] RasterizerSceneGLES3::render_scene(Transform const&, CameraMatrix const&, int, bool, RasterizerScene::InstanceBase**, int, RID*, int, RID*, int, RID, RID, RID, RID, int) (/home/aaronfranke/workspace/godot-3/drivers/gles3/rasterizer_scene_gles3.cpp:4028)
[9] VisualServerScene::_render_scene(Transform, CameraMatrix const&, int, bool, RID, RID, RID, RID, int) (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_scene.cpp:2738)
[10] VisualServerScene::render_camera(RID, RID, Vector2, RID) (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_scene.cpp:2317 (discriminator 2))
[11] VisualServerViewport::_draw_3d(VisualServerViewport::Viewport*, ARVRInterface::Eyes) (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_viewport.cpp:68)
[12] VisualServerViewport::_draw_viewport(VisualServerViewport::Viewport*, ARVRInterface::Eyes) (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_viewport.cpp:109)
[13] VisualServerViewport::draw_viewports() (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_viewport.cpp:338)
[14] VisualServerRaster::draw(bool, double) (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_raster.cpp:108 (discriminator 2))
[15] VisualServerWrapMT::draw(bool, double) (/home/aaronfranke/workspace/godot-3/servers/visual/visual_server_wrap_mt.cpp:92)
[16] Main::iteration() (/home/aaronfranke/workspace/godot-3/main/main.cpp:2166)
[17] OS_X11::run() (/home/aaronfranke/workspace/godot-3/platform/x11/os_x11.cpp:3638)
[18] /home/aaronfranke/workspace/godot-3/bin/godot.x11.tools.64(main+0x125) [0x1777a9b] (/home/aaronfranke/workspace/godot-3/platform/x11/godot_x11.cpp:57)
[19] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f10cbef80b3] (??:0)
[20] /home/aaronfranke/workspace/godot-3/bin/godot.x11.tools.64(_start+0x2e) [0x17778be] (??:?)
-- END OF BACKTRACE --

@The-O-King
Copy link
Contributor Author

@aaronfranke @clayjohn Updated that PR (#51258) that should fix the issues you are reporting here, it was a single line fix in RasterizerSceneGLES3 that didn't make it in this PR (RasterizerSceneGLES2 should be working fine)

@aaronfranke
Copy link
Member

I can confirm that the bug is fixed with both the updated #51258 and the latest 3.x as of writing. Thanks!

@nikitalita
Copy link
Contributor

@The-O-King why was this only implemented in 3.4 and not in master (4.x)?

@clayjohn
Copy link
Member

@nikitalita it will be implemented in master eventually.

@The-O-King
Copy link
Contributor Author

Yup I will be looking into the 4.x implementation probably at the start of the new year :) for now we have been working on 3.x and getting an idea for all the potential problem spots/considerations, that way we will have a bit more context on what we may encounter moving it over to 4.x!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants