Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spirv] Fix bug of CTBuffer DX memory layout with matrix #3672

Merged
merged 17 commits into from
Apr 29, 2021

Conversation

jaebaek
Copy link
Collaborator

@jaebaek jaebaek commented Apr 7, 2021

When a CTBuffer contains a matrix and we use FXC memory layout for it i.e.,
-fvk-use-dx-layout, the memory layout of the generated struct is different
from what FXC generates. FXC memory layout rule for matrices or array of
matrices are:

  1. floatMxN means N float vectors and each vector has M elements.
  • How to calculate size: 16 * (N - 1) + 4 * M bytes
  • How to calculate offset:
    • If the size is greater than or equal to 16 bytes: the offset must be aligned to 16 bytes
    • Otherwise (less than 16): it cannot be split into multiple 16 bytes slots.
  • For example, float2x3 has 16 * (3 - 1) + 4 * 2 = 40 bytes as its size. Since its size 40 bytes is greater than 16 bytes, it must be aligned to 16 bytes.
  1. floatMxN[K] means an array of floatMxN with K elements.
  • size: (K - 1) * N * 16 + 16 * (N - 1) + 4 * M
  • offset:
    • If K > 1, it must be aligned to 16 bytes
    • If K == 1, it is the same with floatMxN.
  • For example, the size of float3x2 foo[7]; is (7 - 1) * 2 * 16 + 16 * (2 - 1) + 4 * 3 = 220.

The non-trivial case is float1xN which is a matrix with N vectors and each vector has 1 element.
Its size should be 16 * (N - 1) + 4 based on the FXC memory layout rule.
For example, the size of float1x2 must be 20 in bytes, which means we want to put the first float value of
float1x2 at the offset 0 in bytes and the second float value at the offset 16 in bytes.
It means we must not generate it as a SPIR-V vector type because setting it as a SPIR-V vector results in
putting the first at the offset 0 in bytes and the second at the offset 4 in bytes.
In addition, we cannot set it as a SPIR-V matrix type because SPIR-V does not allow a matrix with a single
row and a vector with a single element.
The only available option is to set it as a SPIR-V array with ArrayStride 16.

Since we currently consider float1xN as an OpTypeVector and generate all SPIR-V code based on the assumption.
Changing the type of float1xN to OpTypeArray needs huge engineering costs to handle all the cases.
For example, in many places e.g., addition, subtraction, multiplication, we use OpVectorShuffle for float1xN because we consider it as OpTypeArray.

Our solution is to create two variables for CTBuffer including type1xN with FXC memory layout:

  1. Original: One with correct subtypes and memory layouts i.e., OpTypeArray for type1xN
  2. Clone: One with Private storage class i.e., without physical memory layout
    • OpTypeVector for type1xN as the current DXC does.

The Original variable is in charge of getting CTBuffer data from CPU.
We create a module initialization function to copy the Original variable to the Clone variable.
We insert OpFunctionCall for the module initialization function into all entry points.
We use the Clone variable for the CTBuffer in all places.

@jaebaek jaebaek added the spirv Work related to SPIR-V label Apr 7, 2021
@jaebaek jaebaek requested a review from ehsannas April 7, 2021 22:03
@jaebaek jaebaek self-assigned this Apr 7, 2021
@AppVeyorBot
Copy link

@ehsannas
Copy link
Contributor

ehsannas commented Apr 8, 2021

The non-trivial case is float1xN which is a matrix with N vectors and each vector has 1 element.
Its size should be 16 * (N - 1) + 4 based on the FXC memory layout rule.
For example, the size of float1x2 must be 20 in bytes, which means we want to put the first float value of
float1x2 at the offset 0 in bytes and the second float value at the offset 16 in bytes.
It means we must not generate it as a SPIR-V vector type because setting it as a SPIR-V vector results in
putting the first at the offset 0 in bytes and the second at the offset 4 in bytes.
In addition, we cannot set it as a SPIR-V matrix type because SPIR-V does not allow a matrix with a single
row and a vector with a single element.

This is terrifying 😨

@jaebaek jaebaek changed the title [spirv] Fix bug of CTBuffer DX memory layout with matrix [WIP] [spirv] Fix bug of CTBuffer DX memory layout with matrix Apr 13, 2021
@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@jaebaek
Copy link
Collaborator Author

jaebaek commented Apr 17, 2021

I want to update the unit tests but it is almost done.
The basic idea is to make a clone variable for CTBuffer with FXC memory layout rule.

The clone variable has the same type and memory layout with the one the current DXC generates for CTBuffer.
The clone variable will be used as the CTBuffer but it does not follow the correct FXC memory layout.

I create a module initialization function to copy the CTBuffer with the correct memory layout to the clone variable and add OpFunctionCall to the wrapper function of entry points.

Mostly this copy will be optimized out by spirv-opt.
I will update the unit tests only more next Monday.

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@jaebaek jaebaek changed the title [WIP] [spirv] Fix bug of CTBuffer DX memory layout with matrix [spirv] Fix bug of CTBuffer DX memory layout with matrix Apr 19, 2021
@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@jaebaek
Copy link
Collaborator Author

jaebaek commented Apr 20, 2021

This will fix #3463

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

Copy link
Collaborator Author

@jaebaek jaebaek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your code review.

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@AppVeyorBot
Copy link

@jaebaek jaebaek merged commit 689ab7d into microsoft:master Apr 29, 2021
@jaebaek jaebaek deleted the dx_layout branch April 29, 2021 21:16
Keenuts added a commit to Keenuts/DirectXShaderCompiler that referenced this pull request Apr 15, 2024
When the requested constant buffer layout is DX, some type lowering
can become complex (Nx1 matrices for ex). To simplify the lowering,
the backend "clones" those variables (See microsoft#3672)
 - on one end, we expose the correct layout
 - but then, we copy this to a local variable to have a different
   layout compatible with the operations we usually do on such types.

This means the type can sometimes be an HybridStructType, which is
not completely lowered, or a normal lowerer SPIR-V type.
Both should be allowed, but some codepaths had to be updated to reflect
this.

Fixes microsoft#6511

Signed-off-by: Nathan Gauër <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spirv Work related to SPIR-V
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants