[spirv] Fix bug of CTBuffer DX memory layout with matrix #3672

jaebaek · 2021-04-07T22:03:26Z

When a CTBuffer contains a matrix and we use the FXC memory layout for it (i.e.,
-fvk-use-dx-layout), the memory layout of the generated struct differs
from what FXC generates. The FXC memory layout rules for matrices and arrays of
matrices are:

floatMxN means N float vectors and each vector has M elements.

How to calculate the size: 16 * (N - 1) + 4 * M bytes
How to calculate the offset:
- If the size is greater than or equal to 16 bytes, the offset must be aligned to 16 bytes.
- Otherwise (less than 16 bytes), the matrix must not be split across multiple 16-byte slots.
For example, float2x3 has a size of 16 * (3 - 1) + 4 * 2 = 40 bytes. Since 40 bytes is greater than 16 bytes, its offset must be aligned to 16 bytes.
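The following C++ sketch restates the floatMxN size and offset rules above so they can be checked numerically. The names fxcMatrixSize, roundUpToSlot, and fxcMatrixOffset are hypothetical and are not DXC functions; the code only encodes the rules as written in this description, assuming next is the first free byte offset in the constant buffer.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative restatement of the FXC rule above (hypothetical names, not DXC).
// A floatMxN is treated as N vectors of M floats each.
constexpr std::uint32_t kSlotBytes = 16; // one 16-byte constant-buffer slot

// Every vector except the last fills a whole slot; the last one is 4 * M bytes.
constexpr std::uint32_t fxcMatrixSize(std::uint32_t M, std::uint32_t N) {
  return kSlotBytes * (N - 1) + 4 * M;
}

constexpr std::uint32_t roundUpToSlot(std::uint32_t offset) {
  return (offset + kSlotBytes - 1) / kSlotBytes * kSlotBytes;
}

// Offset of a floatMxN whose placement starts at the first free byte `next`.
std::uint32_t fxcMatrixOffset(std::uint32_t next, std::uint32_t M,
                              std::uint32_t N) {
  assert(M >= 1 && M <= 4 && N >= 1 && N <= 4);
  const std::uint32_t size = fxcMatrixSize(M, N);
  if (size >= kSlotBytes)
    return roundUpToSlot(next); // 16 bytes or more: align to a slot
  if (next % kSlotBytes + size > kSlotBytes)
    return roundUpToSlot(next); // would straddle two slots: use the next one
  return next;                  // fits in the rest of the current slot
}

// The float2x3 example from the description: 16 * 2 + 4 * 2 = 40 bytes.
static_assert(fxcMatrixSize(2, 3) == 40, "float2x3 occupies 40 bytes");
```

Under the rule as stated, a float2x3 placed right after a 4-byte scalar starts at byte 16, while a float2x1 (8 bytes) after the same scalar can stay at byte 4 because it still fits inside the first slot.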

floatMxN[K] means an array of floatMxN with K elements.

Size: (K - 1) * N * 16 + 16 * (N - 1) + 4 * M bytes
Offset:
- If K > 1, it must be aligned to 16 bytes.
- If K == 1, the rule is the same as for floatMxN.
For example, the size of float3x2 foo[7]; is (7 - 1) * 2 * 16 + 16 * (2 - 1) + 4 * 3 = 220 bytes.
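A similar sketch for floatMxN[K], again with hypothetical helper names (fxcMatrixArrayStride, fxcMatrixArraySize, fxcMatrixArrayElementOffset). It assumes each element occupies N full 16-byte slots, which is what the (K - 1) * N * 16 term implies, and that only the last element is trimmed to 16 * (N - 1) + 4 * M bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers restating the floatMxN[K] rule above; not DXC APIs.
constexpr std::uint32_t kSlotBytes = 16;

// Distance between consecutive elements: each floatMxN takes N whole slots.
constexpr std::uint32_t fxcMatrixArrayStride(std::uint32_t N) {
  return N * kSlotBytes;
}

// K - 1 full-stride elements plus a last element sized like a lone floatMxN.
constexpr std::uint32_t fxcMatrixArraySize(std::uint32_t M, std::uint32_t N,
                                           std::uint32_t K) {
  return (K - 1) * fxcMatrixArrayStride(N) + kSlotBytes * (N - 1) + 4 * M;
}

// Byte offset of element i, relative to the start of the array.
std::uint32_t fxcMatrixArrayElementOffset(std::uint32_t N, std::uint32_t K,
                                          std::uint32_t i) {
  assert(i < K);
  return i * fxcMatrixArrayStride(N);
}

// The float3x2 foo[7] example from the description.
static_assert(fxcMatrixArraySize(3, 2, 7) == 220, "float3x2 foo[7] is 220 bytes");
```

With these helpers, the last element of float3x2 foo[7] starts at byte 192 relative to the array, and 192 plus the 28-byte size of a lone float3x2 reproduces the 220-byte total computed above.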

The non-trivial case is float1xN, which is a matrix with N vectors of 1 element each.
Based on the FXC memory layout rule, its size should be 16 * (N - 1) + 4 bytes.
For example, the size of float1x2 must be 20 bytes, which means we want to put the first float value of
float1x2 at byte offset 0 and the second float value at byte offset 16.
This means we must not generate it as a SPIR-V vector type, because a SPIR-V vector puts
the first value at byte offset 0 and the second at byte offset 4.
In addition, we cannot generate it as a SPIR-V matrix type, because SPIR-V allows neither a matrix with a
single row nor a vector with a single component.
The only available option is to generate it as a SPIR-V array with ArrayStride 16.
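To make the float1xN problem concrete, this sketch lists the byte offsets of the N components under the two candidate representations: an array decorated with ArrayStride 16 and a tightly packed vector. The function names are hypothetical; only the offsets (16 * i versus 4 * i) come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration only (hypothetical names): byte offsets of the N floats in a
// float1xN under two candidate SPIR-V representations.

// OpTypeArray of N floats with ArrayStride 16: matches FXC, which gives each
// of the N single-float vectors its own 16-byte slot.
std::vector<std::uint32_t> offsetsAsStride16Array(std::uint32_t N) {
  std::vector<std::uint32_t> offsets;
  for (std::uint32_t i = 0; i < N; ++i)
    offsets.push_back(16 * i);
  return offsets;
}

// OpTypeVector of N floats: components are packed 4 bytes apart, which does
// not match FXC. SPIR-V vectors need at least 2 components; float1xN has N <= 4.
std::vector<std::uint32_t> offsetsAsVector(std::uint32_t N) {
  assert(N >= 2 && N <= 4);
  std::vector<std::uint32_t> offsets;
  for (std::uint32_t i = 0; i < N; ++i)
    offsets.push_back(4 * i);
  return offsets;
}
```

For float1x2 the array gives offsets 0 and 16 (a 20-byte span, matching FXC), while the vector gives 0 and 4.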

We currently treat float1xN as an OpTypeVector and generate all SPIR-V code based on that assumption.
Changing the type of float1xN to OpTypeArray would require a huge engineering effort to handle all the cases.
For example, in many places (e.g., addition, subtraction, multiplication) we use OpVectorShuffle for float1xN because we treat it as an OpTypeVector.

Our solution is to create two variables for each CTBuffer that contains a type1xN and uses the FXC memory layout:

- Original: one with the correct subtypes and memory layout, i.e., OpTypeArray for type1xN.
- Clone: one with the Private storage class, i.e., without a physical memory layout, using OpTypeVector for type1xN as the current DXC does.

The Original variable is in charge of receiving the CTBuffer data from the CPU.
We create a module initialization function that copies the Original variable to the Clone variable.
We insert an OpFunctionCall to the module initialization function into every entry point.
We use the Clone variable for the CTBuffer everywhere else.
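The Original/Clone scheme can be modelled on the CPU side as follows. This is only an analogy under stated assumptions: the actual fix emits SPIR-V (OpTypeArray, OpTypeVector, and an OpFunctionCall to a module initialization function), whereas here the hypothetical writeFxcLayout stands in for the CPU upload into the Original variable and the hypothetical copyOriginalToClone stands in for the module initialization function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU-side analogy of the Original/Clone scheme (hypothetical names).
// Original: bytes of a float1xN in FXC layout, component i at offset 16 * i,
//           total size 16 * (N - 1) + 4.
// Clone:    a packed std::array<float, N>, standing in for the OpTypeVector
//           that the rest of the code generator already handles.

// Stand-in for the CPU upload into the Original variable.
template <std::size_t N>
std::vector<unsigned char> writeFxcLayout(const std::array<float, N> &values) {
  static_assert(N >= 1 && N <= 4, "float1xN has 1 to 4 vectors");
  std::vector<unsigned char> bytes(16 * (N - 1) + sizeof(float), 0);
  for (std::size_t i = 0; i < N; ++i)
    std::memcpy(bytes.data() + 16 * i, &values[i], sizeof(float));
  return bytes;
}

// Stand-in for the module initialization function: read each component from
// its own 16-byte slot and store it packed in the Clone.
template <std::size_t N>
std::array<float, N> copyOriginalToClone(const std::vector<unsigned char> &original) {
  static_assert(N >= 1 && N <= 4, "float1xN has 1 to 4 vectors");
  assert(original.size() >= 16 * (N - 1) + sizeof(float));
  std::array<float, N> clone{};
  for (std::size_t i = 0; i < N; ++i)
    std::memcpy(&clone[i], original.data() + 16 * i, sizeof(float));
  return clone;
}
```

Copying once at the start of each entry point lets every later access use the packed Clone, so the existing vector-based code paths (such as OpVectorShuffle) need no changes.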

AppVeyorBot · 2021-04-08T00:21:29Z

✅ Build DirectXShaderCompiler 1.0.4760 completed (commit f9560a9200 by @jaebaek)

ehsannas · 2021-04-08T16:09:03Z

> The non-trivial case is float1xN which is a matrix with N vectors and each vector has 1 element.
> Its size should be 16 * (N - 1) + 4 based on the FXC memory layout rule.
> For example, the size of float1x2 must be 20 in bytes, which means we want to put the first float value of
> float1x2 at the offset 0 in bytes and the second float value at the offset 16 in bytes.
> It means we must not generate it as a SPIR-V vector type because setting it as a SPIR-V vector results in
> putting the first at the offset 0 in bytes and the second at the offset 4 in bytes.
> In addition, we cannot set it as a SPIR-V matrix type because SPIR-V does not allow a matrix with a single
> row and a vector with a single element.

This is terrifying 😨

tools/clang/lib/SPIRV/LowerTypeVisitor.cpp

AppVeyorBot · 2021-04-15T02:56:22Z

❌ Build DirectXShaderCompiler 1.0.4786 failed (commit 7247ed3234 by @jaebaek)

AppVeyorBot · 2021-04-16T18:18:31Z

❌ Build DirectXShaderCompiler 1.0.4792 failed (commit e2ff96cd17 by @jaebaek)

jaebaek · 2021-04-17T02:18:54Z

I still need to update the unit tests, but the change is almost done.
The basic idea is to make a clone variable for a CTBuffer that uses the FXC memory layout rule.

The clone variable has the same type and memory layout as the one the current DXC generates for a CTBuffer.
The clone variable is used as the CTBuffer, but it does not follow the correct FXC memory layout.

I create a module initialization function that copies the CTBuffer with the correct memory layout to the clone variable, and I add an OpFunctionCall to it in the wrapper function of each entry point.

In most cases, spirv-opt should optimize this copy away.
I will finish updating the unit tests next Monday.

AppVeyorBot · 2021-04-17T03:14:32Z

❌ Build DirectXShaderCompiler 1.0.1 failed (commit eaeb02ad29 by @jaebaek)

AppVeyorBot · 2021-04-19T14:50:32Z

❌ Build DirectXShaderCompiler 1.0.2 failed (commit 8406085935 by @jaebaek)

AppVeyorBot · 2021-04-19T16:42:03Z

❌ Build DirectXShaderCompiler 1.0.4 failed (commit 2970cadb3d by @jaebaek)

AppVeyorBot · 2021-04-19T17:46:35Z

❌ Build DirectXShaderCompiler 1.0.5 failed (commit 1165681c55 by @jaebaek)

AppVeyorBot · 2021-04-19T19:52:59Z

❌ Build DirectXShaderCompiler 1.0.7 failed (commit 993eabf668 by @jaebaek)

AppVeyorBot · 2021-04-19T21:43:19Z

❌ Build DirectXShaderCompiler 1.0.9 failed (commit 1011c00644 by @jaebaek)

AppVeyorBot · 2021-04-20T13:53:00Z

✅ Build DirectXShaderCompiler 1.0.10 completed (commit 1011c00644 by @jaebaek)

jaebaek · 2021-04-20T14:01:51Z

This will fix #3463

AppVeyorBot · 2021-04-20T14:47:26Z

✅ Build DirectXShaderCompiler 1.0.11 completed (commit 0f23667e57 by @jaebaek)

AppVeyorBot · 2021-04-20T19:13:46Z

✅ Build DirectXShaderCompiler 1.0.15 completed (commit 46f1c5eefc by @jaebaek)

AppVeyorBot · 2021-04-20T22:08:42Z

✅ Build DirectXShaderCompiler 1.0.17 completed (commit e138a15065 by @jaebaek)

AppVeyorBot · 2021-04-20T22:57:55Z

✅ Build DirectXShaderCompiler 1.0.18 completed (commit 7a86a0adc6 by @jaebaek)

tools/clang/unittests/SPIRV/CodeGenSpirvTest.cpp

tools/clang/test/CodeGenSPIRV/vk.layout.cbuffer.fxc.matrix.struct.hlsl

tools/clang/lib/SPIRV/DeclResultIdMapper.h

tools/clang/include/clang/SPIRV/SpirvBuilder.h

jaebaek

Thank you for your code review.

tools/clang/unittests/SPIRV/CodeGenSpirvTest.cpp

tools/clang/test/CodeGenSPIRV/vk.layout.cbuffer.fxc.matrix.struct.hlsl

tools/clang/lib/SPIRV/LowerTypeVisitor.cpp

tools/clang/lib/SPIRV/DeclResultIdMapper.h

tools/clang/include/clang/SPIRV/SpirvBuilder.h

AppVeyorBot · 2021-04-26T18:25:48Z

✅ Build DirectXShaderCompiler 1.0.29 completed (commit ff073e97ea by @jaebaek)

AppVeyorBot · 2021-04-29T16:08:07Z

✅ Build DirectXShaderCompiler 1.0.59 completed (commit 0faab2c3ee by @jaebaek)

AppVeyorBot · 2021-04-29T18:46:28Z

✅ Build DirectXShaderCompiler 1.0.60 completed (commit 8ff663d06d by @jaebaek)

When the requested constant buffer layout is DX, some type lowering can become complex (Nx1 matrices, for example). To simplify the lowering, the backend "clones" those variables (see microsoft#3672): on one end, we expose the correct layout, but then we copy this to a local variable with a different layout that is compatible with the operations we usually do on such types. This means the type can sometimes be a HybridStructType, which is not completely lowered, or a normally lowered SPIR-V type. Both should be allowed, but some code paths had to be updated to reflect this.
Fixes microsoft#6511
Signed-off-by: Nathan Gauër

jaebaek added the spirv Work related to SPIR-V label Apr 7, 2021

jaebaek requested a review from ehsannas April 7, 2021 22:03

jaebaek self-assigned this Apr 7, 2021

ehsannas reviewed Apr 8, 2021

tools/clang/lib/SPIRV/LowerTypeVisitor.cpp

jaebaek changed the title ~~[spirv] Fix bug of CTBuffer DX memory layout with matrix~~ [WIP] [spirv] Fix bug of CTBuffer DX memory layout with matrix Apr 13, 2021

jaebaek force-pushed the dx_layout branch from 24b7c33 to 70618bd April 16, 2021 17:18

jaebaek force-pushed the dx_layout branch from 70618bd to 810da09 April 17, 2021 02:14

jaebaek force-pushed the dx_layout branch from 6b4a54a to aac033e April 19, 2021 14:53

jaebaek changed the title ~~[WIP] [spirv] Fix bug of CTBuffer DX memory layout with matrix~~ [spirv] Fix bug of CTBuffer DX memory layout with matrix Apr 19, 2021

jaebaek force-pushed the dx_layout branch from aac033e to 29d95b7 April 19, 2021 16:46

jaebaek added 8 commits April 19, 2021 14:35

Fix cbuffer DX memory layout with matrix bug

a97a5ee

Handle majorness correctly

b9fdb79

clang-format

d642a72

Create a clone variable for CTBuffer with FXC rule including matrix 1xN

185b0da

Call module_init function in all entry points

fd0f629

Skip LowerTypeVisitor if we already visited

d0836e3

Update unit tests

5e9a63a

Update unittests

9ca9d38

jaebaek force-pushed the dx_layout branch from 29d95b7 to 9ca9d38 April 19, 2021 18:41

Fix wrong struct interface type bug

e1a3172

jaebaek force-pushed the dx_layout branch from 6d6e394 to e1a3172 April 20, 2021 13:44

Check offset with various types e.g., half, double

1ac0d6a

jaebaek added 2 commits April 20, 2021 17:05

Update unit test to check 16bytes boundary

368c1f6

Refactoring for better naming

55654b4

ehsannas suggested changes Apr 23, 2021

jaebaek added 3 commits April 26, 2021 11:24

Update unit tests

aaa6ae8

Refactoring

1ae4e8b

No method to create instructions specialized for module init

ef9c3fc

jaebaek commented Apr 26, 2021

Refactoring: simplify DeclResultIdMapper

d69c8c6

Refactoring: split SpirvBuilder::createCopyInstructionsFromFxcCTBufferToClone()

1d42760

ehsannas approved these changes Apr 29, 2021

jaebaek merged commit 689ab7d into microsoft:master Apr 29, 2021

jaebaek deleted the dx_layout branch April 29, 2021 21:16

This was referenced Apr 30, 2021

[spirv] do not round size of last row vector for dx layout #2645

Closed

[SPIR-V] Matrices in constant buffers are unnecessarily padded when using DirectX memory layout #3463

Closed

galibzon mentioned this pull request Sep 13, 2021

[SPIR-V] "-fvk-use-dx-layout" alignment error #3945

Closed