-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mono] [llvm-aot] Fixed storing Vector3 into memory #111000
Conversation
/azp run runtime-llvm |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-llvm, runtime-extra-platforms |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I put down a few comments out of curiosity so feel free to merge.
btw. nice PR number 111000
const int mask_values [] = { 0, 1, 2 }; | ||
|
||
LLVMValueRef truncatedVec3 = LLVMBuildShuffleVector ( | ||
builder, | ||
lhs, | ||
LLVMGetUndef (t), | ||
create_const_vector_i32 (mask_values, 3), | ||
"truncated_vec3" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be replaced by something like?
LLVMValueRef truncatedVec3 = LLVMBuildTrunc (builder, lhs, LLVMVectorType (LLVMFloatType (), 3), "truncated_vec3);
However, I haven't tried it locally so I'm unsure whether it would make the same and be more efficient or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the trunc instruction does something different. Looking at the LLVM reference it seems to be casting each element of a vector to a different type - not casting a vector and dropping its last element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I thought it works the same as on primitives. Thank you for the explanation.
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/ba-g failures unrelated |
/backport to release/9.0-staging |
Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/12600960089 |
"truncated_vec3" | ||
); | ||
|
||
mono_llvm_build_aligned_store (builder, truncatedVec3, dest, FALSE, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the alignment being set to 1
? A Vector3
is composed of 3x float
fields and so is presumed to be 4 byte aligned unless the code is explicitly using something like Unsafe.ReadUnaligned
or Unsafe.WriteUnaligned
(which use the IL unaligned.
prefix).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change reuses existing logic. This code path was previously taken for all VectorX
writes so I kept it consistent. I agree that this might be incorrect. We should investigate this separately.
if (mono_class_value_size (ins->klass, NULL) == 12) { | ||
const int mask_values [] = { 0, 1, 2 }; | ||
|
||
LLVMValueRef truncatedVec3 = LLVMBuildShuffleVector ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using ShuffleVector
here?
In other places, we instead do this via doing LLVMBuildExtractElement
+ LLVMBuildInsertValue
(or Store) 3x. This alternative correctly represents the primitive steps and allows LLVM to optimize it as best fit for the underlying architecture.
Fixes #110820
Vector3 is sometimes handled as Vector4 (Vector3 + 0) for perf purposes. On mono llvm aot it was incorrectly stored into memory with the trailing 0. This change fixes that behaviour.