-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64][CodeGen] Fix wrong operand order when creating vcmla intrinsic #65278
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note in the commit message whether this is actually a visible miscompile; at first glance, multiplication is commutative.
Assuming it is a miscompile, please verify that using the intrinsics from clang produces the correct result.
Otherwise LGTM
For the fcmla instruction with the rot value of 90 or 270, the results obtained by exchanging the positions of the last two parameters are different. Here is an example:
The result:
I'm not sure whether I need add this test case in my patch. |
I don't think we need an end-to-end testcase; it should be sufficient to have testcases for clang->llvm IR and llvm IR->asm. So should be fine as-is. |
The test case you have is testing different lanes from the vcmla instrinsics, not just commutativity. The different lanes would be expected to be different. The lane indices would need to be the same to test commutivity:
When you say "Fix wrong operand order" is there a bug here? And if so can you explain where. The same change may need to be applied for ARM too, but I was under the impression that enough testing had been done to catch problems like this (baring maybe some edge cases with Nan's and whatnot) |
I meet a run time error in SPECCPU2006 433 because getting a wrong calculation result. And I can get a right result by adding this patch or close the optimization of complex-deinterleaving. |
8cdfd48
to
2593952
Compare
Is the milc compiled with -Ofast or without fast math? I don't think we've seen the same thing here. From what I can tell this this should just change from one complex multiply to another. It will be different but I'm not sure it's better or worse. For example with a normal complex mul:
would be:
vs, if the operands were the other way:
The brackets show where the fused multiply-accumulates happen. So there will be slightly different rounding between the two versions, but neither match the original exactly. The new order sounds like a more natural order, so I'm not against it. It might just be that milc is a little susceptible to fast math rounding differences. |
…sic (llvm#65278) Co-authored-by: lizhijin <[email protected]>
…sic (llvm#65278) Co-authored-by: lizhijin <[email protected]>
No description provided.