-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RyuJIT] Add "rorx" instruction (BMI2) and emit it instead of "rol" when possible #41772
Conversation
@@ -594,6 +594,7 @@ INST3(blsr, "blsr", IUM_WR, BAD_CODE, BAD_CODE, | |||
INST3(bextr, "bextr", IUM_WR, BAD_CODE, BAD_CODE, SSE38(0xF7), INS_Flags_IsDstDstSrcAVXInstruction) // Bit Field Extract | |||
|
|||
// BMI2 | |||
INST3(rorx, "rorx", IUM_WR, BAD_CODE, BAD_CODE, SSE3A(0xF0), INS_FLAGS_None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to double-check that neither the IsDstDstSrcAVXInstruction
nor IsDstSrcSrcAVXInstruction
flags apply. The instruction looks to be a dst, src, imm
operand in this case and so it shouldn't be flagged as RMW and should go down the right encoding paths in emit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried both of these flags and observed SEH exceptions on start 😞 with INS_FLAGS_None
works like a charm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks!
Are there any pmi diffs in the framework assemblies from this? |
here is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo the question about handling GT_ROR
would it make sense to only emit rorx for typeWidth==64? |
|
Thanks for the llvm-mca ref! :)
https://godbolt.org/z/5hnWK1 - gcc generates slightly optimized result than clang, but both use mov+shr. Perhaps the cases where it regresses is when code is shifting Int64 to fit into Int32 variable. |
@am11 Both gcc and clang emit the same codegen for it with rorx: https://godbolt.org/z/qz4vxo |
if add a condition to emit
not sure we need to bother because I still can see perf win on String.GetHashCode: [Benchmark]
public int HelloWorldString1() => "Привет Мир!".GetHashCode();
[Benchmark]
public int HelloWorldString2() => "Hello World!".GetHashCode();
[Benchmark]
public int LongString() => "Lorem ipsum .... %long string% ....".GetHashCode();
|
I think we want to use |
Changed the optimization to be a pure size improvement: only for 64bit integers and if targetReg != operandReg, e.g. public static class Tests
{
static uint Test1(uint x) => BitOperations.RotateRight(x, 20);
static uint Test2(uint x) => BitOperations.RotateRight(x * 100, 20); // x*100 copies x to rax
static ulong Test3(ulong x) => BitOperations.RotateRight(x, 20);
static ulong Test4(ulong x) => BitOperations.RotateRight(x * 100, 20); // x*100 copies x to rax
} it's emitted only for |
@tannergooding - there have been some changes since you last reviewed, though all, I believe, are in line with your comments. Any objection to merging this? |
@CarolEidt, only thing I noticed is that there isn't an LSRA change that marks the case where we'd use Otherwise I think it looks fine and I don't have an issue merging it. |
Yes, and that will be the case for any post-register-allocation peephole optimizations we add. |
Emit BMI2's
rorx
instead ofrol
if argument is a constant value. Fixes #32290Old codegen:
New codegen:
Slightly improves
String.GetHashCode
(noticeable on small strings) sinceMarvin
usesRotateLeft
under the hood.Not sure perfScore is correct for
rorx
./cc @dotnet/jit-contrib @tannergooding