-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT doesn't always fold 'add' instructions into 'lea' which immediately follows it #51599
Comments
Increasing the arguments to addressing is not always a clear win as on Intel 3+ operand addressing has a latency of 3 clocks and 1 port available to execute on vs 2 operand addressing which is 1 clock and has a choice of 2 ports... However... going from |
This is likely another case for forward substitution: #6973. When a chain of computations are expressed via inlines the trees get broken and nothing glues them back together. |
For I imagine that would also be beneficial for inlining heuristics and just for general throughput in heavily optimized code. |
Sure, seems like a good idea. We will need to invest in forward sub soonish as we can't intrinsify our way out of everything. |
Definitely. I was thinking about this from the perspective of:
|
I imagine the forward substitution will be important for scenarios like generic-math, given that
|
Moving to future but with priority 1 to consider for .NET7. |
I prototyped treating most of This resulted in the following for the
Most of the diffs are due to fewer blocks. For example, Most of the diffs come from - lea r10, bword ptr [rcx+16]
vmovdqu xmm0, xmmword ptr [rdx+16]
- vmovdqu xmmword ptr [r10], xmm0
+ vmovdqu xmmword ptr [rcx+16], xmm0 So I think its probably worth investigating this further and looking at applying the same to Notably this also triggers |
Andy indicated (on Discord) I might need to do something like https://github.com/dotnet/coreclr/pull/21217/files to avoid the assert in #10821 |
System.Runtime.CompilerServices.Unsafe is OOB package today. Treating methods in OOBs as intrinsics tends to be problematic since the JIT needs to be forward compatible with future version of the OOB package. (We used to have this setup with #45475 (comment) has some discussion about deprecating System.Runtime.CompilerServices.Unsafe OOB. So this optimization is another ref-count on that. |
Extend ref counting done by local morph so that we can determine single-def single-use locals. Add a phase that runs just after local morph that will attempt to forward single-def single-use local defs to uses when they are in adjacent statements. Fix or work around issues uncovered elsewhere: * `gtFoldExprCompare` might fold "identical" volatile subtrees * `fgGetStubAddrArg` cannot handle complex trees * some simd/hw operations can lose struct handles * some calls cannot handle struct local args Addresses dotnet#6973 and related issues. Still sorting through exactly which ones are fixed, so list below may need revising. Fixes dotnet#48605. Fixes dotnet#51599. Fixes dotnet#55472. Improves some but not all cases in dotnet#12280 and dotnet#62064. Does not fix dotnet#33002, dotnet#47082, or dotnet#63116; these require handling multiple uses or bypassing statements.
Extend ref counting done by local morph so that we can determine single-def single-use locals. Add a phase that runs just after local morph that will attempt to forward single-def single-use local defs to uses when they are in adjacent statements. Fix or work around issues uncovered elsewhere: * `gtFoldExprCompare` might fold "identical" volatile subtrees * `fgGetStubAddrArg` cannot handle complex trees * some simd/hw operations can lose struct handles * some calls cannot handle struct local args * morph expects args not to interfere * fix arm; don't forward sub no return calls * update debuginfo test (we may want to revisit this) * handle subbing past normalize on store assignment * clean up nullcheck of new helper Addresses #6973 and related issues. Still sorting through exactly which ones are fixed, so list below may need revising. Fixes #48605. Fixes #51599. Fixes #55472. Improves some but not all cases in #12280 and #62064. Does not fix #33002, #47082, or #63116; these require handling multiple uses or bypassing statements.
Repro code:
In this codegen, both the add and the mov instruction can be collapsed into a single
mov eax, dword ptr [rcx + rdx]
instruction, where the +8 and the -8 cancel each other out.We see this sometimes with very low-level object manipulation in corelib. For example, the codegen listed at the top of #51548 (working on an
Array.Clear
optimization) has this pattern. We even have a specialized intrinsic as a temporary workaround in extremely hot code paths. (This intrinsic is used inMemory<T>.get_Span
, for instance.)It'd be great if JIT could somehow collapse these sequences where they occur. That would ideally result in better codegen and would allow us to remove this intrinsic from the code base.
The text was updated successfully, but these errors were encountered: