-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can "Math.DivRem" do division operation only once? #4155
Comments
Math.DivRem is not in the .NET Core public surface, but I agree that it would be nice for the code generators to recognize the pattern and perform the division just one. |
cc: @KrzysztofCwalina, as IIRC, Krzysztof bumped up against this as a measurable cost when doing some recent fomatting/parsing experimentation. |
This would not be a very difficult pattern to recognize: a div and a rem with incoming value numbers that match, though it doesn't "fit" with any existing (or other common) optimization patterns. The thing that would be challenging would be to deal with the resulting IR node that would produce two values in registers. Although that would not be all that difficult, it would require more pervasive changes. |
👍 In short term, couldn't Long term though, adding an optimization pattern to recognize |
@tannergooding - you would think that it would be simpler, but actually it requires the messiest part of the work (handling the node in the IR that produces two values), while only relieving the JIT of what is probably the more straightforward part of the implementation. It might actually be easier on x86 (32 bit) where we already have to deal with longs that occupy multiple registers. |
@CarolEidt, I'm actually wondering why this would require more pervasive changes? I would think that instructions like I've not looked at the code, but I would guess that various instructions do indicate what registers end up being input/output by the instruction and that the changes here would be to allow the node to actually carry the second value (which could probably be done via inheritance and type checking). I actually think this is in a similar vein to having an intrinsic that takes 3 arguments where the GenTreeOp currently only allows 2 arguments. |
@CarolEidt |
@tannergooding @CarolEidt |
No, but .NET 5 brought about some register allocator changes that should (hopefully) make it simpler to support now. |
Before this is addressed in .NET 5, any remedies we could do ourselves. e.g. is it possible to roll our own DivRem method to achieve doing division only once? |
@tannergooding |
The new API surface is setup in a way that would enable that optimization on x86/x64. I don't believe there is any instruction to compute quotient and remainder simultaneously on ARM32/ARM64. Whether or not it gets enabled likely depends on prioritization and availability from the dev's familiar enough with that area of the codebase. |
Seems like it's doing a single division now. Still not using the x86 optimization though. |
I wonder why this issue so difficult to fix? .Net 7 is coming soon, when will this issue be fixed? @tannergooding |
The issue is complex because you have to take an instruction with very explicit register usage and then map that back to a higher level concept. In particular, This then needs to be mapped, at a higher level, back to In the case of the first, the In the case of the latter, what you have is a struct containing two fields. On some platforms this will be "one register" (often In each of these cases, the JIT has some complexities it has to work around and deal with such as eliding the pointers/indirections and ensuring the data can be consumed directly. There is also the note that this is an x86/x64 specific problem. Other platforms just have "div" and if you also need the remainder, you compute it exactly as x86/x64 is already in Now, the support was added to the JIT to better support instructions that "return" multiple registers. However, it is still quite a complex area due to all the scenarios, edge cases, and considerations listed above. We'll get it eventually, but its not the highest priority consideration since its likely not the bottleneck in most applications and when we do get to it, we have to ensure that codegen is good and not worse than the "fallback" we're generating today. |
For an update, in 8.0, |
On non-xarch platforms, this is the only way to compute the remainder as well, notably. There is no single instruction that computes div+rem simultaneously. |
Yes. So the main complaint that it performs two divisions is no longer valid. |
And the overall history of the issue, which needs to be taken into account, covers that such a change went in. This issue then still exists to track the underlying spirit of the bug which is that its doing more work than necessary on some platforms. |
@tannergooding |
it is indeed significantly faster
some comments above answer this question in great detail, e.g.: #4155 (comment) |
On x86, a division can take 15-30 clock cycles (sometimes over 40) and can only start one ever three or four. Multiplication, on the other hand, takes about eight and can start a new one every cycle. |
For x86/x64, yes. It isn't possible on other platforms as they don't provide such support. However, even for x86/x64 the benefit there is really using 1 instruction, it isn't actually much faster (than doing Modern CPUs (almost any in the past 12-14 years) account for these things and often have special
Notably this very much depends on microarchitecture, whether you're doing signed or unsigned, and it has massively improved in even more recent years
Your multiplication info is also quite a bit off, both 32-bit and 64-bit have been |
Currently it does division twice, but most CPUs support to get the two values by doing division only once.
Besides,this method should support uint and ulong too.
category:cq
theme:div-mod-rem
skill-level:expert
cost:large
impact:medium
The text was updated successfully, but these errors were encountered: