-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: x*x*x*x -> (x*x)*(x*x) transformation #65141
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Thanks, I love the fact it's now possible to compare C++ and C# back to back on the same godbolt link! But we need to fix the ordering 🙂 As for the request - yeah, this optimization is useful when you need to compute Power function |
@EgorBo Is there a reason why there is no |
technically it removes just one imul which is not super expensive 🤷♂️ (Lat=3, Rec.TP=1 on Zen3)
Feel free to file an API proposal, it seems like developers (and even us) sometimes use double-one and cast back to int |
While checking the above stuff, found another one. Why there is a huge codegen difference in this. Although in the end, both are doing the same stuff. |
it's 32bit. 64bit integers are terrible for 32bit targets. Either switch to x64 on sharplab or return back to godbolt 🙂 |
ahh, switched to x64 bit, somehow I missed that one. Found an ordering difference, not sure will that makes a diff. |
@EgorBo it removed the As you say, this isn't so significant for a one off call, but if an app were doing this many times it would be a good perf improvement. |
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsAs usual, was playing with the JIT ASM compiler for C# and found an interesting optimization that can be done for codegen. Demo link -> https://godbolt.org/z/1v9sMz7Y5 (contains both C# and C++ compiler example) As you can see above, in C#, we always move the parameter to the If you look at the This is applicable for multiple even multiplications (2^n). Just update the code with multiple I know it's not a groundbreaking optimization idea, but still one lesser assembly instruction.
|
Simply put, because it isn't useful in practice. -- There is a related discussion here dotnet/csharplang#2585 Integers have a very narrow range comparatively and the number of valid inputs are fairly exact and limited. Most inputs will actually throw or are better handled a different way. Ignoring Now that might sound like a lot, but lets actually break it down:
This means that there is a very very small window of valid integer values that can be represented as powers in the first place; most of which (~45,000 of them; which is ~93.7%) are only valid as a power of two (e.g. These values are also very specific and represent a very small portion of the overall 2.14 billion positive integers you can otherwise have. Their usage outside |
It is not worthwhile, in my opinion, to expose a |
There is a different message here perhaps -- I have worried in the past that the jit's tree costing/balancing is too oriented towards reducing register consumption. Tree height reduction (see for example) is often useful if we think there are enough registers around to support a more aggressive computation shape. |
Let x1 = 0x2000_0000, x2 = 2, x3 = 4, x4 = 0.
For the checked scope case, there's not only a difference in whether or not an exception was thrown, but whether or not the x4 expression should have evaluated. Though it looks like the issue is really only about computing x^4 (so x2=x1, x3=x1, x4=x1). Assuming it's already bound to a single local, so there's not a side effect lost by the exception, or gained by it not happening), I can't think how it'd be a problem for that so long as the overflow flags are checked after both halves. |
@tannergooding You forgot all 2^32 - 1 values of |
sigh 😆 |
@bartonjs agreed life is more complicated with checked math. But the jit doesn't optimize checked constructs, by and large. |
From the discussion it is unlikely that we will optimize it for integer. |
By looking at the discussion and resulting codegen, it's just one instruction less which is Thanks, everyone! |
As usual, was playing with the JIT ASM compiler for C# and found an interesting optimization that can be done for codegen.
Demo link -> https://godbolt.org/z/1v9sMz7Y5 (contains both C# and C++ compiler example)
As you can see above, in C#, we always move the parameter to the
eax
and callimul
on it as many times as there is multiplication. This is ok when the multiplication is having 2,3 or even 5 with itself, but when it exceeds there is possible optimization can be made just likegcc
one.If you look at the
GCC
compiler, whenever it finds there is the multiplication of number itself in even number of times, before even moving parameter toeax
, it doesimul
of it first and then moves it to theeax
, resulting in a lessimul
instruction.This is applicable for multiple even multiplications (2^n). Just update the code with multiple
(num * num)
and seegcc
will always result in one less instruction compared to C# ones!I know it's not a groundbreaking optimization idea, but still one lesser assembly instruction.
cc @EgorBo @AndyAyersMS
The text was updated successfully, but these errors were encountered: