JIT: xxxx -> (xx)(xx) transformation #65141

ShreyasJejurkar · 2022-02-10T14:56:24Z

As usual, was playing with the JIT ASM compiler for C# and found an interesting optimization that can be done for codegen.

Demo link -> https://godbolt.org/z/1v9sMz7Y5 (contains both C# and C++ compiler example)

As you can see above, in C#, we always move the parameter to the eax and call imul on it as many times as there is multiplication. This is ok when the multiplication is having 2,3 or even 5 with itself, but when it exceeds there is possible optimization can be made just like gcc one.

If you look at the GCC compiler, whenever it finds there is the multiplication of number itself in even number of times, before even moving parameter to eax, it does imul of it first and then moves it to the eax , resulting in a less imul instruction.

This is applicable for multiple even multiplications (2^n). Just update the code with multiple (num * num) and see gcc will always result in one less instruction compared to C# ones!

I know it's not a groundbreaking optimization idea, but still one lesser assembly instruction.

cc @EgorBo @AndyAyersMS

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2022-02-10T14:56:29Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

EgorBo · 2022-02-10T15:08:39Z

Thanks, I love the fact it's now possible to compare C++ and C# back to back on the same godbolt link! But we need to fix the ordering 🙂

As for the request - yeah, this optimization is useful when you need to compute Power function Math.Pow(X, const) (especially for floats in ffast-math) but we don't expand it for const power and we don't even have Math.Pow for integers, so I assume this pattern is highly unlikely to be seen in the wild, unless you point me to some place where we can meet it 🙂

KieranFoot · 2022-02-10T15:13:31Z

@EgorBo (x*x)*(x*x) reduces the instruction count nicely.

Is there a reason why there is no int version of Math.Pow() ? Is there simply no requirement for it?

EgorBo · 2022-02-10T15:19:46Z

@EgorBo (x*x)*(x*x) reduces the instruction count nicely.

technically it removes just one imul which is not super expensive 🤷‍♂️ (Lat=3, Rec.TP=1 on Zen3)

Is there a reason why there is no int version of Math.Pow() ? Is there simply no requirement for it?

Feel free to file an API proposal, it seems like developers (and even us) sometimes use double-one and cast back to int

ShreyasJejurkar · 2022-02-10T15:22:08Z

While checking the above stuff, found another one. Why there is a huge codegen difference in this. Although in the end, both are doing the same stuff.

https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQEwEYCwAUJgMwAEOZAwmQN5FmMXkA2A9jAOZkCyAFLARkYAVzABKOgyYzMAdmFiyAKkVgVajaLABuaYwC++ssdJl2XMgBUBMIdsn1CM2Qp4QEACwB0AISicPCIsfNpwapKq7l5+AUEhYRF6zkxGhAZAA

EgorBo · 2022-02-10T15:25:17Z

While checking the above stuff, found another one. Why there is a huge codegen difference in this. Although in the end, both are doing the same stuff.

https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQEwEYCwAUJgMwAEOZAwmQN5FmMXkA2A9jAOZkCyAFLARkYAVzABKOgyYzMAdmFiyAKkVgVajaLABuaYwC++ssdJl2XMgBUBMIdsn1CM2Qp4QEACwB0AISicPCIsfNpwapKq7l5+AUEhYRF6zkxGhAZAA

it's 32bit. 64bit integers are terrible for 32bit targets. Either switch to x64 on sharplab or return back to godbolt 🙂

ShreyasJejurkar · 2022-02-10T15:27:11Z

ahh, switched to x64 bit, somehow I missed that one. Found an ordering difference, not sure will that makes a diff.
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunGAbCADsA5gwCyACgCWgjA0EBXfAEoOXHhuIB2eUoYAqXfgNGTi/AG513AL7WG9+gwEiGAFWmyjqztQ2adMWwMAAsAOgAhKWExBT4JczRvEyDQyOjY+MTvKz8eO2obIA==

KieranFoot · 2022-02-10T15:31:50Z

@EgorBo it removed the movapd and one mulsd.

As you say, this isn't so significant for a one off call, but if an app were doing this many times it would be a good perf improvement.

ghost · 2022-02-10T15:41:57Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

As usual, was playing with the JIT ASM compiler for C# and found an interesting optimization that can be done for codegen.

Demo link -> https://godbolt.org/z/1v9sMz7Y5 (contains both C# and C++ compiler example)

As you can see above, in C#, we always move the parameter to the eax and call imul on it as many times as there is multiplication. This is ok when the multiplication is having 2,3 or even 5 with itself, but when it exceeds there is possible optimization can be made just like gcc one.

If you look at the GCC compiler, whenever it finds there is the multiplication of number itself in even number of times, before even moving parameter to eax, it does imul of it first and then moves it to the eax , resulting in a less imul instruction.

This is applicable for multiple even multiplications (2^n). Just update the code with multiple (num * num) and see gcc will always result in one less instruction compared to C# ones!

I know it's not a groundbreaking optimization idea, but still one lesser assembly instruction.

cc @EgorBo @AndyAyersMS

Author:	ShreyasJejurkar
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

tannergooding · 2022-02-10T16:42:30Z

Is there a reason why there is no int version of Math.Pow() ? Is there simply no requirement for it?

Simply put, because it isn't useful in practice. -- There is a related discussion here dotnet/csharplang#2585

Integers have a very narrow range comparatively and the number of valid inputs are fairly exact and limited. Most inputs will actually throw or are better handled a different way.

Ignoring x^1 and 1^x (which are just x and 1, respectively), there are 48,036 valid powers that are between 0 and int.MaxValue.

Now that might sound like a lot, but lets actually break it down:

2 goes up to the power of 30
3 goes up to the power of 19
4 goes up to the power of 15
5 goes up to the power of 13
6 and 7 go up to the power of 11
8 goes up to the power of 10
9 and 10 go up to the power of 9
11 through 14 go up to the power of 8
15 through 21 go up to the power of 7
22 through 35 go up to the power of 6
36 through 73 go up to the power of 5
74 through 215 go up to the power of 4
216 through 1290 go up to the power of 3
1291 through 46340 go up to the power of 2

This means that there is a very very small window of valid integer values that can be represented as powers in the first place; most of which (~45,000 of them; which is ~93.7%) are only valid as a power of two (e.g. x * x).

These values are also very specific and represent a very small portion of the overall 2.14 billion positive integers you can otherwise have. Their usage outside 2^x and 10^x are also going to be fairly limited, both of which have better alternatives that are perhaps more clear (e.g. 2 << x for powers of two or 10_000_000 or similar for powers of ten).

tannergooding · 2022-02-10T16:44:45Z

It is not worthwhile, in my opinion, to expose a Math.Pow function for integers that will only be able to handle approx. 3000 scenarios (and where otherwise just x * x is better). At that point you're simply better off manually handling your scenario if the perf difference is enough to be meaningful

AndyAyersMS · 2022-02-10T16:59:20Z

There is a different message here perhaps -- I have worried in the past that the jit's tree costing/balancing is too oriented towards reducing register consumption. Tree height reduction (see for example) is often useful if we think there are enough registers around to support a more aggressive computation shape.

bartonjs · 2022-02-10T17:00:44Z

x1 * x2 * x3 * x4 being re-evaluated from ((x1 * x2) * x3) * x4 to (x1 * x2) * (x3 * x4) isn't valid in a checked scope.

Let x1 = 0x2000_0000, x2 = 2, x3 = 4, x4 = 0.

x1 * x2 = 0x4000_0000
- x1 * x2 * x3 = 0x0000_0001_0000_0000
  - Both Int32 and UInt32 trip OverflowException here
- x3 * x4 = 0
  - (x1 * x2) * (x3 * x4) = 0
  - No OverflowException

For the checked scope case, there's not only a difference in whether or not an exception was thrown, but whether or not the x4 expression should have evaluated.

Though it looks like the issue is really only about computing x^4 (so x2=x1, x3=x1, x4=x1). Assuming it's already bound to a single local, so there's not a side effect lost by the exception, or gained by it not happening), I can't think how it'd be a problem for that so long as the overflow flags are checked after both halves.

bartonjs · 2022-02-10T17:05:08Z

@tannergooding You forgot all 2^32 - 1 values of 0^x = 1, and 0^0 whose value, apparently, leaves room for debate. 😏

tannergooding · 2022-02-10T17:23:19Z

sigh 😆

AndyAyersMS · 2022-02-10T17:33:41Z

@bartonjs agreed life is more complicated with checked math. But the jit doesn't optimize checked constructs, by and large.

JulieLeeMSFT · 2022-02-11T00:06:05Z

From the discussion it is unlikely that we will optimize it for integer.
@ShreyasJejurkar is it ok to close this issue?

ShreyasJejurkar · 2022-02-11T04:20:21Z

By looking at the discussion and resulting codegen, it's just one instruction less which is imul, if it's not making a huge perf difference, then am totally ok to close this issue.

Thanks, everyone!

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 10, 2022

EgorBo changed the title ~~A chance for optimization.~~ JIT: x*x*x*x -> (x*x)*(x*x) transformation Feb 10, 2022

SingleAccretion added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 10, 2022

ShreyasJejurkar closed this as completed Feb 11, 2022

ghost locked as resolved and limited conversation to collaborators Mar 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: xxxx -> (xx)(xx) transformation #65141

JIT: xxxx -> (xx)(xx) transformation #65141

ShreyasJejurkar commented Feb 10, 2022 •

edited

Loading

dotnet-issue-labeler bot commented Feb 10, 2022

EgorBo commented Feb 10, 2022

KieranFoot commented Feb 10, 2022

EgorBo commented Feb 10, 2022 •

edited

Loading

ShreyasJejurkar commented Feb 10, 2022

EgorBo commented Feb 10, 2022 •

edited

Loading

ShreyasJejurkar commented Feb 10, 2022

KieranFoot commented Feb 10, 2022

ghost commented Feb 10, 2022

tannergooding commented Feb 10, 2022

tannergooding commented Feb 10, 2022 •

edited

Loading

AndyAyersMS commented Feb 10, 2022

bartonjs commented Feb 10, 2022

bartonjs commented Feb 10, 2022

tannergooding commented Feb 10, 2022

AndyAyersMS commented Feb 10, 2022

JulieLeeMSFT commented Feb 11, 2022

ShreyasJejurkar commented Feb 11, 2022

JIT: x*x*x*x -> (x*x)*(x*x) transformation #65141

JIT: x*x*x*x -> (x*x)*(x*x) transformation #65141

Comments

ShreyasJejurkar commented Feb 10, 2022 • edited Loading

dotnet-issue-labeler bot commented Feb 10, 2022

EgorBo commented Feb 10, 2022

KieranFoot commented Feb 10, 2022

EgorBo commented Feb 10, 2022 • edited Loading

ShreyasJejurkar commented Feb 10, 2022

EgorBo commented Feb 10, 2022 • edited Loading

ShreyasJejurkar commented Feb 10, 2022

KieranFoot commented Feb 10, 2022

ghost commented Feb 10, 2022

tannergooding commented Feb 10, 2022

tannergooding commented Feb 10, 2022 • edited Loading

AndyAyersMS commented Feb 10, 2022

bartonjs commented Feb 10, 2022

bartonjs commented Feb 10, 2022

tannergooding commented Feb 10, 2022

AndyAyersMS commented Feb 10, 2022

JulieLeeMSFT commented Feb 11, 2022

ShreyasJejurkar commented Feb 11, 2022

JIT: xxxx -> (xx)(xx) transformation #65141

JIT: xxxx -> (xx)(xx) transformation #65141

ShreyasJejurkar commented Feb 10, 2022 •

edited

Loading

EgorBo commented Feb 10, 2022 •

edited

Loading

EgorBo commented Feb 10, 2022 •

edited

Loading

tannergooding commented Feb 10, 2022 •

edited

Loading