-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: optimize large structs #24416
Comments
Just to ask an obvious question, is the problem that we want to handle all n, or just that n<=4 is too small? If the latter, and a reasonable cap is say 16, it might be easier to just extend the current approach, maybe using a bit of codegen magic. Probably we should Do It Right, but worth asking... |
My intent is to handle all n. |
Looks like this should also help #20859 |
Change https://golang.org/cl/106495 mentions this issue: |
Propagate values through some wide Zero/Move operations. Among other things this allows us to optimize some kinds of array initialization. For example, the following code no longer requires a temporary be allocated on the stack. Instead it writes the values directly into the return value. func f(i uint32) [4]uint32 { return [4]uint32{i, i+1, i+2, i+3} } The return value is unnecessarily cleared but removing that is probably a task for dead store analysis (I think it needs to be able to match multiple Store ops to wide Zero ops). In order to reliably remove stack variables that are rendered unnecessary by these new rules I've added a new generic version of the unread autos elimination pass. These rules are triggered more than 5000 times when building and testing the standard library. Updates #15925 (fixes for arrays of up to 4 elements). Updates #24386 (fixes for up to 4 kept elements). Updates #24416. compilebench results: name old time/op new time/op delta Template 353ms ± 5% 359ms ± 3% ~ (p=0.143 n=10+10) Unicode 219ms ± 1% 217ms ± 4% ~ (p=0.740 n=7+10) GoTypes 1.26s ± 1% 1.26s ± 2% ~ (p=0.549 n=9+10) Compiler 6.00s ± 1% 6.08s ± 1% +1.42% (p=0.000 n=9+8) SSA 15.3s ± 2% 15.6s ± 1% +2.43% (p=0.000 n=10+10) Flate 237ms ± 2% 240ms ± 2% +1.31% (p=0.015 n=10+10) GoParser 285ms ± 1% 285ms ± 1% ~ (p=0.878 n=8+8) Reflect 797ms ± 3% 807ms ± 2% ~ (p=0.065 n=9+10) Tar 334ms ± 0% 335ms ± 4% ~ (p=0.460 n=8+10) XML 419ms ± 0% 423ms ± 1% +0.91% (p=0.001 n=7+9) StdCmd 46.0s ± 0% 46.4s ± 0% +0.85% (p=0.000 n=9+9) name old user-time/op new user-time/op delta Template 337ms ± 3% 346ms ± 5% ~ (p=0.053 n=9+10) Unicode 205ms ±10% 205ms ± 8% ~ (p=1.000 n=10+10) GoTypes 1.22s ± 2% 1.21s ± 3% ~ (p=0.436 n=10+10) Compiler 5.85s ± 1% 5.93s ± 0% +1.46% (p=0.000 n=10+8) SSA 14.9s ± 1% 15.3s ± 1% +2.62% (p=0.000 n=10+10) Flate 229ms ± 4% 228ms ± 6% ~ (p=0.796 n=10+10) GoParser 271ms ± 3% 275ms ± 4% ~ (p=0.165 n=10+10) Reflect 779ms ± 5% 775ms ± 2% ~ (p=0.971 n=10+10) Tar 317ms ± 4% 319ms ± 5% ~ (p=0.853 n=10+10) XML 404ms ± 4% 409ms ± 5% ~ (p=0.436 n=10+10) name old alloc/op new alloc/op delta Template 34.9MB ± 0% 35.0MB ± 0% +0.26% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 29.3MB ± 0% +0.02% (p=0.000 n=10+10) GoTypes 115MB ± 0% 115MB ± 0% +0.30% (p=0.000 n=10+10) Compiler 519MB ± 0% 521MB ± 0% +0.30% (p=0.000 n=10+10) SSA 1.55GB ± 0% 1.57GB ± 0% +1.34% (p=0.000 n=10+9) Flate 24.1MB ± 0% 24.2MB ± 0% +0.10% (p=0.000 n=10+10) GoParser 28.1MB ± 0% 28.1MB ± 0% +0.07% (p=0.000 n=10+10) Reflect 78.7MB ± 0% 78.7MB ± 0% +0.03% (p=0.000 n=8+10) Tar 34.4MB ± 0% 34.5MB ± 0% +0.12% (p=0.000 n=10+10) XML 43.2MB ± 0% 43.2MB ± 0% +0.13% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 330k ± 0% 330k ± 0% -0.01% (p=0.017 n=10+10) Unicode 337k ± 0% 337k ± 0% +0.01% (p=0.000 n=9+10) GoTypes 1.15M ± 0% 1.15M ± 0% +0.03% (p=0.000 n=10+10) Compiler 4.77M ± 0% 4.77M ± 0% +0.03% (p=0.000 n=9+10) SSA 12.5M ± 0% 12.6M ± 0% +1.16% (p=0.000 n=10+10) Flate 221k ± 0% 221k ± 0% +0.05% (p=0.000 n=9+10) GoParser 275k ± 0% 275k ± 0% +0.01% (p=0.014 n=10+9) Reflect 944k ± 0% 944k ± 0% -0.02% (p=0.000 n=10+10) Tar 324k ± 0% 323k ± 0% -0.12% (p=0.000 n=10+10) XML 384k ± 0% 384k ± 0% -0.01% (p=0.001 n=10+10) name old object-bytes new object-bytes delta Template 476kB ± 0% 476kB ± 0% -0.04% (p=0.000 n=10+10) Unicode 218kB ± 0% 218kB ± 0% ~ (all equal) GoTypes 1.58MB ± 0% 1.58MB ± 0% -0.04% (p=0.000 n=10+10) Compiler 6.25MB ± 0% 6.24MB ± 0% -0.09% (p=0.000 n=10+10) SSA 15.9MB ± 0% 16.1MB ± 0% +1.22% (p=0.000 n=10+10) Flate 304kB ± 0% 304kB ± 0% -0.13% (p=0.000 n=10+10) GoParser 370kB ± 0% 370kB ± 0% -0.00% (p=0.000 n=10+10) Reflect 1.27MB ± 0% 1.27MB ± 0% -0.12% (p=0.000 n=10+10) Tar 421kB ± 0% 419kB ± 0% -0.64% (p=0.000 n=10+10) XML 518kB ± 0% 517kB ± 0% -0.12% (p=0.000 n=10+10) name old export-bytes new export-bytes delta Template 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) Unicode 6.52kB ± 0% 6.52kB ± 0% ~ (all equal) GoTypes 29.2kB ± 0% 29.2kB ± 0% ~ (all equal) Compiler 88.0kB ± 0% 88.0kB ± 0% ~ (all equal) SSA 109kB ± 0% 109kB ± 0% ~ (all equal) Flate 4.49kB ± 0% 4.49kB ± 0% ~ (all equal) GoParser 8.10kB ± 0% 8.10kB ± 0% ~ (all equal) Reflect 7.71kB ± 0% 7.71kB ± 0% ~ (all equal) Tar 9.15kB ± 0% 9.15kB ± 0% ~ (all equal) XML 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 676kB ± 0% 672kB ± 0% -0.59% (p=0.000 n=10+10) CmdGoSize 7.26MB ± 0% 7.24MB ± 0% -0.18% (p=0.000 n=10+10) name old data-bytes new data-bytes delta HelloSize 10.2kB ± 0% 10.2kB ± 0% ~ (all equal) CmdGoSize 248kB ± 0% 248kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal) CmdGoSize 145kB ± 0% 145kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.46MB ± 0% 1.45MB ± 0% -0.31% (p=0.000 n=10+10) CmdGoSize 14.7MB ± 0% 14.7MB ± 0% -0.17% (p=0.000 n=10+10) Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285 Reviewed-on: https://go-review.googlesource.com/106495 Run-TryBot: Michael Munday <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Keith Randall <[email protected]>
Change https://golang.org/cl/206937 mentions this issue: |
If golang/go#24416 gets fixed we could revert this. name old time/op new time/op delta StdIPv4-4 361ns ± 9% 386ns ± 8% ~ (p=0.167 n=5+5) IPv4-4 273ns ±12% 301ns ±14% ~ (p=0.238 n=5+5) IPv4_inline-4 251ns ± 5% 262ns ±10% ~ (p=0.730 n=5+5) StdIPv6-4 507ns ±18% 490ns ±13% ~ (p=0.651 n=5+5) IPv6-4 414ns ± 8% 416ns ± 6% ~ (p=1.000 n=5+5) IPv4Contains-4 30.1ns ± 4% 13.1ns ±12% -56.41% (p=0.008 n=5+5) ParseIP/v4-4 55.9ns ± 5% 49.5ns ±10% -11.35% (p=0.032 n=5+5) ParseIP/v6-4 244ns ± 9% 217ns ± 8% -10.76% (p=0.040 n=5+5) ParseIP/v6_ellipsis-4 157ns ±12% 157ns ± 7% ~ (p=0.730 n=5+5) ParseIP/v6_v4-4 191ns ±12% 168ns ±19% ~ (p=0.310 n=5+5) ParseIP/v6_zone-4 464ns ±21% 448ns ±12% ~ (p=0.841 n=5+5) StdParseIP/v4-4 186ns ± 5% 186ns ±13% ~ (p=1.000 n=5+5) StdParseIP/v6-4 362ns ± 7% 330ns ± 5% -8.68% (p=0.016 n=5+5) StdParseIP/v6_ellipsis-4 258ns ±17% 280ns ± 6% ~ (p=0.254 n=5+5) StdParseIP/v6_v4-4 427ns ±28% 457ns ±15% ~ (p=0.548 n=5+5) StdParseIP/v6_zone-4 285ns ±17% 305ns ±13% ~ (p=0.421 n=5+5) IPString/v4-4 153ns ±21% 122ns ±17% -19.79% (p=0.048 n=5+5) IPString/v6-4 356ns ± 3% 345ns ±22% ~ (p=0.151 n=5+5) IPString/v6_ellipsis-4 346ns ±13% 329ns ±20% ~ (p=0.548 n=5+5) IPString/v6_v4-4 309ns ±15% 266ns ±11% -13.90% (p=0.032 n=5+5) IPString/v6_zone-4 362ns ±18% 330ns ±21% ~ (p=0.421 n=5+5) IPPrefixMasking/IPv4_/32-4 54.8ns ± 4% 15.1ns ± 4% -72.44% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/17-4 54.9ns ± 4% 14.6ns ± 5% -73.38% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/0-4 55.0ns ± 4% 13.5ns ± 6% -75.50% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/128-4 55.0ns ± 1% 15.5ns ± 7% -71.79% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/65-4 54.2ns ± 2% 14.6ns ± 5% -73.04% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/0-4 55.4ns ± 2% 14.9ns ± 9% -73.06% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/128-4 54.4ns ± 1% 15.0ns ±13% -72.44% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-4 55.5ns ± 2% 14.8ns ±16% -73.31% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-4 54.8ns ± 4% 14.7ns ± 8% -73.23% (p=0.008 n=5+5) ParseIPPort/v4-4 118ns ± 8% 103ns ± 9% -12.88% (p=0.008 n=5+5) ParseIPPort/v6-4 318ns ±12% 262ns ± 6% -17.38% (p=0.008 n=5+5) ParseIPPort/v6_ellipsis-4 243ns ±10% 240ns ±19% ~ (p=0.841 n=5+5) ParseIPPort/v6_v4-4 279ns ±14% 237ns ±16% -14.91% (p=0.040 n=5+5) ParseIPPort/v6_zone-4 578ns ± 9% 546ns ± 7% ~ (p=0.222 n=5+5) IPRangePrefix-4 142ns ± 5% 61ns ±11% -56.74% (p=0.008 n=5+5) IPSetFuzz-4 41.0µs ±16% 33.3µs ±15% -18.87% (p=0.016 n=5+5) Fixes #102 Signed-off-by: Brad Fitzpatrick <[email protected]>
If golang/go#24416 gets fixed we could revert this. name old time/op new time/op delta StdIPv4-4 361ns ± 9% 386ns ± 8% ~ (p=0.167 n=5+5) IPv4-4 273ns ±12% 301ns ±14% ~ (p=0.238 n=5+5) IPv4_inline-4 251ns ± 5% 262ns ±10% ~ (p=0.730 n=5+5) StdIPv6-4 507ns ±18% 490ns ±13% ~ (p=0.651 n=5+5) IPv6-4 414ns ± 8% 416ns ± 6% ~ (p=1.000 n=5+5) IPv4Contains-4 30.1ns ± 4% 13.1ns ±12% -56.41% (p=0.008 n=5+5) ParseIP/v4-4 55.9ns ± 5% 49.5ns ±10% -11.35% (p=0.032 n=5+5) ParseIP/v6-4 244ns ± 9% 217ns ± 8% -10.76% (p=0.040 n=5+5) ParseIP/v6_ellipsis-4 157ns ±12% 157ns ± 7% ~ (p=0.730 n=5+5) ParseIP/v6_v4-4 191ns ±12% 168ns ±19% ~ (p=0.310 n=5+5) ParseIP/v6_zone-4 464ns ±21% 448ns ±12% ~ (p=0.841 n=5+5) StdParseIP/v4-4 186ns ± 5% 186ns ±13% ~ (p=1.000 n=5+5) StdParseIP/v6-4 362ns ± 7% 330ns ± 5% -8.68% (p=0.016 n=5+5) StdParseIP/v6_ellipsis-4 258ns ±17% 280ns ± 6% ~ (p=0.254 n=5+5) StdParseIP/v6_v4-4 427ns ±28% 457ns ±15% ~ (p=0.548 n=5+5) StdParseIP/v6_zone-4 285ns ±17% 305ns ±13% ~ (p=0.421 n=5+5) IPString/v4-4 153ns ±21% 122ns ±17% -19.79% (p=0.048 n=5+5) IPString/v6-4 356ns ± 3% 345ns ±22% ~ (p=0.151 n=5+5) IPString/v6_ellipsis-4 346ns ±13% 329ns ±20% ~ (p=0.548 n=5+5) IPString/v6_v4-4 309ns ±15% 266ns ±11% -13.90% (p=0.032 n=5+5) IPString/v6_zone-4 362ns ±18% 330ns ±21% ~ (p=0.421 n=5+5) IPPrefixMasking/IPv4_/32-4 54.8ns ± 4% 15.1ns ± 4% -72.44% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/17-4 54.9ns ± 4% 14.6ns ± 5% -73.38% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/0-4 55.0ns ± 4% 13.5ns ± 6% -75.50% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/128-4 55.0ns ± 1% 15.5ns ± 7% -71.79% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/65-4 54.2ns ± 2% 14.6ns ± 5% -73.04% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/0-4 55.4ns ± 2% 14.9ns ± 9% -73.06% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/128-4 54.4ns ± 1% 15.0ns ±13% -72.44% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-4 55.5ns ± 2% 14.8ns ±16% -73.31% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-4 54.8ns ± 4% 14.7ns ± 8% -73.23% (p=0.008 n=5+5) ParseIPPort/v4-4 118ns ± 8% 103ns ± 9% -12.88% (p=0.008 n=5+5) ParseIPPort/v6-4 318ns ±12% 262ns ± 6% -17.38% (p=0.008 n=5+5) ParseIPPort/v6_ellipsis-4 243ns ±10% 240ns ±19% ~ (p=0.841 n=5+5) ParseIPPort/v6_v4-4 279ns ±14% 237ns ±16% -14.91% (p=0.040 n=5+5) ParseIPPort/v6_zone-4 578ns ± 9% 546ns ± 7% ~ (p=0.222 n=5+5) IPRangePrefix-4 142ns ± 5% 61ns ±11% -56.74% (p=0.008 n=5+5) IPSetFuzz-4 41.0µs ±16% 33.3µs ±15% -18.87% (p=0.016 n=5+5) Fixes #102 Signed-off-by: Brad Fitzpatrick <[email protected]>
It looks, for Go toolchain v1.16, structs with 8 int fields and [8]int values have already been viewed as small-size values. |
It definitely compiles to better code now, but I think the underlying issue is still there. I think it just needs a better test case.
|
I'm interested to know what's the status on this? The CL206937 was abandoned but it's unclear why. Are there any extra insights that came from that experiment? |
It's not abandoned, it just isn't finished. It had issues. Performance wasn't good, and there were pathological cases, like large fields of even larger structs weren't handled well. |
Change https://go.dev/cl/611076 mentions this issue: |
Change https://go.dev/cl/611075 mentions this issue: |
The SSA backend currently only handle struct with up to 4 fields. Thus, there are different operations corresponding to number fields of the struct. This CL generalizes these with just one OpStructMake, allow struct types with arbitrary number of fields. However, the ssa.MaxStruct is still kept as-is, and future CL will increase this value to optimize large structs. Updates #24416 Change-Id: I192ffbea881186693584476b5639394e79be45c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/611075 Reviewed-by: Keith Randall <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Keith Randall <[email protected]> Auto-Submit: Cuong Manh Le <[email protected]> Reviewed-by: David Chase <[email protected]>
The compiler currently compiles large structs conservatively. If a struct type has more than 4 fields (or a few other conditions), we treat that type as unSSAable. All operations on variables of that type go to the stack, as if their address was taken.
This is suboptimal in various ways. For example:
f
is compiled optimally, to:g
is quite a bit worse:We zero a temporary variable on the stack, then copy it to the destination.
We should process large structs through SSA as well. This will require a fair amount of work in the SSA backend to introduce struct builders, selectors of arbitrary width, stack allocation of large types, maybe heap allocation if they are really huge, etc.
Arrays of size > 1 are in a similar state, but they are somewhat harder because non-constant indexes add an additional complication.
The text was updated successfully, but these errors were encountered: