cmd/compile: looprotate picking wrong jump target #20355

josharian · 2017-05-13T15:26:07Z

This is easier to explain with some visuals. They're a bit complicated; bear with me. Ask if you have questions about how to read them.

The screenshots below are of ssa.html with a CFG of the fannkuch benchmark, including CL 43293. If you want to look around yourself, here's the ssa.html. Be patient--this is hard on your browser. To click to highlight a node you must click on the text in the node, not anywhere in the ellipse, because I hate javascript.

First, here's looprotate doing a good job.

The nodes are blocks. The node labels contain block id and block kind; the number in parens indicates its index in f.Blocks (i.e. where is has been laid out). Edges are labeled with successor numbers. Dashed edges are unlikely. Green edges indicate adjacency in f.Blocks. Dotted green edges aren't part of the CFG; they are there only to show layout adjacency. Node coloring was done manually, by clicking on them.

You can see that b3 is the header for the loop b4 b5 b6, which can panic (via b9) or terminate normally (via b7). Looprotate changes the layout so that after b3 comes b5, not b4. Observe the new dotted green edge and updated parenthesized indices.

Now here's looprotate doing something untoward. Same graph, different highlighted nodes. The second screenshot shows the bottom of the graph.

b7 is a loop header for a very big loop starting at b11 (observe the edge coming in from b12 offscreen below). But b11 is not the node that decides whether the loop should continue. That's b61, which (if still looping) continues to b12, which returns to b11.

However, looprotate bypasses b11 in favor of b14, which doesn't really make much sense. The third screenshot shows what this looks like after the trim pass. Definitely doesn't seem like ideal code layout.

cc @randall77

josharian · 2017-05-13T16:44:28Z

cc @dr2chase @cherrymui @tzneal

gopherbot · 2017-05-14T00:07:21Z

CL https://golang.org/cl/43464 mentions this issue.

josharian · 2017-05-14T14:40:04Z

Started looking at this. Have some ideas.

gopherbot · 2017-05-14T22:41:01Z

CL https://golang.org/cl/43491 mentions this issue.

randall77 · 2017-05-15T19:08:58Z

(You don't mean to say that b7 is the loop header, do you? b11 is the header. b7 isn't even in the loop.)

I agree that this second reordering is not desirable. From my perspective, that's just because the branch at the end of b11 isn't an exit-loop branch. Maybe that should be part of the condition for triggering the rotate? Then we'd only move single blocks.

Maybe the condition is that if we rotate the loop, the non-local edge out of the header is now a fallthrough edge (the non-loop successor is the immediately next block).

I think your CL 43491 is correct, but it may be overkill.

gopherbot · 2017-05-15T19:34:03Z

CL https://golang.org/cl/43502 mentions this issue.

josharian · 2017-05-16T17:35:24Z

I ran some benchmarks with CL 43491 ("josh") and CL 43502 ("khr") on top of 5548f7d ("tip"). For each, I'll show all three together, tip -> khr, tip -> josh, and khr -> josh. It's lots of data, but it's hard to make sense of without all the pairwise stats.

Looking through these, I'm weakly inclined to go for the only slightly more complicated josh CL, mostly on the basis of the macro (compilebench) benchmark and the big append regression. I also think it is interesting that the khr CL has a big +/- impact on lots of go1 benchmarks, but the josh CL does not. It makes me suspect that there are better heuristics available to decide whether to not unroll (khr) or move many blocks (josh). I think we should pull in either khr or josh for 1.9 and then revisit in 1.10 whether they can be fruitfully combined. Other changes to the layout pass may also affect this.

@randall77 now that we have some data, please make an executive decision about how to proceed.

Fannkuch running alone:

all three
name \ time/op  fk-tip       fk-khr       fk-josh
Fannkuch11-8    2.73s ± 2%  2.81s ± 2%  2.82s ± 1%

tip -> khr
Fannkuch11-8   2.73s ± 2%   2.81s ± 2%  +2.78%  (p=0.000 n=29+30)

tip -> josh
Fannkuch11-8   2.73s ± 2%   2.82s ± 1%  +3.16%  (p=0.000 n=29+30)

khr -> josh
Fannkuch11-8   2.81s ± 2%   2.82s ± 1%   ~     (p=0.059 n=30+30)

All go1 benchmarks:

all three
name \ time/op           go1-tip         go1-khr         go1-josh
BinaryTree17-8              2.29s ± 1%     2.22s ± 2%     2.25s ± 2%
Fannkuch11-8                2.73s ± 1%     2.80s ± 2%     2.82s ± 1%
FmtFprintfEmpty-8          33.8ns ± 3%    33.9ns ± 2%    34.4ns ± 3%
FmtFprintfString-8         56.4ns ± 2%    58.8ns ± 3%    57.3ns ± 2%
FmtFprintfInt-8            61.0ns ± 2%    61.5ns ± 2%    61.3ns ± 3%
FmtFprintfIntInt-8         92.3ns ± 1%    88.5ns ± 2%    92.0ns ± 1%
FmtFprintfPrefixedInt-8     109ns ± 2%     114ns ± 1%     107ns ± 2%
FmtFprintfFloat-8           188ns ± 2%     185ns ± 2%     185ns ± 1%
FmtManyArgs-8               406ns ± 2%     387ns ± 2%     397ns ± 2%
GobDecode-8                5.18ms ± 1%    5.20ms ± 2%    5.18ms ± 1%
GobEncode-8                4.05ms ± 1%    4.15ms ± 1%    4.05ms ± 2%
Gzip-8                      194ms ± 2%     192ms ± 2%     194ms ± 1%
Gunzip-8                   31.0ms ± 2%    29.5ms ± 2%    30.6ms ± 1%
HTTPClientServer-8         64.4µs ± 1%    63.2µs ± 0%    63.8µs ± 1%
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%    10.5ms ± 2%
JSONDecode-8               42.9ms ± 2%    42.2ms ± 2%    41.2ms ± 2%
Mandelbrot200-8            3.67ms ± 1%    3.69ms ± 2%    3.68ms ± 1%
GoParse-8                  2.67ms ± 2%    2.67ms ± 3%    2.67ms ± 1%
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.2ns ± 2%    67.7ns ± 2%
RegexpMatchEasy0_1K-8       160ns ± 1%     158ns ± 1%     159ns ± 2%
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    63.4ns ± 1%
RegexpMatchEasy1_1K-8       281ns ± 3%     275ns ± 2%     282ns ± 3%
RegexpMatchMedium_32-8      100ns ± 1%      97ns ± 1%     100ns ± 2%
RegexpMatchMedium_1K-8     31.9µs ± 4%    30.7µs ± 2%    31.8µs ± 2%
RegexpMatchHard_32-8       1.49µs ± 2%    1.49µs ± 2%    1.52µs ± 5%
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 1%    44.4µs ± 2%
Revcomp-8                   358ms ± 2%     392ms ± 3%     359ms ± 3%
Template-8                 47.0ms ± 3%    47.1ms ± 2%    46.7ms ± 2%
TimeParse-8                 262ns ± 2%     245ns ± 2%     248ns ± 2%
TimeFormat-8                262ns ± 2%     254ns ± 2%     260ns ± 2%

tip -> khr
BinaryTree17-8              2.29s ± 1%     2.22s ± 2%  -3.02%  (p=0.000 n=19+19)
Fannkuch11-8                2.73s ± 1%     2.80s ± 2%  +2.91%  (p=0.000 n=20+20)
FmtFprintfEmpty-8          33.8ns ± 3%    33.9ns ± 2%    ~     (p=0.788 n=18+20)
FmtFprintfString-8         56.4ns ± 2%    58.8ns ± 3%  +4.27%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.0ns ± 2%    61.5ns ± 2%  +0.85%  (p=0.022 n=20+18)
FmtFprintfIntInt-8         92.3ns ± 1%    88.5ns ± 2%  -4.18%  (p=0.000 n=18+20)
FmtFprintfPrefixedInt-8     109ns ± 2%     114ns ± 1%  +5.26%  (p=0.000 n=20+19)
FmtFprintfFloat-8           188ns ± 2%     185ns ± 2%  -1.76%  (p=0.000 n=20+20)
FmtManyArgs-8               406ns ± 2%     387ns ± 2%  -4.76%  (p=0.000 n=20+19)
GobDecode-8                5.18ms ± 1%    5.20ms ± 2%    ~     (p=0.231 n=20+20)
GobEncode-8                4.05ms ± 1%    4.15ms ± 1%  +2.36%  (p=0.000 n=17+19)
Gzip-8                      194ms ± 2%     192ms ± 2%  -0.91%  (p=0.004 n=20+20)
Gunzip-8                   31.0ms ± 2%    29.5ms ± 2%  -4.61%  (p=0.000 n=20+20)
HTTPClientServer-8         64.4µs ± 1%    63.2µs ± 0%  -1.84%  (p=0.000 n=20+15)
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%  +2.16%  (p=0.000 n=20+19)
JSONDecode-8               42.9ms ± 2%    42.2ms ± 2%  -1.51%  (p=0.000 n=20+19)
Mandelbrot200-8            3.67ms ± 1%    3.69ms ± 2%  +0.55%  (p=0.040 n=20+20)
GoParse-8                  2.67ms ± 2%    2.67ms ± 3%    ~     (p=0.478 n=19+20)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.2ns ± 2%    ~     (p=0.783 n=20+20)
RegexpMatchEasy0_1K-8       160ns ± 1%     158ns ± 1%  -1.02%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    ~     (p=0.113 n=20+18)
RegexpMatchEasy1_1K-8       281ns ± 3%     275ns ± 2%  -2.17%  (p=0.000 n=20+20)
RegexpMatchMedium_32-8      100ns ± 1%      97ns ± 1%  -2.98%  (p=0.000 n=18+20)
RegexpMatchMedium_1K-8     31.9µs ± 4%    30.7µs ± 2%  -3.78%  (p=0.000 n=20+20)
RegexpMatchHard_32-8       1.49µs ± 2%    1.49µs ± 2%    ~     (p=0.344 n=20+20)
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 1%    ~     (p=0.604 n=19+18)
Revcomp-8                   358ms ± 2%     392ms ± 3%  +9.64%  (p=0.000 n=19+20)
Template-8                 47.0ms ± 3%    47.1ms ± 2%    ~     (p=0.444 n=20+19)
TimeParse-8                 262ns ± 2%     245ns ± 2%  -6.21%  (p=0.000 n=20+19)
TimeFormat-8                262ns ± 2%     254ns ± 2%  -3.07%  (p=0.000 n=20+20)
[Geo mean]                 39.7µs         39.5µs       -0.50%

tip -> josh
name                     old time/op    new time/op    delta
BinaryTree17-8              2.29s ± 1%     2.25s ± 2%  -1.82%  (p=0.000 n=19+19)
Fannkuch11-8                2.73s ± 1%     2.82s ± 1%  +3.49%  (p=0.000 n=20+20)
FmtFprintfEmpty-8          33.8ns ± 3%    34.4ns ± 3%  +1.75%  (p=0.015 n=18+20)
FmtFprintfString-8         56.4ns ± 2%    57.3ns ± 2%  +1.61%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.0ns ± 2%    61.3ns ± 3%    ~     (p=0.154 n=20+20)
FmtFprintfIntInt-8         92.3ns ± 1%    92.0ns ± 1%    ~     (p=0.119 n=18+18)
FmtFprintfPrefixedInt-8     109ns ± 2%     107ns ± 2%  -1.24%  (p=0.007 n=20+20)
FmtFprintfFloat-8           188ns ± 2%     185ns ± 1%  -1.34%  (p=0.000 n=20+18)
FmtManyArgs-8               406ns ± 2%     397ns ± 2%  -2.31%  (p=0.000 n=20+19)
GobDecode-8                5.18ms ± 1%    5.18ms ± 1%    ~     (p=0.968 n=20+20)
GobEncode-8                4.05ms ± 1%    4.05ms ± 2%    ~     (p=0.684 n=17+20)
Gzip-8                      194ms ± 2%     194ms ± 1%    ~     (p=0.659 n=20+20)
Gunzip-8                   31.0ms ± 2%    30.6ms ± 1%  -1.09%  (p=0.001 n=20+20)
HTTPClientServer-8         64.4µs ± 1%    63.8µs ± 1%  -0.99%  (p=0.000 n=20+18)
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%  +2.16%  (p=0.000 n=20+20)
JSONDecode-8               42.9ms ± 2%    41.2ms ± 2%  -3.86%  (p=0.000 n=20+20)
Mandelbrot200-8            3.67ms ± 1%    3.68ms ± 1%  +0.47%  (p=0.019 n=20+19)
GoParse-8                  2.67ms ± 2%    2.67ms ± 1%    ~     (p=0.845 n=19+18)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.7ns ± 2%  +0.66%  (p=0.028 n=20+18)
RegexpMatchEasy0_1K-8       160ns ± 1%     159ns ± 2%  -0.64%  (p=0.049 n=18+20)
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    ~     (p=0.091 n=20+18)
RegexpMatchEasy1_1K-8       281ns ± 3%     282ns ± 3%    ~     (p=0.615 n=20+20)
RegexpMatchMedium_32-8      100ns ± 1%     100ns ± 2%    ~     (p=0.636 n=18+20)
RegexpMatchMedium_1K-8     31.9µs ± 4%    31.8µs ± 2%    ~     (p=0.857 n=20+19)
RegexpMatchHard_32-8       1.49µs ± 2%    1.52µs ± 5%    ~     (p=0.073 n=20+20)
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 2%    ~     (p=0.361 n=19+20)
Revcomp-8                   358ms ± 2%     359ms ± 3%    ~     (p=0.550 n=19+20)
Template-8                 47.0ms ± 3%    46.7ms ± 2%    ~     (p=0.192 n=20+20)
TimeParse-8                 262ns ± 2%     248ns ± 2%  -5.32%  (p=0.000 n=20+20)
TimeFormat-8                262ns ± 2%     260ns ± 2%    ~     (p=0.114 n=20+20)
[Geo mean]                 39.7µs         39.6µs       -0.25%

khr -> josh
name                     old time/op    new time/op    delta
BinaryTree17-8              2.22s ± 2%     2.25s ± 2%  +1.23%  (p=0.000 n=19+19)
Fannkuch11-8                2.80s ± 2%     2.82s ± 1%  +0.56%  (p=0.040 n=20+20)
FmtFprintfEmpty-8          33.9ns ± 2%    34.4ns ± 3%  +1.68%  (p=0.011 n=20+20)
FmtFprintfString-8         58.8ns ± 3%    57.3ns ± 2%  -2.55%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.5ns ± 2%    61.3ns ± 3%    ~     (p=0.627 n=18+20)
FmtFprintfIntInt-8         88.5ns ± 2%    92.0ns ± 1%  +4.03%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-8     114ns ± 1%     107ns ± 2%  -6.18%  (p=0.000 n=19+20)
FmtFprintfFloat-8           185ns ± 2%     185ns ± 1%    ~     (p=0.061 n=20+18)
FmtManyArgs-8               387ns ± 2%     397ns ± 2%  +2.57%  (p=0.000 n=19+19)
GobDecode-8                5.20ms ± 2%    5.18ms ± 1%    ~     (p=0.149 n=20+20)
GobEncode-8                4.15ms ± 1%    4.05ms ± 2%  -2.23%  (p=0.000 n=19+20)
Gzip-8                      192ms ± 2%     194ms ± 1%  +1.01%  (p=0.001 n=20+20)
Gunzip-8                   29.5ms ± 2%    30.6ms ± 1%  +3.69%  (p=0.000 n=20+20)
HTTPClientServer-8         63.2µs ± 0%    63.8µs ± 1%  +0.87%  (p=0.000 n=15+18)
JSONEncode-8               10.5ms ± 2%    10.5ms ± 2%    ~     (p=0.901 n=19+20)
JSONDecode-8               42.2ms ± 2%    41.2ms ± 2%  -2.38%  (p=0.000 n=19+20)
Mandelbrot200-8            3.69ms ± 2%    3.68ms ± 1%    ~     (p=0.813 n=20+19)
GoParse-8                  2.67ms ± 3%    2.67ms ± 1%    ~     (p=0.496 n=20+18)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.7ns ± 2%  +0.72%  (p=0.007 n=20+18)
RegexpMatchEasy0_1K-8       158ns ± 1%     159ns ± 2%    ~     (p=0.207 n=20+20)
RegexpMatchEasy1_32-8      63.4ns ± 1%    63.4ns ± 1%    ~     (p=0.808 n=18+18)
RegexpMatchEasy1_1K-8       275ns ± 2%     282ns ± 3%  +2.65%  (p=0.000 n=20+20)
RegexpMatchMedium_32-8     96.7ns ± 1%    99.6ns ± 2%  +3.05%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-8     30.7µs ± 2%    31.8µs ± 2%  +3.50%  (p=0.000 n=20+19)
RegexpMatchHard_32-8       1.49µs ± 2%    1.52µs ± 5%  +2.01%  (p=0.020 n=20+20)
RegexpMatchHard_1K-8       44.4µs ± 1%    44.4µs ± 2%    ~     (p=0.691 n=18+20)
Revcomp-8                   392ms ± 3%     359ms ± 3%  -8.54%  (p=0.000 n=20+20)
Template-8                 47.1ms ± 2%    46.7ms ± 2%  -0.69%  (p=0.028 n=19+20)
TimeParse-8                 245ns ± 2%     248ns ± 2%  +0.95%  (p=0.008 n=19+20)
TimeFormat-8                254ns ± 2%     260ns ± 2%  +2.58%  (p=0.000 n=20+20)
[Geo mean]                 39.5µs         39.6µs       +0.25%

append benchmarks from #14758:

all three
name \ time/op  append-tip      append-khr      append-josh
Foo-8              324ns ± 2%     486ns ± 2%     294ns ± 2%
Bar-8              435ns ± 2%     579ns ± 1%     434ns ± 2%

tip -> khr
Foo-8          324ns ± 2%     486ns ± 2%  +50.12%  (p=0.000 n=27+29)
Bar-8          435ns ± 2%     579ns ± 1%  +33.03%  (p=0.000 n=30+29)
[Geo mean]     375ns          530ns       +41.32%

tip -> josh
name        old time/op    new time/op    delta
Foo-8          324ns ± 2%     294ns ± 2%   -9.32%  (p=0.000 n=27+30)
Bar-8          435ns ± 2%     434ns ± 2%     ~     (p=0.231 n=30+30)
[Geo mean]     375ns          357ns        -4.93%

khr -> josh
name        old time/op    new time/op    delta
Foo-8          486ns ± 2%     294ns ± 2%  -39.60%  (p=0.000 n=29+30)
Bar-8          579ns ± 1%     434ns ± 2%  -25.07%  (p=0.000 n=29+30)
[Geo mean]     530ns          357ns       -32.72%

compilebench:

all three
name \ time/op       cb-tip       cb-khr       cb-josh
Template              205ms ± 5%   204ms ± 5%   205ms ± 6%
Unicode              85.0ms ± 5%  84.3ms ± 5%  83.7ms ± 4%
GoTypes               530ms ± 2%   526ms ± 3%   526ms ± 4%
Compiler              2.44s ± 3%   2.44s ± 3%   2.43s ± 3%
SSA                   4.89s ± 2%   4.86s ± 2%   4.85s ± 2%
Flate                 118ms ± 4%   117ms ± 3%   116ms ± 2%
GoParser              139ms ± 4%   138ms ± 4%   138ms ± 3%
Reflect               334ms ± 3%   330ms ± 4%   331ms ± 3%
Tar                   102ms ± 5%   102ms ± 5%   100ms ± 4%
XML                   192ms ± 8%   196ms ±10%   197ms ±10%
[Geo mean]            322ms        320ms        320ms     

tip -> khr
name        old time/op       new time/op       delta
Template          205ms ± 5%        204ms ± 5%    ~     (p=0.105 n=96+95)
Unicode          85.0ms ± 5%       84.3ms ± 5%  -0.75%  (p=0.003 n=99+98)
GoTypes           530ms ± 2%        526ms ± 3%  -0.65%  (p=0.000 n=99+96)
Compiler          2.44s ± 3%        2.44s ± 3%    ~     (p=0.492 n=97+99)
SSA               4.89s ± 2%        4.86s ± 2%  -0.57%  (p=0.000 n=97+98)
Flate             118ms ± 4%        117ms ± 3%  -0.65%  (p=0.001 n=99+96)
GoParser          139ms ± 4%        138ms ± 4%  -0.73%  (p=0.001 n=99+100)
Reflect           334ms ± 3%        330ms ± 4%  -1.12%  (p=0.000 n=98+96)
Tar               102ms ± 5%        102ms ± 5%  -0.47%  (p=0.032 n=97+98)
XML               192ms ± 8%        196ms ±10%    ~     (p=0.091 n=86+100)
[Geo mean]        322ms             320ms       -0.35%

tip -> josh
Template          205ms ± 5%        205ms ± 6%    ~     (p=0.672 n=96+97)
Unicode          85.0ms ± 5%       83.7ms ± 4%  -1.48%  (p=0.000 n=99+95)
GoTypes           530ms ± 2%        526ms ± 4%  -0.69%  (p=0.000 n=99+98)
Compiler          2.44s ± 3%        2.43s ± 3%    ~     (p=0.119 n=97+100)
SSA               4.89s ± 2%        4.85s ± 2%  -0.67%  (p=0.000 n=97+99)
Flate             118ms ± 4%        116ms ± 2%  -1.51%  (p=0.000 n=99+90)
GoParser          139ms ± 4%        138ms ± 3%  -1.32%  (p=0.000 n=99+99)
Reflect           334ms ± 3%        331ms ± 3%  -0.88%  (p=0.000 n=98+97)
Tar               102ms ± 5%        100ms ± 4%  -1.70%  (p=0.000 n=97+97)
XML               192ms ± 8%        197ms ±10%  +2.60%  (p=0.040 n=86+100)
[Geo mean]        322ms             320ms       -0.60%

khr -> josh
name        old time/op       new time/op       delta
Template          204ms ± 5%        205ms ± 6%    ~     (p=0.061 n=95+97)
Unicode          84.3ms ± 5%       83.7ms ± 4%  -0.74%  (p=0.005 n=98+95)
GoTypes           526ms ± 3%        526ms ± 4%    ~     (p=0.962 n=96+98)
Compiler          2.44s ± 3%        2.43s ± 3%    ~     (p=0.398 n=99+100)
SSA               4.86s ± 2%        4.85s ± 2%    ~     (p=0.327 n=98+99)
Flate             117ms ± 3%        116ms ± 2%  -0.86%  (p=0.000 n=96+90)
GoParser          138ms ± 4%        138ms ± 3%  -0.60%  (p=0.004 n=100+99)
Reflect           330ms ± 4%        331ms ± 3%    ~     (p=0.187 n=96+97)
Tar               102ms ± 5%        100ms ± 4%  -1.24%  (p=0.000 n=98+97)
XML               196ms ±10%        197ms ±10%    ~     (p=0.919 n=100+100)
[Geo mean]        320ms             320ms       -0.25%

josharian · 2017-05-16T19:25:50Z

Another data point. For the code in #16122:

all three
name \ time/op  tip         khr         josh
Speed/encode-8  438µs ± 2%  430µs ± 1%  374µs ± 0%
Speed/decode-8  356µs ± 1%  314µs ± 1%  320µs ± 1%

tip -> khr
name            old time/op  new time/op  delta
Speed/encode-8   438µs ± 2%   430µs ± 1%   -1.81%  (p=0.000 n=10+10)
Speed/decode-8   356µs ± 1%   314µs ± 1%  -11.76%  (p=0.000 n=10+9)

tip -> josh
name            old time/op  new time/op  delta
Speed/encode-8   438µs ± 2%   374µs ± 0%  -14.71%  (p=0.000 n=10+9)
Speed/decode-8   356µs ± 1%   320µs ± 1%   -9.93%  (p=0.000 n=10+10)

khr -> josh
name            old time/op  new time/op  delta
Speed/encode-8   430µs ± 1%   374µs ± 0%  -13.13%  (p=0.000 n=10+9)
Speed/decode-8   314µs ± 1%   320µs ± 1%   +2.07%  (p=0.000 n=9+10)

Again, seems like josh is a bit better than khr, but an appropriate combo might be better yet.

josharian · 2017-05-16T19:51:52Z

(foolish comment deleted)

DO NOT SUBMIT This CL adds CFG graphs to ssa.html. It execs dot to generate SVG, which then gets inlined into the html. Some standard naming and javascript hacks enable integration with the rest of ssa.html: Clicking on blocks highlights the relevant part of the CFG graph, and vice versa. Sample output and screenshots can be seen in issues golang#20355 and golang#20356. There are serious problems with this CL, though. Performance: * Calling dot after every pass is noticeably slow. * The generated output is giant. * The browser is very slow to render the resulting page. * Clicking on blocks is even slower than before. * Some things I want to do, like allow the user to change the table column widths, lock up the browser. Appearance: * The CFGs can easily be large and overwhelming. Toggling them on/off might be workable, if the performance concerns above were addressed. * I can't figure out the right size to render the CFGs; simple ones are currently oversized and cartoonish, while larger ones are unreadable. * They requires an unsatisfying amount of explanation (see golang#20356). Block layout information is particularly inferior/confusing. * Dead blocks float awkwardly in the sky, with no indication that they are dead. * It'd be nice to somehow add visual information about loops, which we can calculate, and which is non-obvious in large graphs, but I don't know how. * It'd be nice to add more information per block, like the number of values it contains, or even the values themselves, but adding info to a node makes the graph even less readable. Just adding the f.Blocks index in parens was not good. Bugs, incompleteness: * I seem to have broken highlighting around the entire block in the text. * Need to hook up some way to communicate dot-related errors without bringing down the process. * Might need some way to enable/disable dot entirely. Change-Id: I19abc3007f396bdb710ba7563668d343c0924feb

josharian added the Performance label May 13, 2017

josharian added this to the Go1.9 milestone May 13, 2017

josharian changed the title ~~cmd/compile: looprotate disrupting non-loops~~ cmd/compile: looprotate picking wrong jump target May 13, 2017

This was referenced May 13, 2017

cmd/compile: Fannkuch11 on AMD64 slow down 6% after removing assembler backend instruction reordering #18977

Open

cmd/compile: lay out loop-free, likeliness-free control flow more compactly #20356

Open

gopherbot closed this as completed in 638ebb0 May 18, 2017

josharian mentioned this issue May 18, 2017

cmd/compile: improve loop rotation #20411

Open

golang locked and limited conversation to collaborators May 18, 2018

gopherbot added the FrozenDueToAge label May 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: looprotate picking wrong jump target #20355

cmd/compile: looprotate picking wrong jump target #20355

josharian commented May 13, 2017

josharian commented May 13, 2017

gopherbot commented May 14, 2017

josharian commented May 14, 2017

gopherbot commented May 14, 2017

randall77 commented May 15, 2017

gopherbot commented May 15, 2017

josharian commented May 16, 2017

josharian commented May 16, 2017

josharian commented May 16, 2017 •

edited

Loading

cmd/compile: looprotate picking wrong jump target #20355

cmd/compile: looprotate picking wrong jump target #20355

Comments

josharian commented May 13, 2017

josharian commented May 13, 2017

gopherbot commented May 14, 2017

josharian commented May 14, 2017

gopherbot commented May 14, 2017

randall77 commented May 15, 2017

gopherbot commented May 15, 2017

josharian commented May 16, 2017

josharian commented May 16, 2017

josharian commented May 16, 2017 • edited Loading

josharian commented May 16, 2017 •

edited

Loading