-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: looprotate picking wrong jump target #20355
Comments
CL https://golang.org/cl/43464 mentions this issue. |
Started looking at this. Have some ideas. |
CL https://golang.org/cl/43491 mentions this issue. |
(You don't mean to say that b7 is the loop header, do you? b11 is the header. b7 isn't even in the loop.) I agree that this second reordering is not desirable. From my perspective, that's just because the branch at the end of b11 isn't an exit-loop branch. Maybe that should be part of the condition for triggering the rotate? Then we'd only move single blocks. Maybe the condition is that if we rotate the loop, the non-local edge out of the header is now a fallthrough edge (the non-loop successor is the immediately next block). I think your CL 43491 is correct, but it may be overkill. |
CL https://golang.org/cl/43502 mentions this issue. |
I ran some benchmarks with CL 43491 ("josh") and CL 43502 ("khr") on top of 5548f7d ("tip"). For each, I'll show all three together, tip -> khr, tip -> josh, and khr -> josh. It's lots of data, but it's hard to make sense of without all the pairwise stats. Looking through these, I'm weakly inclined to go for the only slightly more complicated josh CL, mostly on the basis of the macro (compilebench) benchmark and the big append regression. I also think it is interesting that the khr CL has a big +/- impact on lots of go1 benchmarks, but the josh CL does not. It makes me suspect that there are better heuristics available to decide whether to not unroll (khr) or move many blocks (josh). I think we should pull in either khr or josh for 1.9 and then revisit in 1.10 whether they can be fruitfully combined. Other changes to the layout pass may also affect this. @randall77 now that we have some data, please make an executive decision about how to proceed. Fannkuch running alone:
All go1 benchmarks:
append benchmarks from #14758:
compilebench:
|
Another data point. For the code in #16122:
Again, seems like josh is a bit better than khr, but an appropriate combo might be better yet. |
(foolish comment deleted) |
DO NOT SUBMIT This CL adds CFG graphs to ssa.html. It execs dot to generate SVG, which then gets inlined into the html. Some standard naming and javascript hacks enable integration with the rest of ssa.html: Clicking on blocks highlights the relevant part of the CFG graph, and vice versa. Sample output and screenshots can be seen in issues golang#20355 and golang#20356. There are serious problems with this CL, though. Performance: * Calling dot after every pass is noticeably slow. * The generated output is giant. * The browser is very slow to render the resulting page. * Clicking on blocks is even slower than before. * Some things I want to do, like allow the user to change the table column widths, lock up the browser. Appearance: * The CFGs can easily be large and overwhelming. Toggling them on/off might be workable, if the performance concerns above were addressed. * I can't figure out the right size to render the CFGs; simple ones are currently oversized and cartoonish, while larger ones are unreadable. * They requires an unsatisfying amount of explanation (see golang#20356). Block layout information is particularly inferior/confusing. * Dead blocks float awkwardly in the sky, with no indication that they are dead. * It'd be nice to somehow add visual information about loops, which we can calculate, and which is non-obvious in large graphs, but I don't know how. * It'd be nice to add more information per block, like the number of values it contains, or even the values themselves, but adding info to a node makes the graph even less readable. Just adding the f.Blocks index in parens was not good. Bugs, incompleteness: * I seem to have broken highlighting around the entire block in the text. * Need to hook up some way to communicate dot-related errors without bringing down the process. * Might need some way to enable/disable dot entirely. Change-Id: I19abc3007f396bdb710ba7563668d343c0924feb
This is easier to explain with some visuals. They're a bit complicated; bear with me. Ask if you have questions about how to read them.
The screenshots below are of ssa.html with a CFG of the fannkuch benchmark, including CL 43293. If you want to look around yourself, here's the ssa.html. Be patient--this is hard on your browser. To click to highlight a node you must click on the text in the node, not anywhere in the ellipse, because I hate javascript.
First, here's looprotate doing a good job.
The nodes are blocks. The node labels contain block id and block kind; the number in parens indicates its index in f.Blocks (i.e. where is has been laid out). Edges are labeled with successor numbers. Dashed edges are unlikely. Green edges indicate adjacency in f.Blocks. Dotted green edges aren't part of the CFG; they are there only to show layout adjacency. Node coloring was done manually, by clicking on them.
You can see that b3 is the header for the loop b4 b5 b6, which can panic (via b9) or terminate normally (via b7). Looprotate changes the layout so that after b3 comes b5, not b4. Observe the new dotted green edge and updated parenthesized indices.
Now here's looprotate doing something untoward. Same graph, different highlighted nodes. The second screenshot shows the bottom of the graph.
b7 is a loop header for a very big loop starting at b11 (observe the edge coming in from b12 offscreen below). But b11 is not the node that decides whether the loop should continue. That's b61, which (if still looping) continues to b12, which returns to b11.
However, looprotate bypasses b11 in favor of b14, which doesn't really make much sense. The third screenshot shows what this looks like after the trim pass. Definitely doesn't seem like ideal code layout.
cc @randall77
The text was updated successfully, but these errors were encountered: