-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use existing framework for backward dataflow analyses #71006
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM at a glance
r? @pnkfelix |
PlaceContext::MutatingUse(MutatingUseContext::Call) | ||
| PlaceContext::MutatingUse(MutatingUseContext::Yield) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PlaceContext::MutatingUse(MutatingUseContext::Call) | |
| PlaceContext::MutatingUse(MutatingUseContext::Yield) => { | |
PlaceContext::MutatingUse(MutatingUseContext::Call | MutatingUseContext::Yield) => { |
b0311bc
to
e98838d
Compare
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 21b30e641d32ff5312258093bb349577d80a6587 with merge 5efb242d5baf509eed2d5015d3d2d01efd652224... |
☀️ Try build successful - checks-azure |
Queued 5efb242d5baf509eed2d5015d3d2d01efd652224 with parent 7688266, future comparison URL. |
d807451
to
3fd69b4
Compare
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
3fd69b4
to
28ffc91
Compare
I'm now using an associated type instead of an associated const to indicate the direction of a dataflow analysis. This allowed me to remove some @pnkfelix Let me know what I can do to ease review of this. An intra-block liveness analysis will make some MIR optimizations easier to implement, and I'd like to start experimenting. It will also be useful for #51003. |
☔ The latest upstream changes (presumably #71323) made this pull request unmergeable. Please resolve the merge conflicts. |
src/librustc_middle/mir/cache.rs
Outdated
assert!( | ||
cache.predecessors.is_some(), | ||
"Cannot construct ReadOnlyBodyAndCache without computed predecessors" | ||
); | ||
Self { body, cache } | ||
} | ||
|
||
pub fn new_unchecked(body: &'a Body<'tcx>, cache: &'a Cache) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't have any documentation, so I cannot tell if it is meant to be an efficiency hack (which seems very unlikely, since an assert of is_some()
should be incredibly cheap to check), or if it is something where at the call site (or at future expected call sites) you cannot guarantee that cache.predecessors.is_some()
.
(I briefly tried to infer the intent by looking at the diff, so that I might suggest what comment to write here, or whether to advise removing the method entirely... From what I can tell, at the single call site where this is called, you are invoking cache.ensure_predecessors
before this is invoked for the backwards flow case, so that will ensure that cache.predecessors.is_some()
. But I'm guessing that predecessors is none for the forwards flow case...)
This is all a long winded way of saying that you should spell out in a doc-comment under what circumstances you expect ReadOnlyBodyAndCache::new_unchecked
to be invoked rather than ReadOnlyBodyAndCache::new
. (If the whole system doesn't require cache.predecessors.is_some()
, then why have the assert in fn new
? And if the whole system does require it, then when can this be safely used?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part that will be removed after #71044 lands. Sorry. That should have been in the commit message along with HACK
.
This is here because we want join_state_into_successors
to have the same signature for both forward and backward analyses. A ReadOnlyBodyAndCache
is required to access a basic block predecessors, but the current API requires that you always compute and cache the predecessor graph before creating a ReadOnlyBodyAndCache
. This would be wasted work for the vast majority of dataflow analyses, which run forward.
FWIW, figuring this out how this was supposed to work caused me a great deal of consternation and was the motivation for #71044, which goes back to using Body
everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should fold #71044 into this PR? I'll finish looking at this PR first before I look at that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh but perhaps you are concerned about performance issues with PR #71044, while you are confident that this PR should not inject performance problems, and thus it would be better to land them separately; is that a correct inference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I was also concerned about regressions from this PR. I wanted to benchmark #71044 separately from this one, since the concerns/potential performance issues are very different in that PR (additional locking and slow serialization) than this one (renaming APIs and additional layers of abstraction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rebased on the latest master, which includes #71044. The HACK commit has been removed. Now both forward and backward dataflow analyses can use Body
everywhere, and the reverse CFG will only be computed when necessary (see the most recent commit).
A general comment on one particular change:
I believe I understand your motivation for trying to find an alternative to "entry set", but I think this rename makes things more confusing. I'm going to spell out my thinking explicitly here; little (or perhaps none) of this will be news to you, @ecstatic-morse , but I want to write it down to justify why I think the above rename is ill chosen. You can skip to the Bikeshed section at the end if you like. DigressionFIxed PointsA fixed point of a function f: D -> D is some input d that maps to itself (i.e. f(d) = d). In the context of a dataflow analysis, the domain D are the bitvectors that correspond to the state of the analysis at various points in the control flow of the program, and the analysis searches for a set of bitvectors such that the transfer functions attached to each block (and the operations for combining them at control flow merge points) end up causing no change to those bitvectors (i.e., each such bitvector is mapped to itself). In short, to me (in the context of a dataflow analysis) a fixpoint is a point in the state-space for the bitvectors themselves. Entry setsIn the data flow code itself, the entry set for a block is the bitvector associated with a particular point in the control flow of the block. The term "fixpoint" for this thing seems inappropriate. The entry set may or may not be a bitvector that has reached a fixed point; in general that will depend on whether the whole analysis has reached a fixed point. And the entry set is definitely used in the code before the analysis has reached a fixed point. In any case, it just seems like a mismatch to use a term that describes a point in the analysis state space to describe a part of the data structure that is most strongly associated with control flow. The term "entry set" is meant to convey that this particular bitvector is the one that represents the analysis state on entry to the block. As I said above, I can understand wanting to rename this, since one is likely to think that the entry point corresponds to when control flows into the block; but for a backwards analysis, these bitvectors stored for each block will be associated with the point where control flows out of the block (i.e., the point immediately after the block’s Terminator). If “entry” is too strongly correlated with (forward) control-flow entry, then “entry set” is a problem. BikeshedSo the options here as I see them are either:
|
@pnkfelix's concern is a valid one. As they hypothesize above, I found it hard to remember whether the "entry" of a block was in control-flow order (always before the first statement) or dataflow order (varies based on the direction of the analysis) while implementing this. I will point out that, once I think that being very clear about the definition of "entry" within the dataflow framework is a valid approach. To me, this implies that we would refer to positions in control-flow order using "start" and "end" rather than "entry" and "exit". One of the reasons I chose "entry" and "exit" over "start" and "end" for control-flow order was somewhat silly: The term "block start" would frequently appear next to the So re: the bikeshed, I would choose to use "block start" and "block end" while preserving the name "entry set" for the dataflow state. The other alternatives are also acceptable. |
d7f310c
to
005a57b
Compare
☔ The latest upstream changes (presumably #71539) made this pull request unmergeable. Please resolve the merge conflicts. |
⌛ Testing commit 512b1edc88ce854301fd928bf19bedd02d51eae8 with merge 997203d00bb9b07794a632d4e8e0976428e04a3c... |
💔 Test failed - checks-azure |
...emulating `MutatingUseContext::Call`
512b1ed
to
21c72b6
Compare
@bors r+ |
📌 Commit 21c72b6 has been approved by |
☀️ Test successful - checks-azure |
This PR adds support for backward analyses to the dataflow framework and adds a new live variable analysis (based on the existing one in
librustc_mir/util/liveness.rs
). By adding these to the framework instead of having a separate API, all newly implemented backward dataflow analyses get cursors/visitors,rustc_peek
tests, and graphviz visualizations for free. In the near-term, this makes it much easier to implement global dead-store elimination, and I believe that this will enable even more MIR optimizations in the future.This PR makes many changes to the dataflow API, since some concepts and terminology only make sense in forward dataflow. Below is a list of the important changes.
entry_set
->fixpoint
(the fixpoint for backward dataflow problems is after the block's terminator)seek_{before,after}
->seek_{before,after}_primary_effect
(the unprefixed dataflow effect is now referred to as the "primary" effect instead of the "after" effect. The "before" effect remains the same, although I considered changing it to the "antecedent" effect. In both backward and forward dataflow, the "before" effect is applied prior to the "primary" effect. I feel very strongly that this is the correct choice, as it means consumers don't have to switch betweenseek_before
andseek_after
based on the direction of their analysis.seek_after_assume_call_returns
is now gone. Users can useResultsCursor::apply_custom_effect
to emulate it.visit_{statement,terminator}_exit
->visit_{statement,terminator}_after_primary_effect
visit_{statement,terminator}
->visit_{statement,terminator}_before_primary_effect
Implementing this also required refactoring the dataflow cursor implementation so it could work in both directions. This is a large percentage of the diff, since the cursor code is rather complex. The fact that the cursor is exhaustively tested in both directions should reassure whomever is unlucky enough to review this 🤣.
In order to avoid computing the reverse CFG for forward dataflow analyses, I've added some hacks to the existing
mir::BodyAndCache
interface. I've requested changes to this interface that would let me implement this more efficiently.r? @eddyb (feel free to reassign)
cc @rust-lang/wg-mir-opt