-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MIR] Initial implementation of inlining #36593
Conversation
r? @arielb1 (rust_highfive has picked a reviewer for you, use r? to override) |
/cc @pcwalton |
I'm currently changing the spans in the inlined MIR to the span of the callsite, which isn't ideal. However, keeping the spans from the original function triggers a multibyte-related assertion. @michaelwoerister any ideas about what might cause that? |
Where can I read about motivation for this? Why does MIR inlining make sense in presence on LLVM inlining? |
@petrochenkov From what I've been told, the main motivation for any optimization on MIR is to improve generic code before it's copied all over the place as part of monomorphization (similar reasoning applies to (I asked on IRC because, like you, I couldn't find a document describing this. There might not be one.) |
One major motivation is that it improves compile times. Not by a huge amount, but I saw about a 10% improvement in codegen+LLVM pass times when compiling libstd. |
|
||
impl fmt::Debug for Location { | ||
fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result { | ||
write!(fmt, "{:?}[{}]", self.block, self.statement_index) | ||
} | ||
} | ||
|
||
impl<'tcx> TypeFoldable<'tcx> for Mir<'tcx> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these go to mir/tcx
?
let terminator = caller_mir[callsite.bb].terminator.take().unwrap(); | ||
match terminator.kind { | ||
TerminatorKind::Call { | ||
func: _, args, destination: Some(destination), cleanup } => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: weird style
let mut first_block = true; | ||
let mut cost = 0; | ||
|
||
for blk in callee_mir.basic_blocks() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a MIR visitor instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The early returns makes the direct way a bit easier. It could be a visitor, but I'd rather land this and then refactor once the early returns aren't necessary any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I don't really care, but it seems like we could use cost.saturating_add()
and just set cost to INT_MAX
or something instead of early return.)
} | ||
|
||
struct Integrator<'a, 'tcx: 'a> { | ||
block_idx: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment on what this is?
Not off the top of my head, unfortunately. |
@Aatch Handling unwinding should be pretty easy as far as I can tell. Namely you have three patterns:
Note, how you do not need to handle any of the unwind edges inside the callee or caller other than remembering which block the |
@petrochenkov To add to what others have said, it improves other optimizations. For example (and this is what I want it for), if you inline |
☔ The latest upstream changes (presumably #36388) made this pull request unmergeable. Please resolve the merge conflicts. |
@nagisa I did have it inlining in more cases, but I ran into issues. I reduced the scope of the pass so I can get something landed first. Handling more cases can be done later. |
@nagisa I'm not sure about that third case. If there's no cleanup to be done, then there won't be a |
Ah right, you are absolutely right. Its also a problem in the other cases Ideally we'd want something like the nounwind LLVM attribute but relying on On Sep 21, 2016 5:34 AM, "James Miller" [email protected] wrote:
|
Anything I can do to help this along? |
@pcwalton I'm just about to push a rebased version with a few other improvements (I fixed the unwinding-related problem). I still need to figure out the debuginfo thing. It's not strictly necessary, as the pass is currently opt-in, but I'd like to at least figure out why it's being breaking, even if it doesn't get fixed so I can make a note of it like I did with the reachable symbols thing. So investigating debuginfo would probably be a big help. |
@Aatch How do I reproduce this bug? I've tried in various ways on Linux and can't seem to. |
@bors: try |
⌛ Trying commit 312b9df with merge 41d36b4... |
I feel confident in saying this is good enough for a first pass, with the understanding that the pass itself is still unstable and may have unknown bugs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Only had a few nits. Thanks so much for your work!
@nikomatsakis I think you should take a look at this too.
return_ty: self.return_ty.fold_with(folder), | ||
var_decls: self.var_decls.fold_with(folder), | ||
arg_decls: self.arg_decls.fold_with(folder), | ||
temp_decls: self.temp_decls.fold_with(folder), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: odd indentation here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
let kind = match self.kind { | ||
Assign(ref lval, ref rval) => Assign(lval.fold_with(folder), rval.fold_with(folder)), | ||
SetDiscriminant { ref lvalue, variant_index } => SetDiscriminant{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put a space before the {
at end of line to be consistent with the pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Call { ref func, ref args, ref destination, cleanup } => { | ||
let dest = if let Some((ref loc, dest)) = *destination { | ||
Some((loc.fold_with(folder), dest)) | ||
} else { None }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could use destination.map()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh but then you need destination.as_ref().map()
-- but that's ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
|
||
#[inline] | ||
pub fn get_mut<'a>(&'a mut self, index: I) -> Option<&'a mut T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think you can lifetime-elide the lifetimes here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
callgraph | ||
} | ||
|
||
pub fn scc_iter<'g>(&'g self) -> SCCIterator<'g> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think you can lifetime elide 'g
here
false | ||
} else { | ||
true | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just have an is_nop()
function on Statement
. I probably should have written one, sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be in SimplifyCfg
?
|
||
if Some(*target) == *cleanup { | ||
*cleanup == None; | ||
} else if !self.in_cleanup_block{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a space before the {
at end of line
@@ -532,8 +528,58 @@ impl<'a, 'tcx> Inliner<'a, 'tcx> { | |||
let return_block = destination.1; | |||
|
|||
// Copy the arguments if needed. | |||
let args : Vec<_> = { | |||
let args : Vec<_> = if is_box_free { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yuck! Shouldn't this be done at HIR→MIR lowering time?
This is fine for now, but I'd factor it out into a separate function, with a big ol' FIXME on top of it :)
highlight_end: h_end, | ||
} | ||
h_end: usize) -> Option<DiagnosticSpanLine> { | ||
fm.get_line(index).map(|text| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here saying what could cause this to return None
(as the commit message does).
note_const_eval_err(bcx.tcx(), &err, span, "expression", &mut diag); | ||
diag.emit(); | ||
if let Some(span) = self.diag_span(span, terminator.source_info.scope) { | ||
let err = ConstEvalErr{ span: span, kind: err }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put a space before {
. (I know you just copy and pasted code from above, but might as well clean this up while you're here.)
☔ The latest upstream changes (presumably #36639) made this pull request unmergeable. Please resolve the merge conflicts. |
Inlining needs to apply the substitutions from the callsite. Also implements TypeFoldable on IndexVec
The deaggregator will generate assignments that aren't allowed in const expressions. This also refactors `is_const_fn` into a method on `TyCtxt`.
|
||
match self.kind { | ||
Assign(ref lval, ref rval) => { lval.visit_with(visitor) || rval.visit_with(visitor) } | ||
SetDiscriminant { ref lvalue, .. } | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: similarly here, can we spell out the fields like x: _
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
use mir::repr::TerminatorKind::*; | ||
|
||
match self.kind { | ||
If { ref cond, .. } => cond.visit_with(visitor), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: similarly here I'd avoid the ..
if we can; it's so easy for these visitors to go wrong, and so hard to track down
fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, folder: &mut F) -> Self { | ||
match self { | ||
&Lvalue::Projection(ref p) => Lvalue::Projection(p.fold_with(folder)), | ||
_ => self.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
kind: &TerminatorKind<'tcx>, _loc: Location) { | ||
if let TerminatorKind::Call { | ||
func: Operand::Constant(ref f) | ||
, .. } = *kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this formatting is pretty hard to read... can't we get it on one line? maybe use a match
? Anything but this :P
children: graph::AdjacentTargets<'g, DefId, ()> | ||
} | ||
|
||
pub struct SCCIterator<'g> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be factored into librustc_data_structures
, and ideally the graph algorithms portion of that code. But for now just open a FIXME about it.
|
||
//! MIR-based callgraph. | ||
//! | ||
//! This only considers direct calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an issue, but it's interesting that this is an "optimistic" call graph -- i.e., it is only those things that we know will be called. Often people would assume a pessimistic approach. Maybe worth highlighting just a bit in the comment here.
}; | ||
let src = MirSource::from_node(self.tcx, id); | ||
if let MirSource::Fn(_) = src { | ||
let mir = if let Some(m) = map.map.get(&def_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for this to be absent? I guess cross-crate cases, eh?
continue; | ||
}; | ||
let src = MirSource::from_node(self.tcx, id); | ||
if let MirSource::Fn(_) = src { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd rather see
match MirSource::from_node(self.tcx, id) {
MirSource::Fn(_) => continue,
_ => (),
}
let mut changed = false; | ||
|
||
loop { | ||
local_change = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: why not let mut local_change = false;
?
|
||
csi -= 1; | ||
if scc.len() == 1 { | ||
callsites.swap_remove(csi); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I feel like the ordering tricks here are getting a bit subtle.
First we sorted callsites
so that things in the SCC were at the end -- but now we are calling swap_remove
, which usually only makes sense if ordering doesn't matter -- and this is part of a loop? So on the next iteration, the ordering is going to be messed up?
Perhaps the scc.len() == 1
condition means this is not true?
Maybe it'd be simpler to just keep a parallel "merged" array of booleans and set merged[csi] = true
or something?
EDIT: Oh, and we are growing the array too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If scc.len() == 1
then there's only one function in the current SCC, so the order doesn't matter. The order is just between "not in SCC" and "in SCC". The thing is, most SCCs are a single function, so in that case, the order doesn't matter at all. It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If scc.len() == 1 then there's only one function in the current SCC, so the order doesn't matter. The order is just between "not in SCC" and "in SCC". The thing is, most SCCs are a single function, so in that case, the order doesn't matter at all. It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first.
OK, that makes sense. This should at minimum be a comment. We can iterate on the overall structure later.
foreign_mir.as_ref().map(|m| &**m) | ||
}; | ||
|
||
let callee_mir = if let Some(m) = callee_mir { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this ever be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calls to functions that don't have a MIR. Foreign functions, for example.
let terminator = bb_data.terminator(); | ||
if let TerminatorKind::Call { | ||
func: Operand::Constant(ref f), .. } = terminator.kind { | ||
if let ty::TyFnDef(callee_def_id, substs, _) = f.ty.sty { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of duplicates the logic in the callgraph code, right? Perhaps we can have a helper that enumerates the direct callees of a given MIR block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also duplicates the logic just a half page up in this same function, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the ordering important? Won't we iterate up to a fixed point anyway?
if let TerminatorKind::Call { | ||
func: Operand::Constant(ref f), .. } = terminator.kind { | ||
if let ty::TyFnDef(callee_def_id, substs, _) = f.ty.sty { | ||
// Don't inline the same function multiple times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, more specifically this prevents us from inlining recursive calls from the callee -- but these could be indirect and still be a problem, right?
For example:
fn foo() {
bar();
}
fn bar() {
baz();
}
fn baz() {
bar();
}
What prevents us from inlining bar into foo, then inlining baz into foo, then bar again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's even cost-effective across the board to inline non-leaves without a more complex analysis.
// Attempts to get an appropriate span for diagnostics. We might not have the source | ||
// available if the span is from an inlined crate, but there's usually a decent span | ||
// in an ancestor scope somewhere | ||
fn diag_span(&self, span: Span, scope: mir::VisibilityScope) -> Option<Span> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have tests for this?
* Iterator over strongly-connected-components using Tarjan's algorithm[1] | ||
* | ||
* [1]: https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ///
, not that I care
Why is the ordering important? Won't we iterate up to a fixed point anyway? I think the nicest way to do this is:
We might want to keep a cache for I think we need something to avoid being utterly slow with huge SCCs (e.g. not inlining any functions with >30 basic blocks, and not even trying conditional inlining). |
I confess I don't quite know. @Aatch wrote this: "It's just a heuristic, the order doesn't matter for correctness, it just prioritises inlining from outside the SCC first." I think I'm sort of lacking the big picture here.
Probably so. That said, I think I'd like to be landing nice patches like this, particularly since it's gated on -O2. I guess there is some question but I feel like iterating in tree (behind a flag) is usually better than spinning on PRs for too long. We do however need to build up some good infrastructure for doing crater-like testing that tracks compile time though -- we wound up kind of skimping there for MIR and, while I don't regret it, it's a shame. It'd be great to be able to build up a MIR optimization infrastructure and test it before we "throw the switch". I think though that cargo-apply is getting pretty robust between the improvements that @brson and I made. So most of the pieces are probably in place. |
I'm not sure SCC inlining is such a good idea - for one, I can't figure out a good way of extending it to a case where we try to inline (non-specializable) trait methods, and for another, I can't find any good "killer app" for it. I would prefer to just never inline functions in the same SCC - this can deal with trait dependencies by executing Tarjan's algorithm online. |
☔ The latest upstream changes (presumably #36752) made this pull request unmergeable. Please resolve the merge conflicts. |
@Aatch Let me know if you're going to fix this up soon—if not then I can do it :) |
ping, what's the status of this? |
@pcwalton are you still interested in following up on this work? Seems important this keeps moving. |
@pcwalton Are you still doing this? If not I'd like to pick it up. |
@alexg117 looks like you can pick this up. What say you? |
@alexg117 @nikomatsakis @pcwalton @Aatch I will update this. I'll let you know when you can close this PR in lieu of a freshly-rebased PR. |
@mrhota great! I think I'll just close it now because of the long period of inactivity, but please do open another PR. |
Implements basic inlining for MIR functions.
Inlining is currently only done if
mir-opt-level
is set to 2.Does not handle debuginfo completely correctly at the moment.