-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite the pattern matching code in trans. #27050
Conversation
The old code was not well structured, difficult to understand, and buggy. The new implementation is completely naive, so there may be a slight runtime performance loss. That said, adding optimizations on top of a clear and correct implementation seems easier than trying to fix the old mess. Fixes issue rust-lang#19064. Fixes issue rust-lang#26989. Fixes issue rust-lang#26251. Fixes issue rust-lang#18060. Fixes issue rust-lang#24875. Fixes issue rust-lang#23311. Fixes issue rust-lang#20046.
r? @brson (rust_highfive has picked a reviewer for you, use r? to override) |
Store(bcx, info, expr::get_len(bcx, scratch.val)); | ||
field_datum = Datum::new(scratch.val, scratch.ty, Lvalue); | ||
} | ||
bcx = compile_pattern(bcx, field_datum, &field_pat.node.pat, failure_dest, pat_bindings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is too long and failed travis
are there any performance implications? |
@eefriedman |
@petrochenkov I called one of those issues out with a FIXME: https://github.com/eefriedman/rust/blob/match-cleanup/src/librustc_trans/trans/_match.rs#L494 . The adjustments should be easy enough to handle. @arielb1 I have a terrible setup for benchmarking... but at a high level the performance implication is that this version will duplicate checks in certain cases where the old code would not. |
Wow, epic patch! |
r? @pnkfelix |
} | ||
|
||
/// Helper for converting from the ValueRef that we pass around in the match code, which is always | ||
/// an lvalue, into a Datum. Eventually we should just pass around a Datum and be done with it. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this extra newline is failing tidy
I wonder if leaning on llvm to optimize branches into switch statements offers enough performance opportunities for us. It seems very likely to make compilation slower, and istm there's a chance this could turn into a dead end if it turns out that writing the switch IR directly produces better code more often. EDIT: I asked @sunfish about this strategy and he seemed enthusiastic. We should definitely do a bunch of measurement for regressions in both runtime and compile time before merging. |
Whether we generate switch instructions or branch instructions in the IR is not at all relevant; LLVM's SimplifyCFG pass runs very early in the optimization process. The potential for redundant code is relevant... but I think optimizations along those lines can be written incrementally. This patch doesn't introduce anything along those lines because I didn't want to complicate the patch without any evidence that it's actually useful. |
Given the upcoming MIR (rust-lang/rfcs#1211), I believe it's better to have a simpler and saner implementation which can be then lifted up to the HIR->MIR lowering pass and expanded upon. |
MIR will basically require |
@eddyb I cannot tell if you are arguing that we should attempt to land this PR (on the basis that it will simplify future efforts to then move to a MIR based implementation), or if you (like @arielb1, I think?) are arguing that we need not invest effort trying to rewrite the existing |
@pnkfelix I am for this PR, even if only the tests remain from it in the long run, that alone will ensure the MIR does not regress back to the old broken state. |
Hmm, I definitely want to see some performance stats for this. The old code may have been difficult to understand, but it used a fairly advanced method for compiling the match. I don't want to merge what could be a major performance regression just because the code is easier to read. |
I'd also like to see some numbers/codegen results, but I'd not be surprised if in some cases, e.g. matches on tuples of enums like in |
I don't think that a possible transition to MIR affects whether or not we should land this patch. There is going to be a fair amount of effort needed to port trans to use MIR, which will take some time, and having patterns work more correctly in the meantime is great! I agree with @Aatch that some performance data would be nice. |
I tried some benchmarking, but my numbers are too unreliable to actually be usable. Maybe someone else can give it a shot? |
☔ The latest upstream changes (presumably #26683) made this pull request unmergeable. Please resolve the merge conflicts. |
@eefriedman I will look into benchmarking a little; sorry for the radio silence, I have been pretty swamped lately. |
Updates on this? |
So, I have done some preliminary profiling. Executive summary: This new Longer story: rustc compiling libsyntax dataOn some "normal code" operating on a "normal input" (in this case,
rustc compiling many_small_arms dataOn
In both the above cases, I'm measuring codegen time, without passing many_small_arms execution time dataBUT, this comes at a troubling cost. I mentioned that test input with 5,000 tiny But I then realized that I probably could (and should) be measuring the quality of the code being generated by the The test case above was a little bit too trivial for something like LLVM to optimize, and it also had a println in the code I would want to be an inner loop, so I modified it slightly (essentially to take and return data, and to stop printing the match result; its probably still quite easy for LLVM to optimize as written). The new test case is here: 5,000 arm matches called 100,000 times with varying inputs. After doing this, I discovered an unfortunate drawback to the new code in this PR. Compiling with
(Above shell transcript was gathered on an unloaded machine.) Now, I know this is a totally artificial micro-benchmark, but
|
Hmm... addressing that particular micro-benchmark requires generating a binary search tree or jump table (probably using an LLVM switch instruction). Conceptually, it's pretty simple: basically, it's just a matter of splitting the match statement based on integer fields. In practice, there are a lot of details... especially if we want a decent heuristic to decide which field to split on first. Not sure it's worthwhile, given that this code is going to get killed by MIR. |
cc @ranma42 |
ping @rust-lang/compiler, would be great to give @eefriedman a definitive answer here. |
@aturon yes. @eefriedman and I were discussing over e-mail and I think we reached the conclusion it made more sense to wait for the MIR to land. |
Closing in favor of MIR transition, as discussed. |
The old code was not well structured, difficult to understand,
and buggy.
The new implementation is completely naive, so there may be a slight
runtime performance loss. That said, adding optimizations on top of
a clear and correct implementation seems easier than trying to
fix the old mess.
Fixes issue #19064.
Fixes issue #26989.
Fixes issue #26251.
Fixes issue #18060.
Fixes issue #24875.
Fixes issue #23311.
Fixes issue #20046.