-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental compilation results in linker error when method use is removed #59535
Comments
I minimized this to ~60 lines in jonas-schievink/rubble#62. |
#61917 looks like it reproduces this in a non-Cargo environment. Some observations from that issue:
|
pre-triage: P-high; leaving nominated in hopes it increases chance that we discuss it today |
Is this specific to the thumbv7em-none-eabi target? |
triage: Reassigning from @Zoxc to self. |
spent a little while trying to reproduce this today, and failed. Of note: even just trying to do
I tried to work around this (by adding my own fake definition for
(I assume there's some cross-compilation system library support that I'm missing.) |
Okay, I managed to now replicate it atop @dtolnay 's reduction from jonas-schievink/rubble#62 (I was not installing the thumb target correctly before this point; I was probably not doing it with |
Here is a repo with my (further-reduced) reduction: I got the whole thing down to a chain of three crates: a 13-line crate at the root, a 65-line crate at the leaf, and a .... 1643-line crate in the middle...
It still requires the thumbv7em-none-eabi target. At this point I think I've gotten it at small as I can with my limited knowledge of embedded rust. So now I'll switch to trying to understand what's going on with the code-generation here. |
(for my narrowed test case, passing |
Embedded Rust is pretty much regular Rust sans magic done by the operating system and libc in a regular application. The reduced 1.6k middle crate is just an abstraction over fixed bits at fixed memory addresses to talk to built-in peripherals (data busses, timers, I/O pins, radios...) generated from a chip manufacturer provided register description. Having multiple |
Its absolutely allowed. See: https://doc.rust-lang.org/stable/book/ch05-03-method-syntax.html#multiple-impl-blocks |
@pnkfelix Yes, I have learned that last weekend in Barcelona and makes sense to me. Should have updated my comment to not waste your time, sorry about that. It's still kind of weird that Rust accepts multiple identical |
Current status: using As a reminder, the undefined symbol is: From there, I discovered that ; Function Attrs: nounwind optsize
define internal zeroext i1 @"_ZN60_$LT$rubble..ble..Duration$u20$as$u20$core..fmt..Display$GT$3fmt17hcaf51ebe05a39b3bE"(i32* noalias readonly align 4 dereferenceable(4), %"core::fmt::Formatter"* align 4 dereferenceable(52)) unnamed_addr #1 {
...
} but One thing that stood out to me here: The function definition in the llvm bitcode says Still, seems odd that this only crops up during this combination of factors (namely, the fact that incremental compilation seems to be involved...)
define hidden zeroext i1 @"_ZN60_$LT$rubble..ble..Duration$u20$as$u20$core..fmt..Display$GT$3fmt17hcaf51ebe05a39b3bE"(i32* noalias readonly align 4 dereferenceable(4), %"core::fmt::Formatter"* align 4 dereferenceable(52)) unnamed_addr #1 {
...
} |
Well, multiple identical non-trivial And multiple non-identical
So I think the only case here that one might want to lint for is trivial (i.e. empty) |
More data, after adding some The module The first (i.e. clean) run does not mark
The second run with incremental build produces and an empty
The logic of the relevant code is here: rust/src/librustc_mir/monomorphize/partitioning.rs Lines 648 to 661 in a19f934
(The debug print outs above do not include the codegen unit placements for the accessors, so I do not actually know for sure how much sense all of this makes. But it certainly makes sense that the aforementioned change to |
More data after adding instrumentation of the codegen units involved: The codegen unit that contains the unresolved reference is named Note that In short: it looks like we have a call chain of methods: A calls B calls C, A is in one code-gen unit, B and C are in another, and we make the decision to mark Is there some sort of inlining happening here that implies we need to do a deeper traversal of the accessor-chain before we commit to marking a reachable item as internal?
|
Okay, further investigation: And, thank goodness, we already had debug instrumentation in place to emit information about such reuse:
In other words: This lends credence to my theory that we are reusing an artifact from a previous incremental compile that should have been invalidated by the changes to what symbols were marked internal by The code that dictates whether a post-ThinLTO object file for a codegen unit can be reused is here: rust/src/librustc_codegen_llvm/back/lto.rs Lines 499 to 526 in 5e380b7
I'm going to read over this more carefully and see if I can figure out how to make it aware of the impact of |
@michaelwoerister and I are continuing to investigate this. (zulip archive) My most recent hypothesis as to what the bug is (transcribed here):
Specifically: rust/src/librustc_codegen_llvm/back/lto.rs Lines 473 to 475 in 0f6f66f
is relying on this: rust/src/librustc_codegen_llvm/back/lto.rs Lines 853 to 858 in 0f6f66f
which reflects the whatever the current LTO imports are computed, which may differ from the set of imports that were used to drive the original LTO optimization that we are reusing. In particular, they may differ by being a subset of the original set of imports, which means we can overlook hidden dependencies. |
…t-lto-imports, r=michaelwoerister save LTO import info and check it when trying to reuse build products Fix rust-lang#59535 Previous runs of LTO optimization on the previous incremental build can import larger portions of the dependence graph into a codegen unit than the current compilation run is choosing to import. We need to take that into account when we choose to reuse PostLTO-optimization object files from previous compiler invocations. This PR accomplishes that by serializing the LTO import information on each incremental build. We load up the previous LTO import data as well as the current LTO import data. Then as we decide whether to reuse previous PostLTO objects or redo LTO optimization, we check whether the LTO import data matches. After we finish with this decision process for every object, we write the LTO import data back to disk. ---- What is the scenario where comparing against past LTO import information is necessary? I've tried to capture it in the comments in the regression test, but here's yet another attempt from me to summarize the situation: 1. Consider a call-graph like `[A] -> [B -> D] <- [C]` (where the letters are functions and the modules are enclosed in `[]`) 2. In our specific instance, the earlier compilations were inlining the call to`B` into `A`; thus `A` ended up with a external reference to the symbol `D` in its object code, to be resolved at subsequent link time. The LTO import information provided by LLVM for those runs reflected that information: it explicitly says during those runs, `B` definition and `D` declaration were imported into `[A]`. 3. The change between incremental builds was that the call `D <- C` was removed. 4. That change, coupled with other decisions within `rustc`, made the compiler decide to make `D` an internal symbol (since it was no longer accessed from other codegen units, this makes sense locally). And then the definition of `D` was inlined into `B` and `D` itself was eliminated entirely. 5. The current LTO import information reported that `B` alone is imported into `[A]` for the *current compilation*. So when the Rust compiler surveyed the dependence graph, it determined that nothing `[A]` imports changed since the last build (and `[A]` itself has not changed either), so it chooses to reuse the object code generated during the previous compilation. 6. But that previous object code has an unresolved reference to `D`, and that causes a link time failure! ---- The interesting thing is that its quite hard to actually observe the above scenario arising, which is probably why no one has noticed this bug in the year or so since incremental LTO support landed (PR rust-lang#53673). I've literally spent days trying to observe the bug on my local machine, but haven't managed to find the magic combination of factors to get LLVM and `rustc` to do just the right set of the inlining and `internal`-reclassification choices that cause this particular problem to arise. ---- Also, I have tried to be careful about injecting new bugs with this PR. Specifically, I was/am worried that we could get into a scenario where overwriting the current LTO import data with past LTO import data would cause us to "forget" a current import. ~~To guard against this, the PR as currently written always asserts, at overwrite time, that the past LTO import-set is a *superset* of the current LTO import-set. This way, the overwriting process should always be safe to run.~~ * The previous note was written based on the first version of this PR. It has since been revised to use a simpler strategy, where we never attempt to merge the past LTO import information into the current one. We just *compare* them, and act accordingly. * Also, as you can see from the comments on the PR itself, I was quite right to be worried about forgetting past imports; that scenario was observable via a trivial transformation of the regression test I had devised.
I still see this issue |
If you mean #45929 (comment), that has almost certainly a different root cause that what this issue is about. |
See: https://github.com/jonas-schievink/rubble/compare/incr-linker-error
Build the first commit (jonas-schievink/rubble@04cf1d1), then the second (jonas-schievink/rubble@cedf7be) using
cargo build
, and you should get a linker error like this:Note that this needs the
thumbv7em-none-eabi
target, so you might need to runrustup target add thumbv7em-none-eabi
beforehand.This is on
rustc 1.33.0 (2aa4c46cf 2019-02-28)
.Also reproduces on
rustc 1.35.0-nightly (237bf3244 2019-03-28)
.The text was updated successfully, but these errors were encountered: