Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take command line arguments into account when doing incremental compilation #33727

Closed
michaelwoerister opened this issue May 18, 2016 · 9 comments
Assignees
Labels
A-incr-comp Area: Incremental compilation

Comments

@michaelwoerister
Copy link
Member

michaelwoerister commented May 18, 2016

Since command line arguments to the compiler influence the code generated, changing them should also purge affected artifacts from the cache. It might be a good idea to treat them just like HIR or Metadata nodes and thus make them part of the regular dep-graph.

cfg settings are a special case. Since they will affect the HIR being generated, they need not be considered for incr. comp. However, it might make sense to have separate caches for different sets of cfg flags, iff we want to support these being changed frequently without having to recompile a lot. Although that might be a negligible use case.

A first step to implementing this is probably taking a survey of all command line options and seeing what they might affect.

@michaelwoerister michaelwoerister added the A-incr-comp Area: Incremental compilation label May 18, 2016
@michaelwoerister
Copy link
Member Author

cc @nikomatsakis

@jonas-schievink
Copy link
Contributor

jonas-schievink commented Jun 2, 2016

I've taken an initial survey of all command line arguments for rustc. Feel free to correct any mistakes, as I'm sure there will be a few.

Options marked with ✔️ affect incr. comp.: When those options are added/removed between runs, everything should be recompiled.
Options marked with ✖️ have no effect on the existing incr. comp. infrastructure (eg. they only change HIR, or only affect linking).
Options marked as 💥 produce output depending on the whole crate (similar to lints). Their output can either be cached and replayed between runs (when their output is known to only change depending on a subset of the whole crate -- like an item (for lints) or the resulting codegen unit), or they require disabling incr. comp. as soon as they're used.

Option Description Affects incr. comp.
--cfg SPEC Configure the compilation environment ✖️ (see above)
-L [KIND=]PATH Add a directory to the library search path ❓ (affects which crates we'll link against, not sure what else)
-l [KIND=]NAME Link the generated crate(s) to the specified native library NAME. ✖️ Only affects linking
`--crate-type [bin lib ...]`
--crate-name NAME Specify the name of the crate being built ❓ Technically it shouldn't, but the crate name is part of symbol names, so everything will be rebuilt IIUC
-o FILENAME Write output to ✖️ Should just be a matter of executing the linker again
--out-dir DIR Write output to compiler-chosen filename in ✖️ Same as above
--explain OPT Provide a detailed explanation of an error message ✖️
--test Build a test harness ✖️ Should only affect HIR
--target TARGET Target triple for which the code is compiled ✔️ Every target should get its own incr. comp. directory, really.
Any lint options 💥 The resulting warnings can change, and with --deny <lint> compilation might even start to fail
-V --version Print version info and exit ✖️
-v --verbose Use verbose output ✖️ Doesn't use anything interesting
--extern NAME=PATH Specify where an external rust library is located Should behave like -L
--sysroot PATH Override the system root Should behave like -L
`--color auto always never`
-Car=val tool to assemble archives with ✖️ Only used to build the final artifact
-Clinker=val system linker to link outputs with ✖️ Like -Car
-Clink-args=val extra arguments to pass to the linker ✖️
-Clink-dead-code don't let linker strip dead code ✖️
-Clto perform LLVM link-time optimizations ✖️
-Ctarget-cpu=val select target processor ✔️ Affects all codegen, similar to --target
-Ctarget-feature=val target specific attributes ✔️ Ditto
-Cpasses=val a list of extra LLVM passes to run ✔️ Since we're not saving unoptimized LLVM IR, we need to rebuild everything
-Cllvm-args=val a list of arguments to pass to llvm ✔️ Superset of -Cpasses (I think)
-Csave-temps save all temporary output files during compilation ✖️ I think when we're saving temps, we'll do so for each CGU, which incr. comp. can handle
-Crpath set rpath values in libs/exes ✖️ I think this is just a linker arg
-Cno-prepopulate-passes don't pre-populate the pass manager with a list of passes ✔️ Can change all LLVM output
-Cno-vectorize-loops don't run the loop vectorization optimization passes ✔️ Can change all LLVM output
-Cno-vectorize-slp don't run LLVM's SLP vectorization pass ✔️ Ditto
-Csoft-float generate software floating point library calls ✔️ Ditto
-Cprefer-dynamic prefer dynamic linking to static linking ✖️ Should just be a different linker invocation
-Cno-integrated-as use an external assembler rather than LLVM's integrated one ✔️ Requires rebuilding everything with the external assembler
-Cno-redzone=val disable the use of the redzone ✔️ Global codegen change
-Crelocation-model=val choose the relocation model to use ✔️ Probably also a global codegen change
-Ccode-model=val choose the code model to use ✔️ Not too sure, but sounds like a global codegen change
-Cmetadata=val metadata to mangle symbol names with ✖️ Used for SVH so we'll recompile automatically
-Cextra-filename=val extra data to put in each output filename ✖️ Only relevant for the final artifact
-Ccodegen-units=val divide crate into N units to optimize in parallel ✖️ Incr. comp. will divide into many units anyways, whether they're compiled in parallel shouldn't affect the output
-Cremark=val print remarks for these optimization passes ✔️ We should probably recompile everything when this is passed
-Cno-stack-check disable checks for stack exhaustion ✔️ Global codegen change (but does this even do anything today?)
-Cdebuginfo=val debug info emission level ✔️ Global codegen change
-Copt-level=val optimize with possible levels 0-3 ✔️ Global codegen change
-Cdebug-assertions=val explicitly enable the cfg(debug_assertions) directive ✖️ Changes the HIR we work with, so this is already handled
-Cinline-threshold=val set the inlining threshold for ✔️ Global codegen/optimization change
-Cpanic=val panic strategy to compile crate with ✔️ Changes whether we emit landing pads
-Zverbose in general, enable more debug printouts ✖️ Like -v
-Ztime-passes measure time of each rustc pass ✖️ Benchmark option, meant to change with incr. comp.
-Zcount-llvm-insns count where LLVM instrs originate ✖️ Ditto
-Ztime-llvm-passes measure time of each LLVM pass ✖️ Ditto
-Zinput-stats gather statistics about the input ✖️ Runs before and shortly after expansion
-Ztrans-stats gather trans statistics ✖️
-Zasm-comments generate comments into the assembly ✔️ Changes the whole output
-Zno-verify skip LLVM verification ✔️
-Zborrowck-stats gather borrowck statistics 💥
-Zno-landing-pads omit landing pads for unwinding ✔️ Superseded by and similar to -Cpanic
-Zdebug-llvm enable debug output from LLVM 💥
-Zcount-type-sizes count the sizes of aggregate types 💥
-Zmeta-stats gather metadata statistics 💥 (?)
-Zprint-link-args print the arguments passed to the linker ✖️
-Zgc garbage collect shared data (experimental) ❓ no clue what this does
-Zprint-llvm-passes prints the llvm optimization passes being run
-Zast-json print the AST as JSON and halt ✖️ Exits before incr. comp.
-Zast-json-noexpand print the pre-expansion AST as JSON and halt ✖️ Ditto
-Zls list the symbols defined by a library crate
-Zsave-analysis write syntax and type analysis (in JSON format) information in addition to normal output ❓ Not handled by current incr. comp. design
-Zsave-analysis-csv write syntax and type analysis (in CSV format) information in addition to normal output Like above
-Zprint-move-fragments print out move-fragment data for every fn ❓ Are move fragments calculated incrementally?
-Zflowgraph-* ... ❓ Is the flowgraph constructed incrementally?
-Zprint-region-graph prints region inference graph. ❓ Like above
-Zparse-only parse only; do not compile, assemble, or link ✖️
-Zno-trans run all passes except translation; no output ✖️
-Ztreat-err-as-bug treat all errors that occur as bugs ✖️ Errors stop compilation one way or the other
-Zcontinue-parse-after-error attempt to recover from parse errors ✖️
-Zincremental=val enable incremental compilation uuuh...
-Zdump-dep-graph dump the dependency graph ✖️ We always build the dep graph for incr. comp.
-Zquery-dep-graph enable queries of the dependency graph for regression testing ✖️ This causes rustc to build the dep graph, but we're doing that anyway for incr. comp.
-Zno-analysis parse and expand the source, but run no analysis ✖️ Stops before incr. comp. can do its magic
-Zextra-plugins=val load extra plugins ❓ Loading extra lints should integrate them with rustc lints
-Zprint-enum-sizes print the size of enums and their variants ❓ (spoiler: I'm working on a PR that will get rid of this)
-Zforce-overflow-checks=val force overflow checks on or off ✔️
-Zforce-dropflag-checks=val force drop flag checks on or off ✔️ Codegen chance
-Ztrace-macros for every macro invocation, print its name and arguments ✖️
-Zenable-nonzeroing-move-hints force nonzeroing move optimization on ✔️ Codegen change
-Zkeep-mtwt-tables don't clear the resolution tables after analysis ✖️
-Zkeep-ast keep the AST after lowering it to HIR ✖️
-Zshow-span=val show spans for compiler debugging ✖️
-Zprint-trans-items=val print the result of the translation item collection pass ❓ Can currently change the trans item collection mode, which seems problematic with incr. comp.
-Zmir-opt-level=val set the MIR optimization level ✔️
-Zdump-mir=val dump MIR state at various points in translation 💥 When compiling incrementally, MIR might not go through all these "points"
-Zorbit get MIR where it belongs ✔️ Changes the translation backend (codegen change)

Sorry for this wall of text, rustc has more options than I remember 😅

@michaelwoerister
Copy link
Member Author

@jonas-schievink Wow, thanks a lot for this listing! That's very impressive :)

@michaelwoerister michaelwoerister self-assigned this Jul 20, 2016
@michaelwoerister
Copy link
Member Author

OK, I see two basic approaches for implementing this:

  1. Conservative - If there is a change in a commandline argument that might affect any cached artifact, delete the whole cache.
  2. DepGraph-based - Commandline arguments are represented as nodes in the dependency graph. The regular dependency system takes care of deleting invalidated cache entries.

Advantages of the Conservative Approach

  • It is easy to implement
  • It is pretty fool-proof, in part due to its simplicity
  • As long as we just cache object files, it is not more wasteful than a more fine-grained approach

Disadvantages of the Conservative Approach

  • Once we start caching things like MIR, this approach will delete more things from the cache than necessary

Advantages of the DepGraph-based Approach

  • It is more accurate than the conservative approach, allowing, for example, to keep cached MIR around if only things affecting codegen have changed
  • It is building on the existing infrastructure

Disadvantages of the DepGraph-based Approach

  • It is more complicated than the conservative approach, needing more changes to existing code
  • It might lead to breaking changes for plugins and tools using rustc as a library
  • It is less fool-proof because information can leak out of the tracking system

Initially I thought that the DepGraph-based approach would be the natural way to do it, but now I'm thinking that it might be better to go the conservative route, at least in the beginning. It reduces the risk for bugs and, at least initially, should have no negative effect on performance. I'd be interested in what others think here.

Conservative Approach: Implementation
My proposal for implementing the conservative approach would be to divide all command-line options into the following categories:

RELEVANT potential effect on output (~ ✔️ in @jonas-schievink's listing)
IRRELEVANT no influence, completely ignored (~ ✖️)
INCOMPATIBLE incompatible (~ 💥)

With these categories defined, run the following algorithm:

  1. If the set of arguments contains any that match INCOMPATIBLE, warn about the negative effect or abort if necessary
  2. Take the subset of arguments that match RELEVANT and compare it to the RELEVANT set that was used to build the cache. If there is a difference, delete all cached artifacts.
  3. Store the current RELEVANT arguments for later comparison.

I was also thinking about sub-dividing the incr. comp. directory, depending on given set of RELEVANT arguments like target triple, optimization level, and debuginfo. But now I'm thinking that it's better left to cargo and build systems to choose a different incr. comp. directory for different configurations. What does the @rust-lang/tools think about that?

@alexcrichton
Copy link
Member

Makes sense to me to take the conservative approach to start out at least, and the proposed strategy here seems reasonable to me. I agree that Cargo should probably deal with the separate incremental compilation directories, it'll just have one for debug/release mode. Additionally in practice the arguments shouldn't change much unless the cache actually needs to be invalidated because Cargo'll pass the same arguments, so the conservative approach I think will pan out well enough.

One question I'd have is how we protect against new flags being added to the compiler? Would we ensure that all options are tagged with relevant/irrelevant to incremental compilation, or would we just have whitelist of "irrelevant" options you'd have to add to if it's relevant?

@michaelwoerister
Copy link
Member Author

I prototyped the DepGraph approach the other day and there I extended the options! macro in sess::config to require that one specifies the DepNode that the option should be mapped to. Something similar (but simpler) can also be done here, I guess. A whitelist seems good too.

On July 21, 2016 7:42:41 PM GMT+02:00, Alex Crichton [email protected] wrote:

Makes sense to me to take the conservative approach to start out at
least, and the proposed strategy here seems reasonable to me. I agree
that Cargo should probably deal with the separate incremental
compilation directories, it'll just have one for debug/release mode.
Additionally in practice the arguments shouldn't change much unless the
cache actually needs to be invalidated because Cargo'll pass the same
arguments, so the conservative approach I think will pan out well
enough.

One question I'd have is how we protect against new flags being added
to the compiler? Would we ensure that all options are tagged with
relevant/irrelevant to incremental compilation, or would we just have
whitelist of "irrelevant" options you'd have to add to if it's
relevant?


You are receiving this because you were assigned.
Reply to this email directly or view it on GitHub:
#33727 (comment)

@nikomatsakis nikomatsakis added this to the Incremental compilation alpha milestone Jul 23, 2016
@nikomatsakis
Copy link
Contributor

@michaelwoerister so the simplest thing I can imagine is just to hash the options and encode that in the serialized state. When we re-load the state, we could compare against our current options and just choose to ignore everything if the options changed. But I guess if you've worked through some of the dep-graph approach that'd be that much better.

@michaelwoerister
Copy link
Member Author

@nikomatsakis Prototyping the DepGraph approach was mostly to get a feel for what it would look like -- and subsequently lead me to prefer the simpler approach. What you describe is pretty much how I intend to implement it, just with filtering out those options before hashing that should not have any influence.

bors added a commit that referenced this issue Aug 10, 2016
…sakis

Take commandline arguments into account for incr. comp.

Implements the conservative strategy described in #33727.

From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples.

The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct.

One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data.

It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map.

Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.
bors added a commit that referenced this issue Aug 12, 2016
…sakis

Take commandline arguments into account for incr. comp.

Implements the conservative strategy described in #33727.

From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples.

The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct.

One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data.

It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map.

Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.
bors added a commit that referenced this issue Aug 15, 2016
…sakis

Take commandline arguments into account for incr. comp.

Implements the conservative strategy described in #33727.

From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples.

The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct.

One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data.

It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map.

Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.
@michaelwoerister
Copy link
Member Author

#35340 has landed, closing :D

critiqjo pushed a commit to critiqjo/rustdoc that referenced this issue Dec 16, 2016
…sakis

Take commandline arguments into account for incr. comp.

Implements the conservative strategy described in rust-lang/rust#33727.

From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples.

The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct.

One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data.

It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map.

Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.
djrenren pushed a commit to djrenren/compiletest that referenced this issue Aug 26, 2019
…sakis

Take commandline arguments into account for incr. comp.

Implements the conservative strategy described in rust-lang/rust#33727.

From now one, every time a new commandline option is added, one has to specify if it influences the incremental compilation cache. I've tried to implement this as automatic as possible: One just has to added either the `[TRACKED]` or the `[UNTRACKED]` marker next to the field. The `Options`, `CodegenOptions`, and `DebuggingOptions` definitions in `session::config` show plenty of examples.

The PR removes some cruft from `session::config::Options`, mostly unnecessary copies of flags also present in `DebuggingOptions` or `CodeGenOptions` in the same struct.

One notable removal is the `cfg` field that contained the values passed via `--cfg` commandline arguments. I chose to remove it because (1) its content is only a subset of what later is stored in `hir::Crate::config` and it's pretty likely that reading the cfgs from `Options` would not be what you wanted, and (2) we could not incorporate it into the dep-tracking hash of the `Options` struct because of how the test framework works, leaving us with a piece of untracked but vital data.

It is now recommended (just as before) to access the crate config via the `krate()` method in the HIR map.

Because the `cfg` field is not present in the `Options` struct any more, some methods in the `CompilerCalls` trait now take the crate config as an explicit parameter -- which might constitute a breaking change for plugin authors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation
Projects
None yet
Development

No branches or pull requests

4 participants