-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid rebuilding a project when cwd changes #4788
Conversation
r? @matklad (rust_highfive has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me, and I think "purge CWD, use paths relative to a fixed position" is the right approach, but I am not sure that "workspace root heuristic" is a good enough implementation. It's certainly a heuristic because some workspace members may be outside of the workspace, and some non-members may be inside workspace. Not sure if we need 100% correct implementation here though.
if fs_try!(f.read_until(0, &mut cwd)) == 0 { | ||
return Ok(None) | ||
} | ||
let cwd = util::bytes2path(&cwd[..cwd.len()-1])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that new versions of Cargo will fail to parse old depinfo, which is actually OK, because it'll cause only a rebuild, and not a build failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I'm not actually sure what will happen but the hashes are all changing so I don't think newer cargo will use the same files from older cargo.
src/cargo/ops/cargo_rustc/mod.rs
Outdated
let ws_root = cx.ws.root(); | ||
let src = unit.target.src_path(); | ||
assert!(src.is_absolute()); | ||
match src.strip_prefix(ws_root) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too happy about this solution, because there's no guarantee that workspace members reside below workspace root (see also #4787 (comment) :) ).
I think the ideal behavior here is to use all explicit worksapce members which lay outside of the root package as anchors for paths as well. Not sure if we need to support this right from the start though.
fn hash<H: Hasher>(&self, _: &mut H) { | ||
// ... | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A slightly more elaborte design here would be to introduce a notion of explicitly relative path here and track path relative to workspace root and to other workspace members outside the root. Sort of like
struct PathRoots {
// a number of anchors, such as current workspace dir, a particular workspace member, users's home directory,
// CARGO_HOME, etc. This structure is "global" for cargo and is stored in Workspace/config.
roots: Vec::<PathBuf>,
}
struct RelativePath {
root_idx: usize // index in the global PathRoots
rel_path: PathBuf
}
impl RelativePath {
fn to_path(&self, roots: &PathRoots) -> PathBuf {
roots[self.root_idx].join(&self.rel_path)
}
}
Not sure that we need this, but this can cope with workspaces which are not entirely under a workspace root. (Or perhaps it was a mistake to allow non-subdirectory members?).
@matklad thanks for taking a look! It's true yeah that workspace members outside the workspace root do pose a problem. That being said I wasn't really sure how we'd handle them at all :( I originally had a much more complicated patch which entirely refactored the So thinking about it... Right now the workspace root comes up in a few locations:
I think it's a pretty reasonable assumption to assume all rust files for a crate are within the same folder, so I think we could pretty easily adapt the dep-info files to only list files relative to the package root rather than the workspace root (basically do some serious postprocessing). For invocations of What do you think? Maybe we should fix up dep-info parsing to work without using |
Hm, the part about So, if you write
Hm, depinfo is produced by the compiler, so in theory solving the problem with How hard is it to implement the |
This assumption is also pretty important for IDEs, because we need to know which stuff to index, so +1 to making it an official and documented requirement :) Though, I can imagine someone doing weird stuff like |
This commit is targeted at solving a use case which typically comes up during CI builds -- the `target` directory is cached between builds but the cwd of the build changes over time. For example the following scenario can happen: 1. A project is compiled at `/projects/a`. 2. The `target` directory is cached. 3. A new build is started in `/projects/b`. 4. The previous `target` directory is restored to `/projects/b`. 5. The build start, and Cargo rebuilds everything. The last piece of behavior is indeed unfortunate! Cargo's internal hashing currently isn't that resilient to changing cwd and this PR aims to help improve the situation! The first point of too-much-hashing came up with `Target::src_path`. Each `Target` was hashed and stored for all compilations, and the `src_path` field was an absolute path on the filesystem to the file that needed to be compiled. This path then changed over time when cwd changed, but otherwise everything else remained the same! This commit updates the handling of the `src_path` field to simply ignore it when hashing. Instead the path we actually pass to rustc is later calculated and then passed to the fingerprint calculation. The next problem this fixes is that the dep info files were augmented after creation to have the cwd of the compiler at the time to find the files at a later date. This, unfortunately, would cause issues if the cwd itself changed. Instead the cwd is now left out of dep-info files (they're no longer augmented) and instead the cwd is recalculated when parsing the dep info later. The final problem that this commit fixes is actually an existing issue in Cargo today. Right now you can actually execute `cargo build` from anywhere in a project and Cargo will execute the build. Unfortunately though the argument to rustc was actually different depending on what directory you were in (the compiler was invoked with a path relative to cwd). This path ends up being used for metadata like debuginfo which means that different directories would cause different artifacts to be created, but Cargo wouldn't rerun the compiler! To fix this issue the matter of cwd is now entirely excluded from compilation command lines. Instead rustc is unconditionally invoked with a relative path *if* the path is underneath the workspace root, and otherwise it's invoked as an absolute path (in which case the cwd doesn't matter). Once all these fixes were added up it means that now we can have projects where if you move the entire directory Cargo won't rebuild the original source! Note that this may be a bit of a breaking change, however. This means that the paths in error messages for cargo will no longer be unconditionally relative to the current working directory, but rather relative to the root of the workspace itself. Unfortunately this is moreso of a feature right now rather than a bug, so it may be one that we just have to stomach.
fc7c39a
to
dd0b41b
Compare
@matklad ok I've amended the first commit and pushed up another one. This should reduce the reliance on The one usage of
With this PR as-is if you rename the I believe the fix for this would be to change how Cargo invokes rustc, instead invoking I also believe that such a fix would purely involve local modification to this logic. I think though that this may be rare enough that we may want to punt on this for now? Does that makes sense? Curious what you think! |
src/cargo/ops/cargo_rustc/mod.rs
Outdated
assert!(src.is_absolute()); | ||
match src.strip_prefix(ws_root) { | ||
Ok(path) => (path.to_path_buf(), Some(ws_root.to_path_buf())), | ||
Err(_) => (src.to_path_buf(), None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning None
here means that rustc
will use pretty-much arbitrary cwd, is that right? Maybe it's better to lock this down to ws_root
even in this case, just to get a bit more determinism? Than we can return just PathBuf
here and avoid that comment about hashing only .0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly plausible! I was originally thinking that this wouldn't matter but I think with plugins nowadays it could matter for sure. I'll change.
That being said though I don't think we can avoid hashing the second field of the return value because ws_root
changes over time (if the whole dir is renamed) but we don't want that to cause recompiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said though I don't think we can avoid hashing the second field of the return value because ws_root changes over time (if the whole dir is renamed) but we don't want that to cause recompiles.
The idea is that if we lock down cwd, then we don't need to return a pair here, we can return just the path.
/// when it was invoked. | ||
/// | ||
/// The serialized Cargo format will contain a list of files, all of which are | ||
/// relative if they're under `root`. or absolute if they're elsewehre. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, weren't depinfo files, produced by Cargo, used by some other tools? #3557
So, this change will break such clients :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sure!
I think though (at least the intention) that these files are just those in .fingerprint
, so the ones like target/debug/foo.d
should still be present unmangled from rustc itself.
Yeah, so I think anchoring to the package root in most places is the correct thing to do, but I still have a couple of question (left inline as well):
|
dd0b41b
to
4493400
Compare
For the last point about |
Targets should be package relative, and not workspace relative. In theory, we do have necessary information:
But I totally understand that it might be hard to actually implement. So yeah, I think it might worth it to look into package relative paths for targets and locking down cwd for rustc, but, otherwise, r+ at will, this looks great! :) |
This commit alters the format of the dependency info that Cargo keeps track of for each crate. In order to be more resilient against directory renames and such Cargo will now postprocess the compiler's dep-info output and serialize into its own format. This format is intended to primarily list relative paths *to the root of the relevant package* rather than absolute or relative to some other location. If paths aren't actually relative to the package root they're still stored as absolute, but there's not much we can do about that!
4493400
to
f688e9c
Compare
Oh sure yeah targets are package relative but we still don't want to hash the It was after I ended up realizing that when I jettisoned the ~500 lines of changes to get the refactoring done as it ended up not being used in the end :( In any case, thought I already did it, but just pushed up the changes for cwd from the package in non-workspace situations |
@bors r+ |
📌 Commit f688e9c has been approved by |
Avoid rebuilding a project when cwd changes This commit is targeted at solving a use case which typically comes up during CI builds -- the `target` directory is cached between builds but the cwd of the build changes over time. For example the following scenario can happen: 1. A project is compiled at `/projects/a`. 2. The `target` directory is cached. 3. A new build is started in `/projects/b`. 4. The previous `target` directory is restored to `/projects/b`. 5. The build start, and Cargo rebuilds everything. The last piece of behavior is indeed unfortunate! Cargo's internal hashing currently isn't that resilient to changing cwd and this PR aims to help improve the situation! The first point of too-much-hashing came up with `Target::src_path`. Each `Target` was hashed and stored for all compilations, and the `src_path` field was an absolute path on the filesystem to the file that needed to be compiled. This path then changed over time when cwd changed, but otherwise everything else remained the same! This commit updates the handling of the `src_path` field to simply ignore it when hashing. Instead the path we actually pass to rustc is later calculated and then passed to the fingerprint calculation. The next problem this fixes is that the dep info files were augmented after creation to have the cwd of the compiler at the time to find the files at a later date. This, unfortunately, would cause issues if the cwd itself changed. Instead the cwd is now left out of dep-info files (they're no longer augmented) and instead the cwd is recalculated when parsing the dep info later. The final problem that this commit fixes is actually an existing issue in Cargo today. Right now you can actually execute `cargo build` from anywhere in a project and Cargo will execute the build. Unfortunately though the argument to rustc was actually different depending on what directory you were in (the compiler was invoked with a path relative to cwd). This path ends up being used for metadata like debuginfo which means that different directories would cause different artifacts to be created, but Cargo wouldn't rerun the compiler! To fix this issue the matter of cwd is now entirely excluded from compilation command lines. Instead rustc is unconditionally invoked with a relative path *if* the path is underneath the workspace root, and otherwise it's invoked as an absolute path (in which case the cwd doesn't matter). Once all these fixes were added up it means that now we can have projects where if you move the entire directory Cargo won't rebuild the original source! Note that this may be a bit of a breaking change, however. This means that the paths in error messages for cargo will no longer be unconditionally relative to the current working directory, but rather relative to the root of the workspace itself. Unfortunately this is moreso of a feature right now rather than a bug, so it may be one that we just have to stomach. Closes #3273
☀️ Test successful - status-appveyor, status-travis |
This commit fixes flycheck#1397. By the recent change of cargo(rust-lang/cargo#4788), flycheck can't detect the error file. So I changed to use 'workspace_root' in 'cargo metadata's outputs, as working directory. If the user's cargo is a bit old and there's no 'workspace_root', flycheck-rust-manifest-directory search a directory with 'Cargo.toml', as ever.
Due to a recent change to cargo [1], in a workspace setting (multiple crates) filenames are now relative to the workspace root, whereas previously they were relative to the crate. This commit uses `cargo metadata` to figure out the workspace root, if it exists. If it doesn't, it fallbacks on the manifest directory. [1]: rust-lang/cargo#4788
Due to a recent change to cargo [1], in a workspace setting (multiple crates) filenames are now relative to the workspace root, whereas previously they were relative to the crate. This commit uses `cargo metadata` to figure out the workspace root, if it exists. If it doesn't, it fallbacks on the manifest directory. [1]: rust-lang/cargo#4788
This commit is targeted at solving a use case which typically comes up during CI
builds -- the
target
directory is cached between builds but the cwd of thebuild changes over time. For example the following scenario can happen:
/projects/a
.target
directory is cached./projects/b
.target
directory is restored to/projects/b
.The last piece of behavior is indeed unfortunate! Cargo's internal hashing
currently isn't that resilient to changing cwd and this PR aims to help improve
the situation!
The first point of too-much-hashing came up with
Target::src_path
. EachTarget
was hashed and stored for all compilations, and thesrc_path
fieldwas an absolute path on the filesystem to the file that needed to be compiled.
This path then changed over time when cwd changed, but otherwise everything else
remained the same!
This commit updates the handling of the
src_path
field to simply ignore itwhen hashing. Instead the path we actually pass to rustc is later calculated and
then passed to the fingerprint calculation.
The next problem this fixes is that the dep info files were augmented after
creation to have the cwd of the compiler at the time to find the files at a
later date. This, unfortunately, would cause issues if the cwd itself changed.
Instead the cwd is now left out of dep-info files (they're no longer augmented)
and instead the cwd is recalculated when parsing the dep info later.
The final problem that this commit fixes is actually an existing issue in Cargo
today. Right now you can actually execute
cargo build
from anywhere in aproject and Cargo will execute the build. Unfortunately though the argument to
rustc was actually different depending on what directory you were in (the
compiler was invoked with a path relative to cwd). This path ends up being used
for metadata like debuginfo which means that different directories would cause
different artifacts to be created, but Cargo wouldn't rerun the compiler!
To fix this issue the matter of cwd is now entirely excluded from compilation
command lines. Instead rustc is unconditionally invoked with a relative path
if the path is underneath the workspace root, and otherwise it's invoked as an
absolute path (in which case the cwd doesn't matter).
Once all these fixes were added up it means that now we can have projects where
if you move the entire directory Cargo won't rebuild the original source!
Note that this may be a bit of a breaking change, however. This means that the
paths in error messages for cargo will no longer be unconditionally relative to
the current working directory, but rather relative to the root of the workspace
itself. Unfortunately this is moreso of a feature right now rather than a bug,
so it may be one that we just have to stomach.
Closes #3273