-
-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching-friendly generation of "current" rustdoc JSON #401
Comments
What is the rationale for the synthetic project? |
It's due to issues like this one: #167 TL;DR: The user's copy of their project is not guaranteed to be in a state amenable to valid rustdoc generation. To get it there we need a fresh lockfile -- but we don't want to overwrite the user's own lockfile, so we make a synthetic project and lock there instead. It's a really unfortunate but really serious problem, because it can trivially lead to all the flavors of semver-major false-positives one could imagine. |
What if we backup the lockfile, run Like:
|
The solution to #167 seems a bit excessive. In our workspace, this means recompiling the same dependencies for every crate because the individual synthetic crates don't share a |
We could also state as a requirement that the repository needs to be versioned with You could then check for uncommitted changes and fail in that case. Assuming no committed changes, run |
I totally get the frustration. We definitely need to do better — this issue isn't the only one like that.
This can be changed today: the build should still obey cargo configuration. If you specify
I'm worried that might get in the way of merging It's also a horrifically bad user experience if Please let me know if you get a chance to try the shared target directory trick I suggested! I'm curious how much of a difference it might make. Like I mentioned in the issue summary, I'm open to adding a "secondary" mechanism to generate the rustdoc that does a |
Oh that is an interesting approach. I'll see if copying the |
Uh, what if I point it to the existing |
I guess that kind of circumvents the fix for #167 😂 |
Actually, no, because it still uses a different lockfile. |
In my local testing, this is not the case. Environment variables are not passed through to the inner cargo invocation. |
I think it doesn't work because we are using the commandline option to specify the target directory: cargo-semver-checks/src/rustdoc_cmd.rs Lines 78 to 79 in 97f8a9b
|
If I remove these two, I can set the target directory via the env variable but then it doesn't find the JSON file because it is probably expecting it somewhere else. My question would be however, whether we can't just always use the original |
It is a bit of a hack but with this patch, I can successfully reuse most of the build artifacts from my usual build: Index: src/rustdoc_cmd.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/rustdoc_cmd.rs b/src/rustdoc_cmd.rs
--- a/src/rustdoc_cmd.rs (revision 97f8a9bdc671040542e150bfb602409492356f7b)
+++ b/src/rustdoc_cmd.rs (date 1677229148044)
@@ -48,14 +48,7 @@
.manifest_path(manifest_path)
.no_deps()
.exec()?;
- let manifest_target_directory = metadata
- .target_directory
- .as_path()
- .as_std_path()
- // HACK: Avoid potential errors when mixing toolchains
- .join(crate::util::SCOPE)
- .join("target");
- let target_dir = manifest_target_directory.as_path();
+ let target_dir = metadata.workspace_root.ancestors().nth(2).unwrap().as_std_path();
let stderr = if self.silence {
std::process::Stdio::piped()
|
With:
A full run of This is roughly equivalent to what I'd have in CI as I'd be restoring a cache that has most things built already, plus a cache for the registry JSON. |
This also works if I clean out all the |
Thanks for doing all these experiments! We'll definitely incorporate the findings into the V2 action that @mgr0dzicki is in the process of building. ContextIIRC, there were two concerns with target dirs that led to making the separate target dir:
Short of doing a deep dive into how cargo build artifacts are generated, I feel that:
ProposalHow about adding a In fact, CI perhaps wants to use |
If you have it handy, I'd love to see the timing data for this. I'm particularly curious how much time is spent running the lints vs building rustdoc (even with cache hits), so I know roughly how much impact the upcoming Trustfall optimizations API (demoed in my blog post) might have here. |
In my experience, Even if it were a problem, I am not sure how common it is to run I'd consider any unnecessary re-compilation a cargo bug that needs to be fixed upstream. We are just running
Unless the default is to use the same target dir, I don't think there is much value in another config option. On the contrary, I always dislike when you have to dig into the details of a tool's config options to make it perform well. The default should perform well and if it that causes problems, I'd consider those as bugs.
I've only done local testing so far because I'd have to build my fork in CI to use it there and the |
I'll see to look into that. |
@epage I'd love your take on this TL;DR: @thomaseizinger has found that not creating a separate target dir specific to semver-checking results in significant speedup for large workspaces (~hour -> few minutes), and is proposing making that the default behavior. A long time ago, we considered this and felt it was too risky at the time, but the underlying assumptions have changed in the meantime (see above) and perhaps it's time to re-evaluate. |
I read the blogpost but I am not sure I understood from it how to gather the timing data. From memory, it took about 20s to "update index" (sparse registry will hopefully fix that soon!). All checks were skipped across the entire workspace because every crate had received a major version bump already when I ran that. So I think the time is mostly spent in creating files, parsing rustdoc and the like:
5 seconds is not amazing but an okay time, considering we are launching a couple of processes in the background. If you wanted to analyze that in detail, I'd probably run it through flamegraph to see where it spends its time. |
@obi1kenobi can you be more specific for what you are wanting to point at in this thread as being resolved? |
This is the context + my proposal: @thomaseizinger in the subsequent reply makes the case that we should reuse the target directory by default, instead of merely supporting it as an option, and that we should trust cargo to not rebuild unnecessarily. |
Friendly bump. Unless I am mistaken, with 0.20 shipped, having to rebuild the current crate on every run is now the dominating factor for the CI runtime of |
What is necessary to push this forward? With unrelated improvements to our CI, we are now spending about 50% of our time in In wall-clock time that is only 1min30s but it would still be nice if we could reuse the target dir here and thus benefit from the caching that we already do. |
rust-lang/cargo#12103 suggests that there might be another reason why the cache isn't being hit even if we manually moved the target files there. Going to keep an eye on it, it should make a positive difference or at least make the workaround of just copying in the target files work better. |
We've now migrated to the provided GitHub action and semver-checking takes about 15 minutes for us. See https://github.com/libp2p/rust-libp2p/actions/runs/6068173889/job/16460787944#step:4:210. We have some interoperability tests that also take similarly long but the rest of CI finishes in < 5 minutes. It would be great if At the moment, we are completely unnecessarily recompiling the entire workspace, both locally and in CI. This results in high disk usage and slow feedback loops because everything is compiled twice. Can something like #402 please be considered? |
Yes, definitely. I also found I'm going to be travelling for a few weeks for RustConf and a couple of other conferences, but I look forward to picking thhis back up when I get back. |
Describe your use case
For workspaces that semver-check many crates, generating the "current" rustdoc JSON takes a significant portion of time: almost an hour across all of
rust-libp2p
, for example.To ensure rustdoc JSON stability and correctness, "current" (also "baseline") rustdoc is generated by:
This synthetic project is generated at
cargo-semver-checks
runtime. Because of rust-lang/cargo#6529, cached build artifacts for that project can't be used: cargo will ignore them since their mtime is older than the project. The hack of applying a synthetic mtime in the future to the build artifacts didn't seem to work, for unknown reasons: #399It would appear that build artifact caching for the current rustdoc JSON is just not possible at the moment. It would be nice to make that a supported use case.
Describe the solution you'd like
cargo-semver-checks
GitHub ActionAlternatives, if applicable
It may be possible to do this as a step-by-step process:
cargo-semver-checks
via the--current-rustdoc <path>
CLI option.This is certainly not as elegant, but may be substantially faster for large repos.
Additional Context
No response
The text was updated successfully, but these errors were encountered: