Provide scheduling hints? #7437
(Looks like |
Scheduling has come up from time to time, but generally hasn't received a ton of love unfortunately! Some history is:
The tl;dr is that Cargo only has limited judgement when scheduling things. This only comes up when Cargo has multiple candidates to schedule for an available jobserver token. The "waiting" red line in the graph you have, when it's nonzero, is the only chance where Cargo actually has a scheduling decision to make. When Cargo has a scheduling decision, it currently sorts the list of candidates to schedule based on the number of transitive crates which depend on the crate. The thinking is that this schedules things that will hopefully unlock the most parallelism later in the graph. The previous behavior after the "first fix" was sorting based on depth in the graph. That being said, I definitely agree that we should have some way of providing hints to scheduling. For example Servo should be able to codify "when you compile the If you zoom out though on your graph, do you know if Cargo could have actually scheduled better here? Did dependencies of |
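To make the heuristic described above concrete, here is a minimal Python sketch of "sort the ready candidates by how many crates transitively depend on them". It is an illustration only, not Cargo's code: the function names, the tie-break by name, and the toy graph (including the `net`, `util` and `logo` crates) are all assumptions, and every crate is assumed to appear as a key in `deps`.

```python
from collections import defaultdict

def transitive_dependent_counts(deps):
    """deps maps each crate to the crates it depends on directly.
    Returns how many crates depend on each crate, directly or indirectly."""
    # Invert the edges: crate -> crates that depend on it directly.
    dependents = defaultdict(set)
    for crate, requirements in deps.items():
        for req in requirements:
            dependents[req].add(crate)

    counts = {}
    for crate in deps:
        seen = set()
        stack = [crate]
        while stack:
            for parent in dependents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        counts[crate] = len(seen)
    return counts

def pick_order(ready, counts):
    # Most transitive dependents first; the name is a deterministic tie-break
    # (the tie-break is an assumption made for this sketch).
    return sorted(ready, key=lambda c: (-counts[c], c))

# Hypothetical dependency graph, loosely shaped like the one in this issue.
deps = {
    "app": ["script", "net"],
    "script": ["mozjs_sys", "util"],
    "net": ["util"],
    "mozjs_sys": [],
    "util": [],
    "logo": [],
}
counts = transitive_dependent_counts(deps)
order = pick_order(["logo", "mozjs_sys", "util"], counts)
print(order)  # ['util', 'mozjs_sys', 'logo']
```

In this toy graph `util` is picked before `mozjs_sys` because more crates sit above it, even though nothing in the count says which of the two takes longer to build. That gap is what scheduling hints or timing data would fill.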
I'm pretty sure it is 16? |
At a meetup last night we realized that that can be answered with a small addition to the data stored in the |
Oh, interesting. Any guess why this machine’s 4 cores / 8 threads are not saturated during codegen for the
If I run |
Is it practical to use the json output from In any case, I've submitted #8908 which attempts to take a small step toward allowing some weighting to be considered when determining build order. I think this should be useful for either allowing a user to provide scheduling hints as requested in this issue, or in implementing a system which uses actual timing data from a previous build. |
update dependency queue to consider cost for each node

In support of #7437, this updates the dependency queue implementation to consider a non-fixed cost for each node. The behavior of this implementation should match the previous implementation if all units are assigned the same cost by the code which calls the `queue` method (which is what we do for now).

In the future I can think of at least two ways these costs could be used:

1. Use some known constant value by default (say 100 as I've done in this PR), and allow the user to provide hints for certain units to influence compilation order (as requested in #7437).
2. Take timing data from a previous compilation (perhaps the json output of `cargo build -Ztimings=json`) and use the actual time required to compile a given unit (in milliseconds) as its cost. Any units not included in the provided timing data could perhaps have their cost set to the median cost of all crates included in the timing data.
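The two uses of per-unit costs described in this commit message can be sketched roughly as follows. This is a hedged Python illustration, not the code from the PR: the weighting (a unit's own cost plus the cost of everything that transitively depends on it), the function names, the toy graph and the timing numbers are all assumptions. Only the default of 100 and the median fallback come from the text above.

```python
from statistics import median

DEFAULT_COST = 100  # the constant mentioned in the commit message

def fill_costs(units, timings_ms):
    """Use measured times where available; units without data get the median.
    With no timing data at all, every unit gets DEFAULT_COST."""
    if not timings_ms:
        return {u: DEFAULT_COST for u in units}
    fallback = median(timings_ms.values())
    return {u: timings_ms.get(u, fallback) for u in units}

def priorities(deps, costs):
    """Assumed weighting: own cost + cost of all transitive dependents."""
    dependents = {u: set() for u in deps}
    for unit, requirements in deps.items():
        for req in requirements:
            dependents[req].add(unit)
    result = {}
    for unit in deps:
        seen, stack = set(), [unit]
        while stack:
            for parent in dependents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        result[unit] = costs[unit] + sum(costs[p] for p in seen)
    return result

# Hypothetical graph and timings (milliseconds); "logo" has no timing data.
deps = {
    "app": ["script", "net"],
    "script": ["mozjs_sys", "util"],
    "net": ["util"],
    "mozjs_sys": [],
    "util": [],
    "logo": [],
}
timings = {"app": 4000, "script": 60000, "mozjs_sys": 30000,
           "util": 2000, "net": 5000}

ready = ["logo", "mozjs_sys", "util"]
uniform = priorities(deps, fill_costs(deps, {}))
weighted = priorities(deps, fill_costs(deps, timings))
uniform_order = sorted(ready, key=lambda u: (-uniform[u], u))
weighted_order = sorted(ready, key=lambda u: (-weighted[u], u))
print(uniform_order)   # ['util', 'mozjs_sys', 'logo']
print(weighted_order)  # ['mozjs_sys', 'util', 'logo']
```

With uniform costs the order is the same as counting transitive dependents, which is the "should match the previous implementation" property. With the made-up timings, `mozjs_sys` moves to the front because the slow `script` crate sits above it.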
That's #7396. |
#11032 looked at adding more heuristics to improve build order but they were a mixed bag. Something that came up in that discussion was a (presumed) perma-unstable way of configuring the priority of every package to explore what heuristics could be used. #7396 could then build on this to create a feedback loop for your own system to improve. |
Now that nightly supports parallel front-end compilation, I propose we develop an algorithm Notes:
|
FWIW, not exactly the same issue but could share some ideas in between regarding parallelism — #12912. |
Similarly, protobuf generation (via Since we have integrated SQLite into Cargo, it is a good time to evaluate how to persist timings data as a first step. |
When more crates are ready to start compiling than there is available parallelism, how does Cargo pick which ones to start first? Are there ways to influence this scheduling? (For example, does the order of declarations of dependencies in a given `Cargo.toml` file matter? Or, what effect would it have to add an otherwise unnecessary edge to the dependency graph?) Should we add a new mechanism to influence it?

Here is the output of `cargo build -Z timings` for Servo: (with "Min unit time" set to 10 seconds)

If we start from the end of the graph, (part of) a critical path is very apparent: the final executable only starts to build after the `script` crate has finished. The `script` crate in turn only starts after the build script for `mozjs_sys` has finished.

Additionally, CPU utilization drops while `script` is compiling because Cargo runs out of other tasks to do and rustc only has limited intra-crate parallelism. (Though I expected `codegen-units` to provide some more parallelism during codegen, but that’s a separate issue.)

It seems that if we could start `mozjs_sys` and `script` earlier, the total time could be significantly reduced. Specifically: `cargo build -p mozjs_sys && cargo build -p script && cargo build` might lead to better scheduling. A way to achieve that scheduling with more parallelism could be to assign priorities. `mozjs_sys` and its recursive dependencies have priority 2 (more urgent), `script` and its dependencies that don’t already have a priority have priority 1, and everything else has priority 0.

Literally this mechanism with numeric priority levels is probably not the UX we want for Cargo. But how does it sound to add some way to influence the scheduling? Maybe call it “hints” to avoid setting in stone the current algorithm.
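The priority scheme sketched above could be computed roughly like this. It is a Python illustration under stated assumptions, not a proposed Cargo interface: the `assign_priorities` function, the hint format, the `cc`, `net` and `util` crates, and the choice of 0 as the level for crates that no hint reaches are all hypothetical.

```python
def assign_priorities(deps, hints):
    """deps maps each crate to its direct dependencies.
    hints is a list of (root_crate, level), most urgent first.
    Each root and its recursive dependencies get the root's level unless an
    earlier (more urgent) hint already claimed them. Crates reached by no
    hint keep level 0 (an assumption of this sketch)."""
    priority = {crate: 0 for crate in deps}
    claimed = set()
    for root, level in hints:
        stack = [root]
        while stack:
            crate = stack.pop()
            if crate in claimed:
                continue  # already has a priority from a more urgent hint
            claimed.add(crate)
            priority[crate] = level
            stack.extend(deps[crate])
    return priority

# Hypothetical graph; only mozjs_sys and script are named in the issue.
deps = {
    "servo": ["script", "net"],
    "script": ["mozjs_sys", "util"],
    "mozjs_sys": ["cc"],
    "net": ["util"],
    "util": [],
    "cc": [],
}
priority = assign_priorities(deps, [("mozjs_sys", 2), ("script", 1)])

# When several crates are ready at once, start the most urgent first.
ready = ["net", "util", "cc"]
order = sorted(ready, key=lambda c: (-priority[c], c))
print(order)  # ['cc', 'util', 'net']
```

Unlike running three `cargo build` invocations in sequence, this only reorders the ready queue, so lower-priority crates can still fill otherwise idle job slots.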
CC #5125 which is on the more general topic of scheduling