-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark workunits blocked, and skip rendering completed workunits #12369
Conversation
…orkunits`. [ci skip-build-wheels]
…a static property. [ci skip-build-wheels]
[ci skip-build-wheels]
…ate workunit for eager fetching. # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
d23f0eb
to
bbcc489
Compare
…rkunits in case of spawned work (like cache writes) where a child is still running below a completed parent. [ci skip-build-wheels]
…andRunner can mark its parent workunit blocked. [ci skip-build-wheels]
…e BoundedCommandRunner's. [ci skip-build-wheels]
bbcc489
to
9ecd8ac
Compare
/// | ||
/// Workunits form a tree of running, blocked, and completed work, with parent ids propagated via | ||
/// thread-local state. | ||
/// | ||
/// While running (the Started state), a copy of a Workunit is generally kept on the stack by the | ||
/// `in_workunit!` macro, while another copy of the same Workunit is recorded in the WorkunitStore. | ||
/// Most of the fields of the Workunit are immutable, but an atomic "blocked" flag can be set to | ||
/// temporarily mark the running Workunit as being in a blocked state. | ||
/// | ||
/// When the `in_workunit!` macro exits, the Workunit on the stack is completed by storing any | ||
/// local mutated values as the final value of the Workunit. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: ^
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Stu! Should this be picked to 2.6?
@@ -508,13 +525,19 @@ pub struct HeavyHittersInnerStore { | |||
fn first_matched_parent( | |||
workunit_records: &HashMap<SpanId, Workunit>, | |||
mut span_id: Option<SpanId>, | |||
is_terminal: impl Fn(&Workunit) -> bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does terminal mean here? Imo it's ambiguous if it's something like "final/complete" vs. "the CLI/terminal"
… fixes `Starting` messages. # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the logging of cache hits!
…build#12369) Our "graph" of execution is a DAG, while the workunits (used to visualize and record traces) form a tree. Because they form a tree, workunits do not always report that they are blocked on work that they didn't start, but which some other node started (common when lots of @rules are waiting on another single @rule which only one of them started). To fix this, we make the `blocked` property of a workunit an atomic mutable, and skip rendering the parents of blocked leaves. We use the `blocked` flag both for `Tasks` (which wait directly for memoized Nodes, and so frequently block in this way), and in `BoundedCommandRunner`, which temporarily blocks the workunit while we're acquiring the semaphore. Additionally, we skip rendering or walking through `Completed` workunits, which can happen in the case of speculation if a parent workunit completes before a child. In order to toggle the `blocked` property on workunits, we expose the current `RunningWorkunit` in two new places: the `CommandRunner` and `WrappedNode`. In both cases, this is to allow the generic code to consume the workunit created by their callers and mark it blocked (for `Task` and `BoundedCommandRunner`). Fixes pantsbuild#12349. [ci skip-build-wheels]
… (#12376) Our "graph" of execution is a DAG, while the workunits (used to visualize and record traces) form a tree. Because they form a tree, workunits do not always report that they are blocked on work that they didn't start, but which some other node started (common when lots of @rules are waiting on another single @rule which only one of them started). To fix this, we make the `blocked` property of a workunit an atomic mutable, and skip rendering the parents of blocked leaves. We use the `blocked` flag both for `Tasks` (which wait directly for memoized Nodes, and so frequently block in this way), and in `BoundedCommandRunner`, which temporarily blocks the workunit while we're acquiring the semaphore. Additionally, we skip rendering or walking through `Completed` workunits, which can happen in the case of speculation if a parent workunit completes before a child. In order to toggle the `blocked` property on workunits, we expose the current `RunningWorkunit` in two new places: the `CommandRunner` and `WrappedNode`. In both cases, this is to allow the generic code to consume the workunit created by their callers and mark it blocked (for `Task` and `BoundedCommandRunner`). Fixes #12349. [ci skip-build-wheels]
#12369 adjusted the workunit graph to have the `BoundedCommandRunner` mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when #12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked. #12369 contained a bug: the workunit being marked blocked by the `BoundedCommandRunner` was not always it's direct parent: in particular, under remote caching the workunit being marked blocking was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render. To fix that, we move back to directly creating a workunit for `BoundedCommandRunner` semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots. This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack.
…ild#12973) pantsbuild#12369 adjusted the workunit graph to have the `BoundedCommandRunner` mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when pantsbuild#12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked. pantsbuild#12369 contained a bug: the workunit being marked blocked by the `BoundedCommandRunner` was not always it's direct parent: in particular, under remote caching the workunit being marked blocking was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render. To fix that, we move back to directly creating a workunit for `BoundedCommandRunner` semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots. This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack. # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…ick of #12973) (#12975) #12369 adjusted the workunit graph to have the `BoundedCommandRunner` mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when #12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked. #12369 contained a bug: the workunit being marked blocked by the `BoundedCommandRunner` was not always it's direct parent: in particular, under remote caching the workunit being marked blocking was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render. To fix that, we move back to directly creating a workunit for `BoundedCommandRunner` semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots. This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack. [ci skip-build-wheels]
Our "graph" of execution is a DAG, while the workunits (used to visualize and record traces) form a tree. Because they form a tree, workunits do not always report that they are blocked on work that they didn't start, but which some other node started (common when lots of @rules are waiting on another single @rule which only one of them started).
To fix this, we make the
blocked
property of a workunit an atomic mutable, and skip rendering the parents of blocked leaves. We use theblocked
flag both forTasks
(which wait directly for memoized Nodes, and so frequently block in this way), and inBoundedCommandRunner
, which temporarily blocks the workunit while we're acquiring the semaphore. Additionally, we skip rendering or walking throughCompleted
workunits, which can happen in the case of speculation if a parent workunit completes before a child.In order to toggle the
blocked
property on workunits, we expose the currentRunningWorkunit
in two new places: theCommandRunner
andWrappedNode
. In both cases, this is to allow the generic code to consume the workunit created by their callers and mark it blocked (forTask
andBoundedCommandRunner
).Fixes #12349.