Fix spurious "Scheduling: ..." workunits with remote caching #12973

stuhood · 2021-09-21T20:55:06Z

#12369 adjusted the workunit graph to have the `BoundedCommandRunner` mark (what it thought was) its parent workunit as blocked while waiting to acquire a slot on the semaphore. But when #12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked.

#12369 contained a bug: the workunit that the `BoundedCommandRunner` marked blocked was not always its direct parent. In particular, under remote caching it was in fact the grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render.

To fix that, we move back to directly creating a workunit for `BoundedCommandRunner` semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots.
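As a rough illustration of that pattern (not the code in this PR), the sketch below wraps slot acquisition in a workunit that the bounded runner creates and owns itself, so the wait time is recorded without touching any workunit passed in by the caller. Everything here is an assumption made for the example: the `Workunit` struct, the `acquire_slot` name, the `run_bounded` helper, and the blocking std-only `Semaphore` (the real one is async).

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a workunit: a named span with a measured duration.
#[derive(Debug, Clone)]
pub struct Workunit {
    pub name: &'static str,
    pub blocked: bool,
    pub duration: Duration,
}

/// A small blocking counting semaphore built on std only (assumption: the
/// real semaphore is async; this one blocks the calling thread instead).
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// Returns its slot to the semaphore when dropped.
pub struct Permit<'a>(&'a Semaphore);

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.available.notify_one();
    }
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks the calling thread until a slot is free.
    pub fn acquire(&self) -> Permit<'_> {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
        Permit(self)
    }
}

/// Runs `work` while holding a slot. The wait is recorded in a workunit that
/// this function creates itself, rather than by mutating an inbound workunit.
pub fn run_bounded<T>(sema: &Semaphore, log: &Mutex<Vec<Workunit>>, work: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let _permit = sema.acquire();
    log.lock().unwrap().push(Workunit {
        name: "acquire_slot",
        blocked: true,
        duration: start.elapsed(),
    });
    work()
}

/// Two tasks compete for a single slot; returns the recorded wait workunits.
pub fn demo() -> Vec<Workunit> {
    let sema = Arc::new(Semaphore::new(1));
    let log: Arc<Mutex<Vec<Workunit>>> = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (sema, log) = (Arc::clone(&sema), Arc::clone(&log));
            thread::spawn(move || run_bounded(&sema, &log, || thread::sleep(Duration::from_millis(20))))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let recorded = log.lock().unwrap().clone();
    recorded
}
```

Calling `demo()` yields one blocked `acquire_slot` workunit per task, each carrying the time that task spent waiting for a slot. Because the workunit is created where the wait happens, there is no question of which ancestor it refers to.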

This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack.

[ci skip-build-wheels]

stuhood · 2021-09-21T20:56:13Z

src/rust/engine/async_semaphore/src/lib.rs

@@ -77,7 +77,7 @@ impl AsyncSemaphore {
    res
  }

-  async fn acquire(&self) -> Permit<'_> {
+  pub async fn acquire(&self) -> Permit<'_> {


This was private from way back in the day when Future combinators were widely used. In an async-await world, it's easy to see how to use it correctly.
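To make the point concrete, here is a minimal, std-only sketch of an async semaphore whose `acquire` returns a permit that the caller holds across await points. It is an illustration under assumptions, not the `async_semaphore` crate's implementation: the waiter queue, the `Acquire` future, the `available` accessor, the tiny `block_on` executor, and `demo` are all invented for the example.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

struct State {
    available: usize,
    waiters: VecDeque<Waker>,
}

/// Illustrative async semaphore (hypothetical; not the Pants implementation).
pub struct AsyncSemaphore {
    state: Mutex<State>,
}

/// Returns its slot to the semaphore when dropped.
pub struct Permit<'a> {
    sema: &'a AsyncSemaphore,
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        let mut state = self.sema.state.lock().unwrap();
        state.available += 1;
        // Wake one queued waiter so that it polls again.
        if let Some(waker) = state.waiters.pop_front() {
            waker.wake();
        }
    }
}

struct Acquire<'a>(&'a AsyncSemaphore);

impl<'a> Future for Acquire<'a> {
    type Output = Permit<'a>;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let sema = self.0;
        let mut state = sema.state.lock().unwrap();
        if state.available > 0 {
            state.available -= 1;
            Poll::Ready(Permit { sema })
        } else {
            // No slot free: queue this task's waker and wait to be woken.
            state.waiters.push_back(cx.waker().clone());
            Poll::Pending
        }
    }
}

impl AsyncSemaphore {
    pub fn new(permits: usize) -> Self {
        let state = State { available: permits, waiters: VecDeque::new() };
        AsyncSemaphore { state: Mutex::new(state) }
    }

    pub fn available(&self) -> usize {
        self.state.lock().unwrap().available
    }

    /// Same signature as in the diff above: the caller owns the permit.
    pub async fn acquire(&self) -> Permit<'_> {
        Acquire(self).await
    }
}

struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: parks the thread until the waker fires.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Slots available while the permit is held, then after it is dropped.
pub fn demo() -> (usize, usize) {
    let sema = AsyncSemaphore::new(1);
    let counts = block_on(async {
        let permit = sema.acquire().await;
        let held = sema.available();
        drop(permit);
        (held, sema.available())
    });
    counts
}
```

With combinators, handing out a `Permit` meant the caller had to thread it through a chain of closures, which is one plausible reason to hide `acquire` behind a wrapper. With async/await the permit is an ordinary local whose lifetime is visible in the code, as in `demo()` above.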

Eric-Arellano

Thanks!

…ick of #12973) (#12975)

stuhood added 3 commits September 21, 2021 12:44

Expose a remote cache lookup workunit.

d3db98c

[ci skip-build-wheels]

BoundedCommandRunner blocks its own private workunit rather than the …

2123d5e

…parent workunit. [ci skip-build-wheels]

Test that blocked children do not cause parents to render.

94fb34a

# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

stuhood added the needs-cherrypick label Sep 21, 2021

stuhood added this to the 2.7.x milestone Sep 21, 2021

stuhood requested review from jsirois, tdyas and Eric-Arellano September 21, 2021 20:55

Eric-Arellano approved these changes Sep 21, 2021

jsirois approved these changes Sep 21, 2021

stuhood merged commit 3a9e15d into pantsbuild:main Sep 21, 2021

stuhood deleted the stuhood/bounded-command-runner-direct-marking branch September 21, 2021 22:10

stuhood removed the needs-cherrypick label Sep 22, 2021

wisechengyi mentioned this pull request Oct 2, 2021

2.8.0.dev3 prep #13081

Merged
