Fix spurious "Scheduling: ..." workunits with remote caching (cherrypick of #12973) #12975

Merged 1 commit on Sep 21, 2021

stuhood (Member) commented Sep 21, 2021

#12369 adjusted the workunit graph to have the BoundedCommandRunner mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when #12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked.

#12369 contained a bug: the workunit being marked blocked by the BoundedCommandRunner was not always its direct parent. In particular, under remote caching, the workunit being marked blocked was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render.

To fix that, we move back to directly creating a workunit for BoundedCommandRunner semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots.
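
As a rough illustration of the shape of this change (not the actual engine code, which is not shown in this PR description), the sketch below wraps slot acquisition in its own timed workunit instead of mutating the inbound one. Everything here is a hypothetical stand-in: the `Workunit` dataclass, the `workunit` context manager, the `BoundedRunner` class, and the `acquire_slot` name are invented for the example.

```python
import threading
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Workunit:
    """Hypothetical stand-in for an engine workunit: a named, timed span."""
    name: str
    parent: Optional["Workunit"] = None
    blocked: bool = False
    duration: Optional[float] = None


@contextmanager
def workunit(name: str, parent: Optional[Workunit], finished: List[Workunit]):
    """Create a child workunit, time the enclosed block, then record it."""
    wu = Workunit(name, parent)
    start = time.monotonic()
    try:
        yield wu
    finally:
        wu.duration = time.monotonic() - start
        finished.append(wu)


class BoundedRunner:
    """Runs callables with at most `slots` of them executing concurrently."""

    def __init__(self, slots: int) -> None:
        self._slots = threading.Semaphore(slots)
        self.finished: List[Workunit] = []

    def run(self, name: str, inbound: Workunit, fn: Callable[[], object]):
        # The wait gets a workunit that this method creates and owns.
        # The inbound workunit is never mutated, so it does not matter
        # whether it is the caller's direct parent or a more distant ancestor.
        with workunit("acquire_slot", inbound, self.finished):
            self._slots.acquire()
        try:
            with workunit(name, inbound, self.finished):
                return fn()
        finally:
            self._slots.release()
```

In this sketch, the `duration` of each `acquire_slot` entry in `runner.finished` is the time that call spent waiting for a slot, which corresponds to the wait-time recording mentioned above.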

This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack.

[ci skip-build-wheels]

@stuhood stuhood merged commit 62ba296 into pantsbuild:2.7.x Sep 21, 2021
@stuhood stuhood deleted the stuhood/pick-12973-for-2.7.x branch September 21, 2021 23:19