Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spurious "Scheduling: ..." workunits with remote caching #12973

Merged

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Sep 21, 2021

#12369 adjusted the workunit graph to have the BoundedCommandRunner mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when #12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked.

#12369 contained a bug: the workunit being marked blocked by the BoundedCommandRunner was not always its direct parent: in particular, under remote caching the workunit being marked blocking was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render.

To fix that, we move back to directly creating a workunit for BoundedCommandRunner semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots.

This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack.

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@stuhood stuhood added this to the 2.7.x milestone Sep 21, 2021
@@ -77,7 +77,7 @@ impl AsyncSemaphore {
res
}

async fn acquire(&self) -> Permit<'_> {
pub async fn acquire(&self) -> Permit<'_> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was private from way back in the day when Future combinators were widely used. In an async-await world, it's easy to see how to use it correctly.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@stuhood stuhood merged commit 3a9e15d into pantsbuild:main Sep 21, 2021
@stuhood stuhood deleted the stuhood/bounded-command-runner-direct-marking branch September 21, 2021 22:10
stuhood added a commit to stuhood/pants that referenced this pull request Sep 21, 2021
…ild#12973)

pantsbuild#12369 adjusted the workunit graph to have the `BoundedCommandRunner` mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when pantsbuild#12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked.

pantsbuild#12369 contained a bug: the workunit being marked blocked by the `BoundedCommandRunner` was not always it's direct parent: in particular, under remote caching the workunit being marked blocking was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render.

To fix that, we move back to directly creating a workunit for `BoundedCommandRunner` semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots.

This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack.
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
stuhood added a commit that referenced this pull request Sep 21, 2021
…ick of #12973) (#12975)

#12369 adjusted the workunit graph to have the `BoundedCommandRunner` mark (what it thought was) its parent workunit as blocking while waiting to acquire a slot on the semaphore. But when #12748 fixed rendering of parent workunits, we experienced a regression in rendering with remote caching enabled: "Scheduling: ..." workunits were rendered when a process was blocked.

#12369 contained a bug: the workunit being marked blocked by the `BoundedCommandRunner` was not always it's direct parent: in particular, under remote caching the workunit being marked blocking was in fact its grandparent. Marking that workunit blocked had no effect, because its child (the parent of the semaphore acquisition) would still cause it to render.

To fix that, we move back to directly creating a workunit for `BoundedCommandRunner` semaphore acquisition, rather than marking the inbound workunit blocked. This also has the benefit of recording how long processes waited to acquire slots.

This bug is to some degree an indictment of explicitly passing workunits to improve clarity... but on the other hand, it also seems to more strongly encourage operating on workunits that you have created, and which are living on your stack.

[ci skip-build-wheels]
@wisechengyi wisechengyi mentioned this pull request Oct 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants