Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSQ: Improved worker cancellation. #17046

Merged
merged 8 commits into from
Sep 15, 2024
Merged

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Sep 12, 2024

Four changes:

  1. FrameProcessorExecutor now requires that cancellationIds be registered
    with "registerCancellationId" prior to being used in "runFully" or "runAllFully".

  2. FrameProcessorExecutor gains an "asExecutor" method, which allows that
    executor to be used as an executor for future callbacks in such a way
    that respects cancellationId.

  3. RunWorkOrder gains a "stop" method, which cancels the current
    cancellationId and closes the current FrameContext. It blocks until
    both operations are complete.

  4. Fixes a bug in RunAllFullyWidget where "processorManager.result()" was
    called outside "runAllFullyLock", which could cause it to be called
    out-of-order with "cleanup()" in case of cancellation or other error.

Together, these changes help ensure cancellation does not have races. Once "cancel" is called for a given cancellationId, all existing processors and running callbacks are canceled and exit in an orderly manner. Future processors and callbacks with the same cancellationId are rejected before being executed.

Four changes:

1) FrameProcessorExecutor now requires that cancellationIds be registered
   with "registerCancellationId" prior to being used in "runFully" or "runAllFully".

2) FrameProcessorExecutor gains an "asExecutor" method, which allows that
   executor to be used as an executor for future callbacks in such a way
   that respects cancellationId.

3) RunWorkOrder gains a "stop" method, which cancels the current
   cancellationId and closes the current FrameContext. It blocks until
   both operations are complete.

4) Fixes a bug in RunAllFullyWidget where "processorManager.result()" was
   called outside "runAllFullyLock", which could cause it to be called
   out-of-order with "cleanup()" in case of cancellation or other error.

Together, these changes help ensure cancellation does not have races.
Once "cancel" is called for a given cancellationId, all existing processors
and running callbacks are canceled and exit in an orderly manner. Future
processors and callbacks with the same cancellationId are rejected
before being executed.
@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Sep 12, 2024
@gianm gianm merged commit 6fac267 into apache:master Sep 15, 2024
90 checks passed
@gianm gianm deleted the msq-improve-cancellation branch September 15, 2024 08:22
gianm added a commit to gianm/druid that referenced this pull request Sep 15, 2024
In apache#17046, an exception thrown by RunWorkOrder#startAsync would be ignored
and replaced with a generic CanceledFault. This patch fixes it by retaining
the original error.
cryptoe pushed a commit that referenced this pull request Sep 17, 2024
…#17069)

* MSQ: Properly report errors that occur when starting up RunWorkOrder.

In #17046, an exception thrown by RunWorkOrder#startAsync would be ignored
and replaced with a generic CanceledFault. This patch fixes it by retaining
the original error.
pranavbhole pushed a commit to pranavbhole/druid that referenced this pull request Sep 17, 2024
* MSQ: Improved worker cancellation.

Four changes:

1) FrameProcessorExecutor now requires that cancellationIds be registered
   with "registerCancellationId" prior to being used in "runFully" or "runAllFully".

2) FrameProcessorExecutor gains an "asExecutor" method, which allows that
   executor to be used as an executor for future callbacks in such a way
   that respects cancellationId.

3) RunWorkOrder gains a "stop" method, which cancels the current
   cancellationId and closes the current FrameContext. It blocks until
   both operations are complete.

4) Fixes a bug in RunAllFullyWidget where "processorManager.result()" was
   called outside "runAllFullyLock", which could cause it to be called
   out-of-order with "cleanup()" in case of cancellation or other error.

Together, these changes help ensure cancellation does not have races.
Once "cancel" is called for a given cancellationId, all existing processors
and running callbacks are canceled and exit in an orderly manner. Future
processors and callbacks with the same cancellationId are rejected
before being executed.

* Fix test.

* Use execute, which doesn't return, to avoid errorprone complaints.

* Fix some style stuff.

* Further enhancements.

* Fix style.
pranavbhole pushed a commit to pranavbhole/druid that referenced this pull request Sep 17, 2024
…apache#17069)

* MSQ: Properly report errors that occur when starting up RunWorkOrder.

In apache#17046, an exception thrown by RunWorkOrder#startAsync would be ignored
and replaced with a generic CanceledFault. This patch fixes it by retaining
the original error.
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
kfaraz pushed a commit to kfaraz/druid that referenced this pull request Oct 4, 2024
* MSQ: Improved worker cancellation.

Four changes:

1) FrameProcessorExecutor now requires that cancellationIds be registered
   with "registerCancellationId" prior to being used in "runFully" or "runAllFully".

2) FrameProcessorExecutor gains an "asExecutor" method, which allows that
   executor to be used as an executor for future callbacks in such a way
   that respects cancellationId.

3) RunWorkOrder gains a "stop" method, which cancels the current
   cancellationId and closes the current FrameContext. It blocks until
   both operations are complete.

4) Fixes a bug in RunAllFullyWidget where "processorManager.result()" was
   called outside "runAllFullyLock", which could cause it to be called
   out-of-order with "cleanup()" in case of cancellation or other error.

Together, these changes help ensure cancellation does not have races.
Once "cancel" is called for a given cancellationId, all existing processors
and running callbacks are canceled and exit in an orderly manner. Future
processors and callbacks with the same cancellationId are rejected
before being executed.

* Fix test.

* Use execute, which doesn't return, to avoid errorprone complaints.

* Fix some style stuff.

* Further enhancements.

* Fix style.
kfaraz added a commit that referenced this pull request Oct 4, 2024
…) (#17074) (#17076) (#17077) (#17193) (#17243)

Backport the following patches for a clean backport of Dart changes
1. Add "targetPartitionsPerWorker" setting for MSQ. (#17048)
2. MSQ: Improved worker cancellation. (#17046)
3. Add "includeAllCounters()" to WorkerContext. (#17047)
4. MSQ: Include worker context maps in WorkOrders. (#17076)
5. TableInputSpecSlicer changes to support running on Brokers. (#17074)
6. Fix call to MemoryIntrospector in IndexerControllerContext. (#17066)
7. MSQ: Add QueryKitSpec to encapsulate QueryKit params. (#17077)
8. MSQ: Use task context flag useConcurrentLocks to determine task lock type (#17193)
kfaraz pushed a commit to kfaraz/druid that referenced this pull request Oct 5, 2024
…apache#17069)

* MSQ: Properly report errors that occur when starting up RunWorkOrder.

In apache#17046, an exception thrown by RunWorkOrder#startAsync would be ignored
and replaced with a generic CanceledFault. This patch fixes it by retaining
the original error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants