-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve rolling Supervisor restarts at taskDuration #15859
Conversation
|
||
EasyMock.reset(spec); | ||
EasyMock.expect(spec.isSuspended()).andReturn(false).anyTimes(); | ||
EasyMock.expect(spec.getDataSchema()).andReturn(getDataSchema()).anyTimes(); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
SeekableStreamSupervisorSpec.getDataSchema
AtomicInteger stoppedTasks = new AtomicInteger(); | ||
// Sort task groups by start time to prioritize early termination of earlier groups, then iterate for processing | ||
activelyReadingTaskGroups | ||
.entrySet().stream().sorted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the Concurrency check: the stream
change provide the same level of Concurrency guaranteed as the previous for
loop.
@JsonProperty | ||
public int getStopTaskCount() | ||
public Integer getStopTaskCount() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would be nice to add a javadoc here telling devs to use #getMaxAllowedStops
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @YongGang
Description
This PR builds on the foundation set by #14396 which introduced rolling supervisor restarts as a strategy to potentially reduce the number of task slots required for task handover in Druid. The primary goal of this PR is to refine the behavior and utilization of this feature by addressing two specific areas:
stopTaskCount
defaulted to the value oftaskCount
if not explicitly specified in the Supervisor spec. This implicit behavior led to inconsistency whentaskCount
was manually increased, asstopTaskCount
did not automatically adjust to reflect the new taskCount. This PR eliminates this implicit defaulting. Now, ifstopTaskCount
is not set, it remains null and is preserved as such, requiring explicit configuration to activate rolling restarts.stopTaskCount
is configured to a low value (e.g., 1), it could lead to prolonged cycling and stopping of active tasks, especially for tasks that started earlier. To address this, the PR introduces logic to sort active running tasks by their start date, prioritizing the stopping of older tasks first. This approach aims to prevent early-started tasks from running excessively long, improving the overall efficiency of task cycling.Key changed/added classes in this PR
SeekableStreamSupervisor
updated to sort task groups by start time to prioritize early termination of earlier groupsSeekableStreamSupervisorIOConfig
removed the automatic defaulting ofstopTaskCount
totaskCount
, ensuring thatstopTaskCount
must be explicitly defined to influence the rolling restart behavior.This PR has: