-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MaintenanceTasks.stuck_task_duration
configuration
#949
Add MaintenanceTasks.stuck_task_duration
configuration
#949
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this as a feature. I'm not a huge fan of the config name, but can't think of anything more appropriate right now. Maybe someone else on the team has ideas? threshold
seems a bit verbose 😅 Maybe even stuck_task_duration
or stuck_task_period
to imply it's a duration?
@@ -32,7 +32,6 @@ class Run < ApplicationRecord | |||
:cancelled, | |||
] | |||
COMPLETED_STATUSES = [:succeeded, :errored, :cancelled] | |||
STUCK_TASK_TIMEOUT = 5.minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about using #deprecate_constant
, but MaintenanceTasks::Run
is @api private
, so I don't think we need to deprecate 👍 (Apps shouldn't be referencing this).
lib/maintenance_tasks.rb
Outdated
# The threshold in seconds after which a task is considered stuck. | ||
# | ||
# @return [Integer] the threshold in seconds after which a task is considered stuck. | ||
mattr_accessor :task_stuck_threshold, default: 5.minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be an integer since we call #ago
on it in #stuck?
-- I think it needs to be an AS::Duration
.
# The threshold in seconds after which a task is considered stuck. | |
# | |
# @return [Integer] the threshold in seconds after which a task is considered stuck. | |
mattr_accessor :task_stuck_threshold, default: 5.minutes | |
# The duration after which a task is considered stuck and can be force cancelled. | |
# | |
# @return [ActiveSupport::Duration, Numeric] duration after which a task is considered stuck. | |
mattr_accessor :task_stuck_threshold, default: 5.minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, in your suggestion you are still adding Numeric
, does it mean we want to support both
MaintenanceTasks.task_stuck_threshold = 60 # seconds implied
MaintenanceTasks.task_stuck_threshold = 1.minute
I think I like the API being flexible, but perhaps instead of the return value we need to define writer so writer will accept several inputs but will always turn it into ActiveSupport::Duration
. Does this sound reasonable?
Or maybe to avoid confusion we should choose one and that's it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead with ActiveSupport::Duration
only as I feel like its unlikely application will prefer setting this value in seconds like MaintenanceTasks.task_stuck_threshold = 300 # seconds implied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just a duration is acceptable. Supporting multiple values isn't needed if we only use it one way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that SGTM 👍
I like |
afca161
to
2bdbb95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Can we add a section to the README under https://github.com/Shopify/maintenance_tasks?tab=readme-ov-file#configuring-the-gem to document this option before we merge?
MaintenanceTasks.task_stuck_threshold
configurationMaintenanceTasks.stuck_task_duration
configuration
2bdbb95
to
c56a2a4
Compare
Added a section in the README. I'll merge and start preparing a release. But feel free to tweak the README entry anytime |
Having
STUCK_TASK_TIMEOUT
as a constant is not helpful since the actual value will be highly dependent on the job infrastructure of the applications. Simple setups may consider the task being stuck after no updates for ~ 1 minute while more complex and distributed systems with job retries and backoffs may legitimately prefer higher threshold to account for natural delays in a distributed systemIt's a breaking change for applications that relied on
STUCK_TASK_TIMEOUT
though the fix is going to be pretty straightforward.What reviewers should focus on
Is
MaintenanceTasks
module a right place for the config to live? One common alternative would be to define it on theTask
class which will allow redefining it on per-task basis. But I don't think this would make sense and the behavior is job-infra specific rather than task-specific.