Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send active support notifications on run events #961

Merged
merged 11 commits into from
Feb 20, 2024

Conversation

elfassy
Copy link
Contributor

@elfassy elfassy commented Feb 2, 2024

closes #747

Why

Callbacks on the task are not meant to be used for instrumentation across all tasks. Task callbacks also don't have access to the Run object, preventing this logic from accessing the metadata or other information stored on the run. This makes it difficult to set notifications for status changes, for instance to send slack messages when a job is completed

What

Send active support notifications on the Run status changes

      maintenance_tasks.enqueued,    # The task has been enqueued by the user.
      maintenance_tasks.succeeded,   # The task finished without error.
      maintenance_tasks.cancelled,   # The user explicitly halted the task's execution.
      maintenance_tasks.paused,      # The task was paused in the middle of the run by the user.
      maintenance_tasks.errored,     # The task code produced an unhandled exception.

Example

ActiveSupport::Notifications.subscribe("maintenance_tasks.succeeded") do |*args|
       payload = args.last
       run = payload[:run]
       run.task_name
       run.arguments
       run.metadata
end

ActiveSupport::Notifications.subscribe("maintenance_tasks.errored") do |*args|
       payload = args.last
       run = payload[:run]
       run.task_name
       run.arguments
       run.error_message
       run.error_class
end

# or

class MaintenanceTasksInstrumenter < ActiveSupport::Subscriber
 attach_to :maintenance_tasks
 
 def enqueued(event)
   run = event.payload[:run]
   SlackNotifier.broadcast(SLACK_CHANNEL, "Job #{run.task_name} was started by #{run.metadata[:user_email]}} with arguments #{run.arguments.to_s.truncate(255)}")
 end
 
 def errored(event)
   run = event.payload[:run]
   SlackNotifier.broadcast(SLACK_CHANNEL, "Job #{run.task_name} halted with an error")
 end
 
 def succeeded(event)
   run = event.payload[:run]
   # do something anytime a maintenance task succeeds
   SlackNotifier.broadcast(SLACK_CHANNEL, "Job #{run.task_name} completed successfully")
 end
 
 ...
end

@elfassy elfassy requested a review from fridaynosaur February 5, 2024 16:37
test/models/maintenance_tasks/run_test.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
@elfassy elfassy force-pushed the active_support_notifications branch 2 times, most recently from 2516581 to 35feb05 Compare February 6, 2024 14:02
Copy link

@fridaynosaur fridaynosaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes, re-read everything and it looks awesome.

@@ -448,6 +450,15 @@ def task

private

def instrument_status_change
return if status.nil? || !status_previously_changed?
return if status.in?(["running", "pausing", "cancelling", "interrupted"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not these events?

Copy link
Contributor Author

@elfassy elfassy Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

original discussion here #961 (comment). TLDR: these states are easily confused with other states, and there's no use case that we know of for them b499351. This commit can easily be reversed if need be

these states are easily confused with other states. For example
- running vs enqueued
- pausing vs paused
- cancelling vs cancelled
- interrupted vs paused

If someone has a valid use case for these states, feel free to revert this commit and add them back

README.md Outdated Show resolved Hide resolved
Copy link
Member

@ryanseys ryanseys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, couple questions

@elfassy elfassy force-pushed the active_support_notifications branch from 03f0ac5 to e275f83 Compare February 12, 2024 14:16
@elfassy elfassy requested a review from a team February 14, 2024 13:46
README.md Outdated Show resolved Hide resolved
app/models/maintenance_tasks/run.rb Outdated Show resolved Hide resolved
return if status.in?(["running", "pausing", "cancelling", "interrupted"])

ActiveSupport::Notifications.instrument("maintenance_tasks.#{status}", run: self)
rescue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the idea to ensure that issuing a notification will not result in the whole flow being halted but I think this rescue overly defensive. Realistically we don't expect anything to go wrong in this method but rescue will swallow all the problems including even NoMethodError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it because if any of the subscriber blocks raise, the instrument calls will raise, which could stop the job/task. We can't restrict the exceptions we want to rescue because instrument will only raise ActiveSupport::Notifications::InstrumentationSubscriberError if there's more than one exception. So if there's only one subscriber and it raises, or if there are multiple but only one raises, the error is re-raised as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it because if any of the subscriber blocks raise

Doesn't it mean that subscribers should be the one to be defensive?
Otherwise it sounds like everything, including Rails itself should accompany every instrument call with a rescue block due to subscriber potentially raising.

Perhaps I'm missing something?

Copy link
Contributor Author

@elfassy elfassy Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To Etienne's point, unrelated tests start failing if I remove this rescue

Error:
MaintenanceTasks::TaskJobTest#test_.perform_now_pauses_run_when_race_occurs_between_interrupting_and_pausing_run:
ActiveRecord::RecordInvalid: Validation failed: Status Cannot transition run from status errored to pausing
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/validations.rb:84:in `raise_validation_error'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/validations.rb:55:in `save!'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/transactions.rb:313:in `block in save!'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/transactions.rb:365:in `block in with_transaction_returning_status'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/database_statements.rb:342:in `transaction'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/transactions.rb:361:in `with_transaction_returning_status'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/transactions.rb:313:in `save!'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/suppressor.rb:56:in `save!'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/persistence.rb:906:in `block in update!'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/transactions.rb:365:in `block in with_transaction_returning_status'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/connection_adapters/abstract/database_statements.rb:342:in `transaction'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/transactions.rb:361:in `with_transaction_returning_status'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/persistence.rb:904:in `update!'
    /Users/michaelelfassy/.gem/ruby/3.3.0/gems/activerecord-7.1.3/lib/active_record/enum.rb:313:in `block in define_enum_methods'
    test/jobs/maintenance_tasks/task_job_test.rb:516:in `block (2 levels) in <class:TaskJobTest>'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that rescue was rescuing ActiveRecord::RecordInvalid: Validation failed: Status Cannot transition run from status errored to pausing ?

Doesn't it prove my point that we will start rescuing errors that have nothing to do with issuing a notification and may indicate a real issue with the flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvasilevski you're correct, this exception is due to an invalid mock and was getting swallowed. I'm not sure if we want to raise only in tests or if we want to remove the rescue all together

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still in favor of removing it because I believe it's subscriber's responsibility to catch any potential issues and the gem should not swallow it if means breaking tasks's run state. But I could have misunderstood @etiennebarrie point so let me chat with him!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah removing the rescue clause is probably what makes the most sense. If we were to keep it, we should find a way to re-raise the exception from the job after the data is correctly persisted. Maybe we could add the rescue to the readme example to show the best practice that prevents the instrumentation block from raising.

@elfassy elfassy force-pushed the active_support_notifications branch from a29de72 to 09ca611 Compare February 14, 2024 18:16
Copy link
Contributor

@adrianna-chang-shopify adrianna-chang-shopify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! ❤️ One small request for additional docs, and let's wait to see if anyone else on the team has feedback, but otherwise 👍

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@elfassy elfassy force-pushed the active_support_notifications branch 2 times, most recently from bd029fe to ee235fe Compare February 16, 2024 18:48
elfassy and others added 6 commits February 19, 2024 09:21
I'm excluding these states as they are easily confused with others and are likely to create confusion:
- running vs enqueued
- pausing vs paused
- cancelling vs cancelled
- interrupted vs paused

If someone has a valid use case for these states, feel free to revert this commit and add them back
Co-authored-by: Étienne Barrié <[email protected]>
@elfassy elfassy force-pushed the active_support_notifications branch from fecbc6b to 36d7c08 Compare February 19, 2024 14:22
@nvasilevski
Copy link
Contributor

Let's squash the commits before merging please

@elfassy elfassy merged commit 7363b64 into main Feb 20, 2024
41 checks passed
@elfassy elfassy deleted the active_support_notifications branch February 20, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for ActiveSupport Notifications
8 participants