fix: improve Slack rate limiting logic when updating alert groups #5287
Merged
30 commits
2e7e46f
fix: improve Slack rate limiting logic when updating alert groups
joeyorlando ee013d8
remove `django-deprecate-fields`
joeyorlando 202f8b6
Merge branch 'dev' into jorlando/integration-slack-rate-limiting
joeyorlando 6229ff3
wip
joeyorlando 26df057
Merge branch 'jorlando/integration-slack-rate-limiting' of github.com…
joeyorlando 522a699
linting
joeyorlando 4cafa21
wip
joeyorlando 2d55fb5
wip
joeyorlando 5b9b4a8
wip
joeyorlando 1f49245
wip
joeyorlando a49a674
wip
joeyorlando 39e4536
Merge branch 'dev' into jorlando/integration-slack-rate-limiting
joeyorlando 9f9d643
wip
joeyorlando 9f86765
wip
joeyorlando 090b64a
test
joeyorlando 0763c46
wip
joeyorlando 8075140
wip
joeyorlando bd3f6fd
wip
joeyorlando bb6624b
wip
joeyorlando 54561d6
wip
joeyorlando 83a2bf3
wip
joeyorlando d9c9988
wip
joeyorlando b517883
Merge branch 'dev' into jorlando/integration-slack-rate-limiting
joeyorlando 0ef7aed
fix indentation (from merge conflict resolution in GH UI)
joeyorlando 221db90
Merge branch 'dev' into jorlando/integration-slack-rate-limiting
joeyorlando 97c67d4
linting
joeyorlando 10cbeac
wip
joeyorlando 82db6a5
PR comment
joeyorlando d175490
pr comment
joeyorlando 6ad5957
linting
joeyorlando
File renamed without changes.
@@ -3,7 +3,10 @@
import typing
import uuid

from celery import uuid as celery_uuid
from django.core.cache import cache
from django.db import models
from django.utils import timezone

from apps.slack.client import SlackClient
from apps.slack.constants import BLOCK_SECTION_TEXT_MAX_SIZE

@@ -15,6 +18,7 @@
    SlackAPIRatelimitError,
    SlackAPITokenError,
)
from apps.slack.tasks import update_alert_group_slack_message

if typing.TYPE_CHECKING:
    from apps.alerts.models import AlertGroup

@@ -30,6 +34,8 @@ class SlackMessage(models.Model):
    alert_group: typing.Optional["AlertGroup"]
    channel: "SlackChannel"

    ALERT_GROUP_UPDATE_DEBOUNCE_INTERVAL_SECONDS = 45

    id = models.CharField(primary_key=True, default=uuid.uuid4, editable=False, max_length=36)
    slack_id = models.CharField(max_length=100)

@@ -85,7 +91,7 @@ class SlackMessage(models.Model):

    active_update_task_id = models.CharField(max_length=100, null=True, default=None)
    """
    ID of the latest celery task to update the message
    DEPRECATED/TODO: drop this field in a separate PR/release
    """

    class Meta:

Review comment on the `active_update_task_id` line:
This field is not used anywhere. At first I considered using it, but went a different route. Marking it to be dropped in a subsequent PR/release.

@@ -259,3 +265,87 @@ def send_slack_notification(
            slack_user_identity.send_link_to_slack_message(slack_message)
        except (SlackAPITokenError, SlackAPIMethodNotSupportedForChannelTypeError):
            pass

    def _get_update_message_cache_key(self) -> str:
        return f"update_alert_group_slack_message_{self.alert_group.pk}"

    def get_active_update_task_id(self) -> typing.Optional[str]:
        return cache.get(self._get_update_message_cache_key(), default=None)

    def set_active_update_task_id(self, task_id: str) -> None:
        """
        NOTE: we store the task ID in the cache for twice the debounce interval to ensure that the task ID is
        EVENTUALLY removed. The background task which updates the message will remove the task ID from the cache, but
        this is a safety measure in case the task fails to run or complete. The task ID would be removed from the cache
        which would then allow the message to be updated again in a subsequent call to this method.
        """
        cache.set(
            self._get_update_message_cache_key(),
            task_id,
            timeout=self.ALERT_GROUP_UPDATE_DEBOUNCE_INTERVAL_SECONDS * 2,
        )

    def mark_active_update_task_as_complete(self) -> None:
        self.last_updated = timezone.now()
        self.save(update_fields=["last_updated"])

        cache.delete(self._get_update_message_cache_key())

    def update_alert_groups_message(self, debounce: bool) -> None:
        """
        Schedule an update task for the associated alert group's Slack message, respecting the debounce interval.

        This method ensures that updates to the Slack message related to an alert group are not performed
        too frequently, adhering to the `ALERT_GROUP_UPDATE_DEBOUNCE_INTERVAL_SECONDS` debounce interval.
        It schedules a background task to update the message after the appropriate countdown.

        The method performs the following steps:
        - Checks if there's already an active update task ID set in the cache. If so, exits to prevent
          duplicate scheduling.
        - Calculates the time since the last update (`last_updated` field) and determines the remaining time needed
          to respect the debounce interval.
        - Schedules the `update_alert_group_slack_message` task with the calculated countdown.
        - Stores the task ID in the cache to prevent multiple tasks from being scheduled.

        debounce: bool - this is intended to be used when we want to debounce updates to the message. Examples:
        - when set to True, we will skip scheduling an update task if there's an active update task (eg. debounce it)
        - when set to False, we will immediately schedule an update task
        """
        if not self.alert_group:
            logger.warning(
                f"skipping update_alert_groups_message as SlackMessage {self.pk} has no alert_group associated with it"
            )
            return

        active_update_task_id = self.get_active_update_task_id()
        if debounce and active_update_task_id is not None:
            logger.info(
                f"skipping update_alert_groups_message as SlackMessage {self.pk} has an active update task "
                f"{active_update_task_id} and debounce is set to True"
            )
            return

        now = timezone.now()

        # we previously weren't updating the last_updated field for messages, so there will be cases
        # where the last_updated field is None
        last_updated = self.last_updated or now

        time_since_last_update = (now - last_updated).total_seconds()
        remaining_time = self.ALERT_GROUP_UPDATE_DEBOUNCE_INTERVAL_SECONDS - int(time_since_last_update)
        countdown = max(remaining_time, 10) if debounce else 0

        logger.info(
            f"updating message for alert_group {self.alert_group.pk} in {countdown} seconds "
            f"(debounce interval: {self.ALERT_GROUP_UPDATE_DEBOUNCE_INTERVAL_SECONDS})"
        )

        task_id = celery_uuid()

        # NOTE: we need to persist the task ID in the cache before scheduling the task to prevent
        # a race condition where the task starts before the task ID is stored in the cache as the task
        # does a check to verify that the celery task id matches the one stored in the cache
        #
        # (see update_alert_group_slack_message task for more details)
        self.set_active_update_task_id(task_id)
        update_alert_group_slack_message.apply_async((self.pk,), countdown=countdown, task_id=task_id)

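To make the countdown arithmetic in `update_alert_groups_message` easier to check, here is a minimal standalone sketch of it. The helper name `compute_countdown` and the two module-level constants are hypothetical and not part of the PR; they only mirror the 45-second interval and the `max(remaining_time, 10)` floor shown in the diff.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed to mirror ALERT_GROUP_UPDATE_DEBOUNCE_INTERVAL_SECONDS from the diff.
DEBOUNCE_INTERVAL_SECONDS = 45
# Assumed to mirror the floor used in `max(remaining_time, 10)`.
MIN_COUNTDOWN_SECONDS = 10


def compute_countdown(last_updated: Optional[datetime], now: datetime, debounce: bool) -> int:
    """Hypothetical helper reproducing the countdown calculation from the diff."""
    if not debounce:
        # Without debouncing, the update is scheduled immediately.
        return 0
    # Older messages may have no last_updated value; treat them as updated "now".
    last_updated = last_updated or now
    elapsed = int((now - last_updated).total_seconds())
    remaining = DEBOUNCE_INTERVAL_SECONDS - elapsed
    # Never schedule sooner than the floor when debouncing.
    return max(remaining, MIN_COUNTDOWN_SECONDS)


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(compute_countdown(None, now, debounce=True))                           # 45
print(compute_countdown(now - timedelta(seconds=20), now, debounce=True))    # 25
print(compute_countdown(now - timedelta(seconds=120), now, debounce=True))   # 10
print(compute_countdown(now - timedelta(seconds=120), now, debounce=False))  # 0
```

Note that with `debounce=True` the countdown is never below 10 seconds, even when the last update happened long ago.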
this is currently only invoked in two spots, both of which have been updated (additionally, both of these spots that were invoking this function were already passing in a `delay`, hence why I removed the default of `SLACK_RATE_LIMIT_DELAY`)
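
The NOTE in `update_alert_groups_message` says the task ID is written to the cache before the task is scheduled, because the task verifies that its own ID matches the cached one. The body of `update_alert_group_slack_message` is not shown on this page, so the following is only a sketch of how such a check could behave. It uses a plain dict in place of the Django cache (without the timeout), and `schedule_update` and `run_update` are hypothetical names.

```python
import uuid
from typing import Dict, Optional

# Stand-in for django.core.cache; no expiry is modelled here.
cache: Dict[str, str] = {}


def cache_key(alert_group_pk: int) -> str:
    # Same key format as _get_update_message_cache_key in the diff.
    return f"update_alert_group_slack_message_{alert_group_pk}"


def schedule_update(alert_group_pk: int, debounce: bool) -> Optional[str]:
    """Return the new task ID, or None when the request is debounced."""
    key = cache_key(alert_group_pk)
    if debounce and key in cache:
        return None
    task_id = str(uuid.uuid4())
    # Store the ID BEFORE the task could start, so the task's check cannot race.
    cache[key] = task_id
    return task_id


def run_update(alert_group_pk: int, task_id: str) -> bool:
    """Assumed task behaviour: only the task whose ID is cached does the work."""
    key = cache_key(alert_group_pk)
    if cache.get(key) != task_id:
        return False  # a newer task superseded this one
    # ... the Slack message update would happen here ...
    del cache[key]  # corresponds to mark_active_update_task_as_complete
    return True


first = schedule_update(1, debounce=True)
assert schedule_update(1, debounce=True) is None  # debounced
forced = schedule_update(1, debounce=False)       # replaces the cached ID
print(run_update(1, first))   # False
print(run_update(1, forced))  # True
```

Under these assumptions, a non-debounced call supersedes any pending task, and the stale task exits without touching the message.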