Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support alerting provider overrides in alert configuration #96

Closed
TwiN opened this issue Mar 12, 2021 · 14 comments · Fixed by #929
Closed

Support alerting provider overrides in alert configuration #96

TwiN opened this issue Mar 12, 2021 · 14 comments · Fixed by #929
Assignees
Labels
area/alerting Related to alerting feature New feature or request good first issue Good for newcomers

Comments

@TwiN
Copy link
Owner

TwiN commented Mar 12, 2021

Currently, users can target specific groups of people by using the description as a way to notify specific groups for specific alerts. This, however, does not apply for all alerting providers.

For instance:

alerting:
  slack: 
    webhook-url: "https://hooks.slack.com/services/**********/**********/**********"

services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
        description: "<@!subteam^SLACK_MANAGER_GROUP_ID>"
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        description: "<!subteam^SLACK_DEV_GROUP_ID>"
    conditions:
      - "[STATUS] == 200"

(note the services[].alerts[].description, reference)

By tweaking the notification settings, each groups could be notified only when it concerns them.

That being said, I do understand that this is more of a hack than a viable solution; I think the root cause is that there may be a need for supporting multiple slack webhooks rather than just one, even if you could technically use custom to create a 2nd Slack provider, having the ability to create N alerting providers would be better.

Across all alerting providers, there appears to always only one parameter that may need to change in order to support multiple recipients:

  • Slack: webhook-url
  • PagerDuty: integration-key
  • Twilio: to
  • Mattermost: recipients
  • Discord: webhook-url
  • Custom: URL, but probably irrelevant

As a result, we may be able to just add a single optional parameter to the service alert (as opposed to the alerting provider) to override that "key" parameter for a specific service:

alerting:
  slack: 
    webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-managers"

services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        override-target: "https://hooks.slack.com/services/**********/**********/webhook-for-devs"
    conditions:
      - "[STATUS] == 200"

As a result, the relevant-to-devs service would send to the overridden webhook https://hooks.slack.com/services/**********/**********/webhook-for-devs instead of the default "https://hooks.slack.com/services/**********/**********/webhook-for-managers".

It's a little hacky, especially when you look at the Slack alerting provider which only has a single parameter (webhook-url) as opposed to the Twilio alerting provider, which has several parameters out of which only one is relevant to alert a different target (to), but I think that if we want to avoid breaking changes, this is the best way to do it: it's transparent for those that don't need the feature, and while most users will likely never use that feature, the option is still available.

@TwiN
Copy link
Owner Author

TwiN commented Mar 12, 2021

ref: #84

@TwiN TwiN added the feature New feature or request label Mar 12, 2021
@zeylos
Copy link
Contributor

zeylos commented Apr 12, 2021

Hi there,

Wouldn't it be easier to allow a list of webhooks indexed by names under slack or mattermost ?
Something like that :

alerting:
  slack: 
    - name: manager
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-managers"
    - name: devs
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-devs"
services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
        target: manager
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        target: devs
    conditions:
      - "[STATUS] == 200"

We could also add a default keyword on an alerting item which would make the target key optionnal.

Btw just discovered your work, it rocks !! Good job

Zeylos

@fixje
Copy link

fixje commented Aug 9, 2021

Across all alerting providers, there appears to always only one parameter that may need to change in order to support multiple recipients:
...
Custom: URL, but probably irrelevant

We have set up a web hook for Rocket.Chat using alerting.custom and want to target different channels/users. The channel is one property within the JSON request body. So this assumption is not true for our use case.

The most versatile approach would imho be someting like suggested by @zeylos - have the ability to configure multiple alerting "channels" and reference them by name. For convenience, the reference in the services section could be left out if there is just one channel for the given alerts.type

alerting:
  slack: 
    - name: manager
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-managers"
    - name: devs
      webhook-url: "https://hooks.slack.com/services/**********/**********/webhook-for-devs"
  custom:
   - name: frontend-ops
     url: ...
   - name: backend-ops
     url: ...
  

services:
  - name: relevant-to-managers
    url: "https://example-for-managers.org/"
    alerts:
      - type: slack
        enabled: true
        target: manager
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-devs
    url: "https://example-for-devs.org/"
    alerts:
      - type: slack
        enabled: true
        target: devs
    conditions:
      - "[STATUS] == 200"

  - name: relevant-to-fe-ops
    url: "https://example-for-devs.org/"
    alerts:
      - type: custom
        enabled: true
        target: frontend-ops
    conditions:
      - "[STATUS] == 200"
  
  - name: relevant-to-be-ops
    url: "https://example-for-devs.org/"
    alerts:
      - type: custom
        enabled: true
        target: backend-ops
    conditions:
      - "[STATUS] == 200"

Great project btw!!

@TwiN
Copy link
Owner Author

TwiN commented Oct 10, 2021

FYI: #181 added support for the integrations parameter in PagerDuty, which essentially enables group-specific alerts.

I've renamed them to overrides, as that PR introduced them as integrations and I wanted to make it generic enough so that a similar change could be ported to other alerting providers.

The idea is that porting this to other alerting providers would enable something like this:

alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/**********/**********/default-webhook"
    overrides:
      - group: "frontend"
        webhook-url: "https://hooks.slack.com/services/**********/**********/frontend-webhook"
      - group: "backend"
        webhook-url: "https://hooks.slack.com/services/**********/**********/frontend-webhook"

endpoints:
  - name: "something-else"
    group: "misc"
    # ...

  - name: "home"
    group: "frontend"
    # ...

  - name: "api-health"
    group: "backend"
    # ...

While this doesn't offer the same flexibility as allowing an override at the service level, this should at least solve some use cases.

@TwiN TwiN changed the title Implement alerting provider override from the alert configuration Allow more than one configuration per alerting provider Oct 10, 2021
@TwiN TwiN changed the title Allow more than one configuration per alerting provider Support more than one configuration per alerting provider Oct 10, 2021
appleboy added a commit to appleboy/gatus that referenced this issue Mar 19, 2022
Add group-specific to list for email alert

TwiN#96

Signed-off-by: Bo-Yi Wu <[email protected]>
appleboy added a commit to appleboy/gatus that referenced this issue Mar 20, 2022
Add group-specific webhook URL for teams alert

ref: TwiN#96

Signed-off-by: Bo-Yi Wu <[email protected]>
TwiN pushed a commit that referenced this issue Mar 21, 2022
* feat(alert): Add group-specific to email list

Add group-specific to list for email alert

#96

Signed-off-by: Bo-Yi Wu <[email protected]>

* docs: update

Signed-off-by: Bo-Yi Wu <[email protected]>

* Update README.md

* Update README.md

* Update README.md

* chore: update

Signed-off-by: Bo-Yi Wu <[email protected]>

* Update README.md
TwiN pushed a commit that referenced this issue Mar 24, 2022
* feat(alert): Add group-specific webhook URL for teams

Add group-specific webhook URL for teams alert

ref: #96

Signed-off-by: Bo-Yi Wu <[email protected]>

* Update README.md

* Update README.md
appleboy added a commit to appleboy/gatus that referenced this issue Mar 25, 2022
Add group-specific webhook URL for discord alert

Provides support for paging multiple Discords based on the group selector while keeping backward compatibility to the old Discords configuration manifest

integration per team can be specified in the overrides sections in an array form.

ref: TwiN#96

Signed-off-by: Bo-Yi Wu <[email protected]>
appleboy added a commit to appleboy/gatus that referenced this issue Mar 25, 2022
Add group-specific webhook URL for Google Chat

ref: TwiN#96

Signed-off-by: Bo-Yi Wu <[email protected]>
TwiN added a commit that referenced this issue Apr 12, 2022
* feat(alerting): Add group-specific webhook URL for discord

Add group-specific webhook URL for discord alert

Provides support for paging multiple Discords based on the group selector while keeping backward compatibility to the old Discords configuration manifest

integration per team can be specified in the overrides sections in an array form.

ref: #96

Signed-off-by: Bo-Yi Wu <[email protected]>

* docs: update

Signed-off-by: Bo-Yi Wu <[email protected]>

* Update README.md

* Update README.md

* Update alerting/provider/discord/discord.go

Co-authored-by: TwiN <[email protected]>

* Update README.md

Co-authored-by: TwiN <[email protected]>

* test: revert testing name

* Update alerting/provider/discord/discord_test.go

Co-authored-by: TwiN <[email protected]>

Co-authored-by: TwiN <[email protected]>
@TwiN TwiN added the area/alerting Related to alerting label Feb 1, 2023
@TwiN TwiN changed the title Support more than one configuration per alerting provider Support alerting provider overrides in alert configuration Feb 1, 2023
@TwiN
Copy link
Owner Author

TwiN commented Feb 1, 2023

Here's an example of something that might work better:

For example:

alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/aaaaaaaaaaaaaaaaaaaaaaaaaaa"


endpoints:
  - name: "example-with-no-override"
    alerts:
      - type: slack

  - name: "example-with-override"
    alerts:
      - type: slack
        config:
          webhook-url: "https://hooks.slack.com/services/bbbbbbbbbbbbbbbbbbbbbbbbbbbbb"

Basically, endpoints[].alerts[].config could be merged with alerting.<provider>.

@TwiN TwiN added the good first issue Good for newcomers label Feb 1, 2023
@bishtawi
Copy link
Contributor

Looking at the PRs #272, #271, #266, and #264, looks like its being implemented in an adhoc fashion?

This is unfortunate as there are a lot of alert providers and the ones I need (specifically Ntfy) have not been updated to support overrides.

Additionally for certain alert providers (like Ntfy and email), you often need to overwrite multiple fields. Take email, currently you can only overwrite the to field but I could definitely see situations where you need different instances of the email alert provider configured with different host/username/password/from values. Its not just the to field that is needs to be different but potentially other fields as well.

In my case with Ntfy, overriding both the topic and the click field would be great. And I could see situations where the url and token also need overriding.

In other words, I think supporting multiple independent instances of an alert type is a more robus solution than just implementing a single field override for select alert providers.

@TwiN
Copy link
Owner Author

TwiN commented Nov 28, 2024

@bishtawi The PRs you listed implement these at the group level, whereas this issue suggests implementing them at the alert level.

I'd also like to point out that there is no limitation on the number of fields that could be overridden. For instance, Telegram's override has two fields - id and token.

The truth is that I don't personally use all the alerting providers that are implemented in Gatus, as such, I delegate the implementation of overrides and such to those that use it.

I do welcome you to implement group-level overrides for ntfy, if you need it, though.
As for the feature that this specific issue suggests, I haven't had the time to work on this yet.

@bishtawi
Copy link
Contributor

bishtawi commented Nov 29, 2024 via email

@TwiN
Copy link
Owner Author

TwiN commented Dec 7, 2024

Okay, this is happening. I'm starting the work for this tonight.
I might also implement it in a way that forces the implementation of overrides in all current and future alerting providers.

@TwiN TwiN self-assigned this Dec 7, 2024
@TwiN TwiN pinned this issue Dec 8, 2024
@TwiN
Copy link
Owner Author

TwiN commented Dec 8, 2024

Despite my initial enthusiasm, things aren't going as well as I'd have liked.

The big problem is that Gatus is built in a way that allows its configuration to fail fast. In other words, on start, if the configuration is invalid, it'll crash right away. This is to provide the best user experience possible; people want their applications to run properly if they start properly, as opposed to crashing, for instance, 20 hours after starting.

Now this alert override makes this much harder, because the alerts themselves live under the endpoint, while the providers' configuration live completely separately. The provider takes in the alert and "sends" (triggers/resolves) it.

For this to work, the provider would now need to take in the alert, modify its own configuration, and then "send" the alert.

Right now, even the group override is very simple, plus, it lives directly in the provider as opposed to the alert overrides which would live under endpoint, which means I have to find sneaky ways to avoid circular dependencies.

Anyways, I digress, and I think code speaks louder than words, so for those interested, here's where I am right now:

func (provider *AlertProvider) getConfigWithOverrides(group string, alert *alert.Alert) (*Config, error) {
	// TODO: Caching? would need to pass the endpoint's key so that I can create a working cache key
	cfg := provider.Config
	// First, we handle group overrides
	if provider.Overrides != nil {
		for _, override := range provider.Overrides {
			if group == override.Group {
				if len(override.AccessKeyID) > 0 {
					cfg.AccessKeyID = override.AccessKeyID
				}
				if len(override.SecretAccessKey) > 0 {
					cfg.SecretAccessKey = override.SecretAccessKey
				}
				if len(override.Region) > 0 {
					cfg.Region = override.Region
				}
				if len(override.From) > 0 {
					cfg.From = override.From
				}
				if len(override.To) > 0 {
					cfg.To = override.To
				}
				break
			}
		}
	}
	// Second, we handle the alert's overrides
	if len(alert.Override) != 0 {
		cfgAsYAML, err := yaml.Marshal(cfg)
		if err != nil {
			return nil, fmt.Errorf("failed to marshal provider config with group overrides to YAML: %w", err)
		}
		alertOverridesAsYAML, err := yaml.Marshal(alert.Override)
		if err != nil {
			return nil, fmt.Errorf("failed to marshal alert overrides to YAML: %w", err)
		}
		mergedCfg, err := deepmerge.YAML(cfgAsYAML, alertOverridesAsYAML)
		if err != nil {
			return nil, fmt.Errorf("failed to merge config with alert overrides: %w", err)
		}
		if err = yaml.Unmarshal(mergedCfg, &cfg); err != nil {
			return nil, fmt.Errorf("failed to unmarshal merged config: %w", err)
		}
	}
	return &cfg, nil
}

The big concern here is while this implementation tries to keep things as simple as possible (because I don't want to make creating an alerting provider extremely complicated to add, otherwise I'll put myself in a situation where I'm the only one that can create new providers and it's hard for people to contribute).

I'm just rambling at this point and not really expecting any answers, but man, things would be so much easier if I could work on this full time as opposed to juggling this and work 😩

One day, maybe one day, I'll be able to focus on my passion without worrying about paying the bills.


TL;DR: I'm working on it, but it's a little more complicated that I anticipated to design it in a way that wouldn't make it too difficult for new contributors to add new alerting providers in the future.

@TwiN
Copy link
Owner Author

TwiN commented Dec 14, 2024

The bulk of the implementation is done @ ffce15d (#929). I still need to update all tests and then spot test a handful of providers to make sure the changes work as intended, but that'll be for another day.

@TwiN
Copy link
Owner Author

TwiN commented Dec 16, 2024

Looking good so far!

This is what the configuration will look like:

endpoints:
  - name: example
    url: "https://example.com"
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
        provider-override:
          webhook-url: https://discord.com/api/webhooks/someother/webhook

TwiN added a commit that referenced this issue Dec 17, 2024
* feat(alerting): Implement alert-level provider overrides

Fixes #96

* Fix tests

* Add missing test cases for alerting providers

* feat(alerting): Implement alert-level overrides on all providers

* chore: Add config.yaml to .gitignore

* fix typo in discord provider

* test: Start fixing tests for alerting providers

* test: Fix GitLab tests

* Fix all tests

* test: Improve coverage

* test: Improve coverage

* Rename override to provider-override

* docs: Mention new provider-override config

* test: Improve coverage

* test: Improve coverage

* chore: Rename Alert.OverrideAsBytes to Alert.ProviderOverrideAsBytes
@TwiN TwiN closed this as completed in #929 Dec 17, 2024
@TwiN
Copy link
Owner Author

TwiN commented Dec 18, 2024

@zeylos, @philippe-granet, @tiwood, @allanviana-logcomex, @sashkab, @Coolomina, @pollosp, @dedo1911, @umputun @GithubUser5462 I apologize for ping, but since you folks have expressed your interest by reacting with 👍 to this issue, I assume it should should be acceptable for me to ping. If you no longer use Gatus or are not interested, please disregard this message.

For those of you still using Gatus and interested in this feature, it is now available under the latest tag, and given the complexity of the implementation of this feature, I feel compelled to test this feature a bit more thoroughly. Unfortunately, I don't personally use all alerting providers that Gatus offers.

If some of you have the time to spare, I'd very much appreciate if you could try the new provider-override feature with whatever alerting provider you use.

Here's an example for Discord:

alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/11111111..."

One for GitHub:

alerting:
  github:
    repository-url: "https://github.com/TwiN/first-repo"
    token: "github_pat_123456..."

endpoints:
  - name: example
    url: "https://example.org"
    interval: 5s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 5"
    alerts:
      - type: discord
        description: "alert going to the default webhook-url"
        send-on-resolved: true
      - type: discord
        description: "alert going to the overridden webhook-url"
        send-on-resolved: true
        provider-override:
          webhook-url: "https://discord.com/api/webhooks/222222222..."
      - type: github
        description: "alert going to the default repository-url"
        send-on-resolved: true
      - type: github
        description: "alert going to the overridden repository-url"
        send-on-resolved: true
        provider-override:
          repository-url: "https://github.com/TwiN/second-repo"
          token: "github_pat_example-of-some-other-token-if-your-default-token-does-not-have-access-to-the-second-repo"

If you don't have the capacity to help, don't worry, no hard feelings :)

Happy Holidays.

@pollosp
Copy link

pollosp commented Dec 20, 2024

Hey N.P and thanks for your work in this tool, I think I could help you with slack and pagerduty , which are what we are currently using

@TwiN TwiN unpinned this issue Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/alerting Related to alerting feature New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants