-
Notifications
You must be signed in to change notification settings - Fork 1.3k
prometheus: bundle Alertmanager and siteConfig sync #11832
Conversation
commit bf833fe Merge: f32b042 f142c9c Author: Robert Lin <[email protected]> Date: Tue Jun 23 10:59:00 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit f32b042 Merge: 2327cd3 34bd181 Author: Robert Lin <[email protected]> Date: Tue Jun 23 09:53:35 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit 2327cd3 Author: Robert Lin <[email protected]> Date: Tue Jun 23 09:53:27 2020 +0800 remove todos commit 3d48e18 Author: Robert Lin <[email protected]> Date: Mon Jun 22 13:58:14 2020 +0800 make problem format more consistent commit c5ba7db Author: Robert Lin <[email protected]> Date: Mon Jun 22 13:54:49 2020 +0800 report if grafana is down as part of config-subscriber problems commit ef824fc Author: Robert Lin <[email protected]> Date: Mon Jun 22 10:32:58 2020 +0800 add default for alerts commit 180fd43 Author: Robert Lin <[email protected]> Date: Mon Jun 22 10:32:50 2020 +0800 disable external snapshots commit 9626e47 Author: Robert Lin <[email protected]> Date: Mon Jun 22 09:51:48 2020 +0800 add docstring commit c1c1545 Merge: ac5749e 716ceb7 Author: Robert Lin <[email protected]> Date: Mon Jun 22 09:49:29 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit ac5749e Merge: 9e819c4 c0c220b Author: Robert Lin <[email protected]> Date: Fri Jun 19 08:21:19 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit 9e819c4 Author: Robert Lin <[email protected]> Date: Fri Jun 19 08:21:09 2020 +0800 improve notifier change behaviour commit 4c4a7d2 Author: Robert Lin <[email protected]> Date: Thu Jun 18 11:47:06 2020 +0800 fix return discard commit 9a9be6b Author: Robert Lin <[email protected]> Date: Thu Jun 18 11:21:04 2020 +0800 address linter warnings commit 7242ff3 Merge: a2b07d5 ce911b0 Author: Robert Lin <[email protected]> Date: Thu Jun 18 10:37:56 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit a2b07d5 Author: Robert Lin <[email protected]> Date: Thu Jun 18 10:37:37 2020 +0800 add tests for grafana smtp change commit 7cc2fe0 Author: Robert Lin <[email protected]> Date: Wed Jun 17 14:38:32 2020 +0800 refactor to remove integration points commit 2f60a4d Author: Robert Lin <[email protected]> Date: Wed Jun 17 13:23:01 2020 +0800 improve docs commit 2981a3d Merge: 664ebe6 ed5a2ab Author: Robert Lin <[email protected]> Date: Wed Jun 17 13:21:06 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit 664ebe6 Author: Robert Lin <[email protected]> Date: Wed Jun 17 13:20:59 2020 +0800 implement smtp sync commit 2c10270 Author: Robert Lin <[email protected]> Date: Wed Jun 17 13:12:37 2020 +0800 regenerate stringdata commit 03da223 Author: Robert Lin <[email protected]> Date: Wed Jun 17 13:03:42 2020 +0800 dump grafana build output if build fails commit 267bca0 Author: Robert Lin <[email protected]> Date: Wed Jun 17 09:44:28 2020 +0800 improve docs commit 1d81518 Merge: 3ca4892 254e676 Author: Robert Lin <[email protected]> Date: Wed Jun 17 09:23:34 2020 +0800 Merge branch 'master' of github.com:sourcegraph/sourcegraph into distribution/grafana-smtp commit 3ca4892 Author: Robert Lin <[email protected]> Date: Tue Jun 16 15:31:24 2020 +0800 scaffold improvements to change application commit 7162a4c Author: Robert Lin <[email protected]> Date: Tue Jun 16 13:05:57 2020 +0800 restore email schema support
…ribution/prom-alerting
The notice doesn't add a lot of information to anyone who might receive the alert. Moved the disclaimer to PossibleSolutions instead
…ribution/prom-alerting
…ribution/prom-alerting
This is excellent, thanks for the write-up! I will take a look at the implementation and try this out tomorrow. |
I've fixed the point now :) |
It would be nice to have this happen automatically after saving changes to the alerting configuration - or even just a GraphQL query one can run in e.g. sourcegraph.com/api/console (but neither of this is needed for this PR and I don't want to increase its scope) One easy way to do it would be to declare a a Prometheus metric like |
this is very cool. i'll take a closer look tomorrow. |
Do you think cloud teams should review this PR? (come here via code owner) |
@unknwon I would say not necessary as Distribution owns this - but feel free to review if you like as usual of course. Will make sure CODEOWNERS file reflects this. |
Can we dog-food this for Sourcegraph Cloud? |
Yep! Once I finish this up, I'll update https://github.com/sourcegraph/deploy-sourcegraph-dot-com/pull/2839 for Cloud (Cloud == dot-com right?) |
Co-authored-by: Stephen Gutekanst <[email protected]>
Co-authored-by: Keegan Carruthers-Smith <[email protected]>
47c5a62
to
527eed6
Compare
Co-authored-by: Stephen Gutekanst <[email protected]>
Co-authored-by: Stephen Gutekanst <[email protected]>
…ourcegraph into distribution/prom-alerting
…ribution/prom-alerting
Co-authored-by: davejrt <[email protected]>
Context
For onlookers, some recent product-level context may help: this change is aiding in getting us to a point where we can push all site admins to turn on monitoring for their Sourcegraph instance by having them configure alerting through the Sourcegraph site configuration and having our monitoring stack automatically respond to that. This comment goes into a little more detail.
Changes
This PR completely removes the
grafana-wrapper
and functionality for deployingobservability.alerts
as Grafana alerts (effectively reverting it to vanilla Grafana), and replaces it with a newprom-wrapper
that bundles Prometheus + Alertmanager insourcegraph/prometheus
alongside the sameobservability.alerts
features added in #10641The rationale for this change is covered in the comments of https://github.com/sourcegraph/sourcegraph/issues/11452, but in a nutshell:
alert_count
rules so will give us the granularity we need, and routing makes it easy to send alerts to the appropriate notifiers (referred to in Alertmanager as "receivers")Other advantages:
prometheus:9090/alertmanager
- might come in handyCaveats:
notifier
fields on a kind of need-to-have basisio.Writer
, but only managed to find one simple but seemingly abandoned library,go-prefix-writer
, so I've opted not to introduce the dependency for nowalert_count
and Prometheus alertsBest reviewed as a whole and ignoring all the Grafana changes
Issues
grafana-wrapper
, and is now pretty much just vanilla Grafana. This implementation loads the initial configuration in a goroutine, so it won't be a problem hereprometheus:9090/alertmanager
, so users will be able to port-forward the Prometheus service and go to http://localhost:9090/alertmanager/#/silences to silence alerts out of the box (though I think explore solutions for "impossible to silence monitoring alerts" #11210 envisions a site-config based approach, so this doesn't fully resolve it yet)TODOs
trying to figure out a way around this short of copy-pasting the definitionsopened config: add MarshalSecrets flag prometheus/alertmanager#2316Non-goals