Built-In Redundancy Monitoring for GoAlert Instances #3323

Open
mastercactapus opened this issue Sep 29, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@mastercactapus
Member

What problem would you like to solve? Please describe:
GoAlert needs a feature that makes it easy to monitor a production GoAlert instance using separate GoAlert installations. Some parts of this process are already partially implemented, but they are undocumented, complex, and possibly outdated.

Describe the solution you'd like:
A new admin page for Remote Monitor to streamline the configuration and operation of these monitoring services. Features required:

  • Auto-creation of a new monitor service with a webhook, pointing back to the main instance.
  • Auto-sync or manual sync options for updating the monitor environments from the primary GoAlert instance.
  • Sync on-call users from the primary instance to the monitor installations. Consider a solution for validating users' contact methods, since notifications may come from a new number.
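A minimal sketch of the on-call sync step, assuming the monitor pulls the primary's current on-call users and compares them with the users it already has. `User`, `SyncPlan`, and `PlanSync` are hypothetical names, not GoAlert APIs; fetching users and validating contact methods are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// User is a hypothetical, minimal view of a user as a monitor instance
// might see it; real GoAlert user records carry more fields.
type User struct {
	ID   string
	Name string
}

// SyncPlan lists the changes a monitor would apply so its users match
// the primary's current on-call users.
type SyncPlan struct {
	Add    []User   // on call in the primary, missing from the monitor
	Remove []string // IDs present on the monitor but no longer on call
}

// PlanSync diffs the on-call users pulled from the primary against the
// users already configured on the monitor.
func PlanSync(primaryOnCall, monitorUsers []User) SyncPlan {
	onCall := make(map[string]bool, len(primaryOnCall))
	for _, u := range primaryOnCall {
		onCall[u.ID] = true
	}
	existing := make(map[string]bool, len(monitorUsers))
	for _, u := range monitorUsers {
		existing[u.ID] = true
	}

	var plan SyncPlan
	for _, u := range primaryOnCall {
		if !existing[u.ID] {
			plan.Add = append(plan.Add, u)
			existing[u.ID] = true // skip duplicates (e.g. on call for two schedules)
		}
	}
	for _, u := range monitorUsers {
		if !onCall[u.ID] {
			plan.Remove = append(plan.Remove, u.ID)
		}
	}
	// Sort so the plan is deterministic regardless of input order.
	sort.Slice(plan.Add, func(i, j int) bool { return plan.Add[i].ID < plan.Add[j].ID })
	sort.Strings(plan.Remove)
	return plan
}

func main() {
	primary := []User{{"u1", "Alice"}, {"u2", "Bob"}}
	monitor := []User{{"u2", "Bob"}, {"u3", "Carol"}}
	plan := PlanSync(primary, monitor)
	fmt.Println(len(plan.Add), plan.Add[0].Name, plan.Remove) // prints: 1 Alice [u3]
}
```

Computing a plan before changing anything fits both sync options: a manual sync could show the plan to an admin for review, while auto-sync would apply it on a timer.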

Describe alternatives you've considered:
The current procedure uses the 'monitor' command and the 'remotemonitor' package to create and manage redundancy measures, which adds complexity.

Additional context:
This feature request caters primarily to the admins responsible for a GoAlert installation rather than the application's end users.

@mastercactapus mastercactapus added the enhancement New feature or request label Sep 29, 2023
@mastercactapus
Member Author

A good experience will rely heavily on #3007

@mastercactapus
Member Author

mastercactapus commented Oct 4, 2023

Current state reference info

Current check sequence

[d2 diagram: current check sequence (image not included)]

Current Check Deployment Example

[d2 diagram: current check deployment example (image not included)]

It's worth noting that the current setup requires the remote monitor to be deployed with a config file containing various required settings and its own Twilio number, and to be publicly routable.
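To make those deployment requirements concrete, here is a hypothetical startup validation for a monitor's config. `MonitorConfig` and its fields are illustrative assumptions, not the `remotemonitor` package's actual settings. The reachability check only rejects obviously non-public hosts (localhost, loopback, private, or unspecified IPs); it cannot prove the address is reachable from the internet.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"regexp"
)

// MonitorConfig is a HYPOTHETICAL subset of a monitor's settings; the field
// names are illustrative, not the remotemonitor package's real config keys.
type MonitorConfig struct {
	PublicURL        string // address that inbound webhooks/callbacks must reach
	TwilioFromNumber string // the monitor's own Twilio number, E.164 format
}

// e164 matches numbers like +15555550100 (leading +, up to 15 digits).
var e164 = regexp.MustCompile(`^\+[1-9][0-9]{1,14}$`)

// Validate returns the first requirement the config fails to meet.
func (c MonitorConfig) Validate() error {
	u, err := url.Parse(c.PublicURL)
	if err != nil {
		return fmt.Errorf("public URL: %w", err)
	}
	if u.Scheme != "https" && u.Scheme != "http" {
		return errors.New("public URL must use http or https")
	}
	host := u.Hostname()
	if host == "" || host == "localhost" {
		return errors.New("public URL must have a publicly routable host")
	}
	// Only literal IPs can be checked here; a hostname may still be unreachable.
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsPrivate() || ip.IsUnspecified()) {
		return errors.New("public URL must have a publicly routable host")
	}
	if !e164.MatchString(c.TwilioFromNumber) {
		return errors.New("the Twilio number must be in E.164 format, e.g. +15555550100")
	}
	return nil
}

func main() {
	cfg := MonitorConfig{PublicURL: "http://127.0.0.1:8080", TwilioFromNumber: "+15555550100"}
	fmt.Println(cfg.Validate()) // prints: public URL must have a publicly routable host
}
```

Failing fast at startup is cheap insurance here: a monitor that is silently unreachable would give false confidence rather than redundancy.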

@mastercactapus
Member Author

For next steps:

MVP: minimal API/DB additions to allow create alert -> heartbeat, using a webhook on an EP (escalation policy)

+1: auto sync on-call users
+1: auto config from production/main instance

+2: "sentinel mode" -- slimmer UI, warning banner, make it obvious an instance is for monitoring another (with link)
+3: first-time setup with sentinel

Separate supporting feature ideas:


Implementation thoughts/notes:

  • have sentinels do as much work as possible (PULL config rather than push for on-call)
  • synthetic check: can I create an alert? did I get a notification? -> POST heartbeat(s)
