-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sync status label to deployments #8622
Conversation
automation/bake/Dockerfile
Outdated
@@ -10,6 +10,8 @@ RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyri | |||
|
|||
RUN apt-get update && apt-get install -y google-cloud-sdk | |||
|
|||
RUN apt-get update && apt-get install -y kubectl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should move to dockerfiles/Dockerfile-coda-daemon if it's going to be required for health checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment but largely LGTM. Upgrading the existing clusters to use it is nontrivial but that can be handled in a future PR against compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 lgtm
Added a new label
syncStatus
to deployments whose value gets updated as part of the readiness health check. This label can then be used in grafana dashboard and/or in alerting rules to filter metrics from nodes with a specified sync status.Here's an example dashboard (in edit mode so you can see the queries) from a private testnet deployed with the changes from this PR.
https://o1testnet.grafana.net/d/qx4y6dfWz/network-overview?editPanel=27&orgId=1&refresh=1m&var-testnet=test-labels&from=now-30m&to=now
The initial status when deploying is
INIT
which then gets updated to one of following statuses of the daemon:CONNECTING
LISTENING
OFFLINE
BOOTSTRAP
SYNCED
CATCHUP
Should this be off of develop?
I'm not sure if it is incompatible against existing deployments. In any case we'll want to gradually upgrade all the nodes of existing networks(mainnet/devnet)
After approved, I'll update the alert expressions as described in #8522