Add sync status label to deployments #8622

Merged
merged 9 commits into from
Apr 15, 2021

@deepthiskumar deepthiskumar commented Apr 12, 2021

Added a new label, syncStatus, to deployments. Its value is updated as part of the readiness health check. The label can then be used in Grafana dashboards and/or alerting rules to filter metrics from nodes with a specified sync status.

Here's an example dashboard (in edit mode so you can see the queries) from a private testnet deployed with the changes from this PR.
https://o1testnet.grafana.net/d/qx4y6dfWz/network-overview?editPanel=27&orgId=1&refresh=1m&var-testnet=test-labels&from=now-30m&to=now
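
As a rough illustration of the filtering described above, the sketch below builds a PromQL selector that keeps only series from nodes with a given sync status. The metric name `example_block_height` and the helper `with_sync_status` are hypothetical. It also assumes the pod label surfaces in Prometheus under the name `syncStatus`; the actual name depends on how the scrape config relabels pod labels.

```python
def with_sync_status(metric: str, status: str, label: str = "syncStatus") -> str:
    """Return a PromQL instant-vector selector filtered on the sync status label."""
    # Escape backslashes and double quotes so the value stays a valid PromQL string.
    escaped = status.replace("\\", "\\\\").replace('"', '\\"')
    return f'{metric}{{{label}="{escaped}"}}'


# Hypothetical metric name, for illustration only.
query = with_sync_status("example_block_height", "SYNCED")
print(query)
```

The same selector could be used in a dashboard panel or inside an alert expression, so that nodes that are still bootstrapping or catching up do not trigger alerts meant for synced nodes.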

The initial status when deploying is INIT, which then gets updated to one of the following daemon statuses:

  1. CONNECTING
  2. LISTENING
  3. OFFLINE
  4. BOOTSTRAP
  5. SYNCED
  6. CATCHUP
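
The status lifecycle above can be sketched as a small helper that a readiness check might call. This is a minimal sketch, not the PR's implementation: the function name `label_command`, the pod name, and the fallback to `INIT` for unrecognized values are assumptions. Only the status names and the `syncStatus` label come from the description. The helper builds the `kubectl label` command rather than running it.

```python
from typing import List

# Statuses listed in the PR description; INIT is the value set at deploy time.
DAEMON_STATUSES = {"CONNECTING", "LISTENING", "OFFLINE", "BOOTSTRAP", "SYNCED", "CATCHUP"}
INITIAL_STATUS = "INIT"


def label_command(pod: str, reported_status: str) -> List[str]:
    """Build the kubectl argv that sets the syncStatus label on a pod."""
    status = reported_status.strip().upper()
    if status not in DAEMON_STATUSES:
        # Assumption: fall back to the initial value on an unexpected status.
        status = INITIAL_STATUS
    # --overwrite is needed because the label already exists (initially INIT).
    return ["kubectl", "label", "pod", pod, f"syncStatus={status}", "--overwrite"]


# Hypothetical pod name, for illustration only.
cmd = label_command("example-pod-0", "synced")
print(" ".join(cmd))
```

Returning the argument list instead of executing it keeps the sketch testable without a cluster; a real check would pass the list to something like `subprocess.run`.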

Should this be off of develop?
I'm not sure whether it is incompatible with existing deployments. In any case, we'll want to gradually upgrade all the nodes of existing networks (mainnet/devnet).

Once this is approved, I'll update the alert expressions as described in #8522.

@deepthiskumar deepthiskumar requested a review from a team as a code owner April 12, 2021 22:49
@deepthiskumar deepthiskumar added the ci-build-me label (Add this label to trigger a circle+buildkite build for this branch) Apr 12, 2021
@@ -10,6 +10,8 @@ RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyri

RUN apt-get update && apt-get install -y google-cloud-sdk

RUN apt-get update && apt-get install -y kubectl
This should move to dockerfiles/Dockerfile-coda-daemon if it's going to be required for health checks.

@lk86 lk86 left a comment

Minor comment, but largely LGTM. Upgrading the existing clusters to use it is nontrivial, but that can be handled in a future PR against compatible.

@nholland94 nholland94 left a comment

👍 lgtm

@mrmr1993 mrmr1993 merged commit 524cfcd into compatible Apr 15, 2021
@mrmr1993 mrmr1993 deleted the fix/sync-status-label branch April 15, 2021 15:39