Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query all direct-connected peers for all seed nodes in watchdog #8688

Merged
merged 9 commits into from
Apr 27, 2021

Conversation

ghost-not-in-the-shell
Copy link
Contributor

@ghost-not-in-the-shell ghost-not-in-the-shell commented Apr 20, 2021

This PR changes how we query peers in watchdog and make_report.py.

The intention is that we would query all the seed nodes for all their directly connected peers. See #8638 for more details.

This PR also add some more error reporting for the make_report.py.

This PR also fixed the following kinds of errors of the watchdog process:

  1. various errors like
  File "/code/metrics.py", line 60, in pods_with_no_new_logs
    mina_containers = list(filter(lambda c: c.name in [ 'coda', 'seed', 'coordinator' ], containers))
TypeError: 'NoneType' object is not iterable

fixed by checking the status of the pods before doing anything

  1. handle errors from daemon
  File "<unknown>", line 1
    Error: Unable to connect to Mina daemon.
  1. reference to undefined variables

@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from develop to compatible April 20, 2021 00:01
@ghost-not-in-the-shell ghost-not-in-the-shell changed the title [WIP] query all direct-connected peers for all seed nodes in watchdog query all direct-connected peers for all seed nodes in watchdog Apr 20, 2021
Copy link
Member

@nholland94 nholland94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, though we should probably get a second pair of eyes on this as I am not too familiar with the watchdog script.

Copy link
Contributor

@es92 es92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks effectively fine, if we're cool with in the short term seeds only querying their direct peers (better than what we do right now anyway, it would seem). I have some follow up questions on the long term strategy, but I'll follow up with those elsewhere

@deepthiskumar
Copy link
Member

LGTM. Could you update the version number in version.txt and do a make release to upload the new image? I'm waiting for a review on #8664 (which updates the version) but you can go ahead and update it to 0.4.8 and I'll change it in my PR

@deepthiskumar deepthiskumar added the ci-build-me Add this label to trigger a circle+buildkite build for this branch label Apr 22, 2021
@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from compatible to fix/alert-improvements April 22, 2021 23:24
Base automatically changed from fix/alert-improvements to compatible April 23, 2021 00:08
@ghost-not-in-the-shell ghost-not-in-the-shell added ready-to-merge Adding this label will trigger mergify and trigger CI to run and merge the PR alerting and monitoring labels Apr 27, 2021
@mrmr1993 mrmr1993 merged commit ebde046 into compatible Apr 27, 2021
@mrmr1993 mrmr1993 deleted the node-status-metrics-query-direct-peers branch April 27, 2021 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alerting and monitoring ci-build-me Add this label to trigger a circle+buildkite build for this branch ready-to-merge Adding this label will trigger mergify and trigger CI to run and merge the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants