-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add healthz #722
add healthz #722
Conversation
d74069e
to
d66bf59
Compare
@eyakubovich @tomdee @zbwright Please take a look at the CI error. It seems nothing is related with the change. Seems there are more than one flannel instance is running simultaneous in |
@andyxning I'm not sure I see the benefit of this PR - what's it actually checking (except maybe that the Go runtime hasn't crashed) |
@tomdee Currently, flannel listen on udp port 8285 by default, we have no methods to do a health check about the status of flanneld daemon, i.e., whether it is running or stopped accidentally. We have encountered a situation where flanneld has been stopped accidentally for more than two days without any notice, and when we found this and restart it, the newly allocated subnet is different from the original one cause we have not renew the lease about the original subnet. This makes it possible that we have two nodes with the same subnet which is not acceptable. So, the original thought is we need a way to check whether flanneld is running or not. No much other logic. :) Maybe you want to add more real checks for the health check endpoint, such as where flanneld can connect to etcd successfully. This is more good. But, for now, we just want to check whether flanneld daemon is running or stopped. |
/ping @tomdee |
I'm not totally against this, but before I can merge it you need to add some documentation for it and ensure the tests pass. I also think it should be an optional feature - and probably off by default. |
@tomdee Will Do. |
1c53100
to
4a78a4d
Compare
@tomdee Done. PTAL. |
/ping @tomdee |
Friendly ping @tomdee |
@tomdee This health check is disable by default. I have not test whether it can pass all the tests when enabled by default. :) |
Friendly Ping @tomdee :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andyxning I have a comment and question.
Documentation/configuration.md
Outdated
|
||
## Health Check | ||
|
||
Flannel provides an health check http endpoint `healthz`. Currently this endpoint will blindly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: should be "a health check"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
|
||
if err := http.ListenAndServe(address, nil); err != nil { | ||
log.Errorf("Start healthz server error. %v", err) | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why panic here? Is it standard practice to use panic with http.ListenAndServe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. panic
because this method is called mustRunHealthz
. According to golang convention, we should panic when we can not do some thing that must be done.
The reason for why we use mustRunHealthz
is because is think we should must start the health check server once we enable it. If not, we should panic or stop flannel to give user more clear result.
@tomdee WDYT.
@tomdee Many thanks. 👏 |
Fix #616
This PR will add a
healthz
endpoint in http for flanneld by default in0.0.0.0:8285
.Note: This PR is based on #632 .
/cc @eyakubovich @tomdee @zbwright