Health check routine leaks using new nomad provider #15477
Comments
Thanks for the report @thetooth, and apologies for the slow response. I was able to reproduce this with this simpler job and bash script. AFAICT the duplication in requests to the healthcheck happens on

```hcl
job "demo" {
  datacenters = ["dc1"]

  group "group1" {
    network {
      mode = "host"
      port "http" {
        static = 8888
      }
    }

    reschedule {
      unlimited      = true
      delay          = "15s"
      delay_function = "constant"
      attempts       = 0
    }

    restart {
      attempts = 2
      delay    = "1s"
      interval = "15s"
      mode     = "fail"
    }

    task "task1" {
      driver = "raw_exec"
      user   = "shoenig"

      config {
        command = "python3"
        args    = ["-m", "http.server", "8888", "--directory", "/tmp"]
      }

      service {
        provider = "nomad"
        port     = "http"

        check {
          path     = "/"
          type     = "http"
          interval = "3s"
          timeout  = "1s"
        }
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```
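The task above serves with `python3 -m http.server`, which logs one line per request to stderr. As a rough way to make the duplication visible, a small helper (hypothetical, not part of the report) can tally those log lines per second; with a 3s check interval, anything consistently above one request every three seconds suggests leaked check routines:

```go
package main

// Hypothetical helper: count python3 http.server log lines per second.
// Pipe the task's stderr log into this program. Default http.server log
// lines look like:
//   127.0.0.1 - - [08/Dec/2022 10:15:01] "GET / HTTP/1.1" 200 -

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// ts captures the bracketed timestamp in each log line.
var ts = regexp.MustCompile(`\[([^\]]+)\]`)

// countPerSecond tallies log lines by their bracketed timestamp, so each
// map key is one second of wall-clock time.
func countPerSecond(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if m := ts.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for second, n := range countPerSecond(lines) {
		fmt.Printf("%s  %d req\n", second, n)
	}
}
```

Usage would be along the lines of `nomad alloc logs -stderr <alloc-id> task1 | go run count.go` (the file name is illustrative).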
This PR fixes a bug where alloc pre-kill hooks were not run in the edge case where there are no live tasks remaining, but it is also the final update to process for the (terminal) allocation. We need to run cleanup hooks here, otherwise they will not run until the allocation gets garbage collected (i.e. via Destroy()), possibly at a distant time in the future. Fixes #15477

* client: run alloc pre-kill hooks on last pass despite no live tasks
* client: do not run ar cleanup hooks if client is shutting down
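The control flow described in the PR can be sketched roughly as follows. This is an illustrative simplification, not Nomad's actual API: the type and method names (`allocRunner`, `handleUpdate`, `runPreKillHooks`) are invented for the example.

```go
package main

import "fmt"

// allocRunner is a toy stand-in for Nomad's allocation runner.
type allocRunner struct {
	liveTasks    int  // tasks still running in the allocation
	shuttingDown bool // true when the client itself is stopping
	prekillRan   bool // records whether cleanup hooks fired
}

// runPreKillHooks stands in for the cleanup that deregisters services and
// stops their periodic health checks.
func (ar *allocRunner) runPreKillHooks() { ar.prekillRan = true }

// handleUpdate processes one allocation update; isTerminal marks the final
// update for a (terminal) allocation.
func (ar *allocRunner) handleUpdate(isTerminal bool) {
	// With live tasks, hooks already ran as part of killing them.
	if ar.liveTasks > 0 {
		ar.runPreKillHooks()
		return
	}
	// The fix: with zero live tasks, still run cleanup hooks on the final
	// update, so leaked check routines stop now rather than at garbage
	// collection. Skip this when the client is shutting down (the second
	// commit above), since the whole client is going away anyway.
	if isTerminal && !ar.shuttingDown {
		ar.runPreKillHooks()
	}
}

func main() {
	ar := &allocRunner{liveTasks: 0}
	ar.handleUpdate(true)
	fmt.Println(ar.prekillRan) // prints "true": cleanup runs despite no live tasks
}
```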
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.4.3 (f464aca)
Issue
I have a service that logs HTTP requests, and I noticed that the endpoint used for health checking is being hit a few hundred times per second. There is a pretty aggressive restart policy on this job, and we had a netsplit issue last night which led to the service restarting around 600 times, so the logs are quite busy to say the least.
Reproduction steps
Run the job below and either stop the job and resubmit it, or have the process crash. The number of requests hitting the service increases until Nomad is restarted.
Job file (if appropriate)