Task with template stanza failing to restart after template was updated #2613

Closed
Hellspam opened this issue May 4, 2017 · 3 comments

Hellspam commented May 4, 2017

Nomad version

Nomad v0.5.5
Consul v0.7.2

Operating system and Environment details

CentOS Linux release 7.3.1611 (Core)

Issue

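I have a backend server that takes a couple of minutes to start up and become healthy.
Once its health check passes, the nginx template is re-rendered and the nginx task is restarted to add this server to the upstreams, but the restart fails and the Nomad client panics with "close of nil channel" (see the client logs below). I have attached the job file.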

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

May 04 12:11:55 nomad[169508]: 2017/05/04 12:11:55.849959 [DEBUG] client: restarting task nginx for alloc "69aaa532-ac22-1e38-8743-609b4874370d": consul-template: template with change_mode restart re-rendered
May 04 12:11:55 nomad[169508]: 2017/05/04 12:11:55.850017 [DEBUG] client: task being restarted: consul-template: template with change_mode restart re-rendered
May 04 12:11:56 nomad[169508]: 2017/05/04 12:11:56.037964 [DEBUG] client: updated allocations at index 2657057 (total 3) (pulled 0) (filtered 3)
May 04 12:11:56 nomad[169508]: 2017/05/04 12:11:56.038087 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 3)
May 04 12:11:58 nomad[169508]: 2017/05/04 12:11:58.837117 [DEBUG] http: Request /v1/client/allocation/69aaa532-ac22-1e38-8743-609b4874370d/stats (344.814µs)
May 04 12:11:59 nomad[169508]: 2017/05/04 12:11:59.960729 [DEBUG] client: restarting task nginx for alloc "69aaa532-ac22-1e38-8743-609b4874370d": consul-template: template with change_mode restart re-rendered
May 04 12:12:00 nomad[169508]: 2017/05/04 12:12:00.909321 [DEBUG] http: Request /v1/client/allocation/69aaa532-ac22-1e38-8743-609b4874370d/stats (380.686µs)
May 04 12:12:01 nomad[169508]: 2017/05/04 12:12:01.071495 [INFO] driver.docker: stopped container 21742f33d076e47e8ff084740307fa47940a0473b12ea36a00c05c47a6eeed78
May 04 12:12:01 nomad[169508]: 2017/05/04 12:12:01.071499 [DEBUG] driver.docker: error collecting stats from container 21742f33d076e47e8ff084740307fa47940a0473b12ea36a00c05c47a6eeed78: io: read/write on closed pipe
May 04 12:12:01 nomad[169508]: 2017/05/04 12:12:01 [DEBUG] plugin: /usr/local/bin/nomad: plugin process exited
May 04 12:12:01 nomad[169508]: 2017/05/04 12:12:01.122339 [INFO] client: Restarting task "nginx" for alloc "69aaa532-ac22-1e38-8743-609b4874370d" in 0s
May 04 12:12:01 nomad[169508]: 2017/05/04 12:12:01.122788 [DEBUG] client: task being restarted: consul-template: template with change_mode restart re-rendered
May 04 12:12:01 nomad[169508]: panic: close of nil channel
May 04 12:12:01 nomad[169508]: goroutine 92630 [running]:
May 04 12:12:01 nomad[169508]: github.com/hashicorp/nomad/client.(*TaskRunner).run(0xc42048c160)
May 04 12:12:01 nomad[169508]: /opt/gopath/src/github.com/hashicorp/nomad/client/task_runner.go:972 +0xa9e
May 04 12:12:01 nomad[169508]: github.com/hashicorp/nomad/client.(*TaskRunner).Run(0xc42048c160)
May 04 12:12:01 nomad[169508]: /opt/gopath/src/github.com/hashicorp/nomad/client/task_runner.go:442 +0x556
May 04 12:12:01 nomad[169508]: created by github.com/hashicorp/nomad/client.(*AllocRunner).Run
May 04 12:12:01 nomad[169508]: /opt/gopath/src/github.com/hashicorp/nomad/client/alloc_runner.go:507 +0x8e2
May 04 12:12:01 systemd[1]: nomad.service: main process exited, code=exited, status=2/INVALIDARGUMENT
May 04 12:12:01 systemd[1]: Unit nomad.service entered failed state.
May 04 12:12:01 systemd[1]: nomad.service failed.
May 04 12:12:43 systemd[1]: nomad.service holdoff time over, scheduling restart.
May 04 12:12:43 systemd[1]: Started Nomad agent.
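
For context on the "panic: close of nil channel" line in the trace above: in Go, calling close on a channel that was never created (or was reset to nil) panics at runtime. The following is a minimal standalone sketch of that failure mode and of one common guard. It is not the Nomad task_runner.go code and not the change made in the fix; the names runner, waitCh, unsafeStop, safeStop, and panicMessage are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for a component that signals
// completion by closing a channel.
type runner struct {
	mu     sync.Mutex
	waitCh chan struct{} // nil until start is called
}

// start creates the channel that a later stop call will close.
func (r *runner) start() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.waitCh = make(chan struct{})
}

// unsafeStop closes the channel unconditionally and panics if it is nil.
func (r *runner) unsafeStop() {
	close(r.waitCh)
}

// safeStop closes the channel only if it exists, then clears it so that
// a second call is a no-op instead of a double close. It reports
// whether a channel was actually closed.
func (r *runner) safeStop() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.waitCh == nil {
		return false
	}
	close(r.waitCh)
	r.waitCh = nil
	return true
}

// panicMessage runs f and returns the recovered panic text, or "" if f
// returned normally.
func panicMessage(f func()) (msg string) {
	defer func() {
		if v := recover(); v != nil {
			msg = fmt.Sprint(v)
		}
	}()
	f()
	return ""
}

func main() {
	r := &runner{}
	fmt.Println(panicMessage(r.unsafeStop)) // channel was never created
	fmt.Println(r.safeStop())               // nothing to close yet
	r.start()
	fmt.Println(r.safeStop()) // closes the channel
	fmt.Println(r.safeStop()) // already stopped, no-op
}
```

An unrecovered panic like this ends the whole process, which matches the systemd lines above showing nomad.service exiting and being restarted.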

Job file (if appropriate)

group "trc" {
  count = 1
  restart {
    mode = "fail"
  }

  task "trc" {
    driver = "docker"

    config {
      image = "***/taboola/trc:latest"
      port_map {
        trc = 10213
        actuator = 8091
        debug = 8000
      }
    }

    service {
      name = "trc"
      tags = [
        "trc",
        "frontend",
        "alloc-${NOMAD_ALLOC_ID}"]
      port = "trc"
      check {
        type = "tcp"
        port = "trc"
        interval = "10s"
        timeout = "2s"
      }
      check {
        type = "script"
        name = "init"
        command = "/trc-healthcheck.sh"
        interval = "10s"
        timeout = "2s"
      }
    }
    service {
      name = "trc-actuator"
      tags = [
        "trc",
        "management"]
      port = "actuator"
      check {
        type = "tcp"
        port = "actuator"
        interval = "10s"
        timeout = "2s"
      }
    }
    env {
//
    }

    resources {
      cpu = 10000
      memory = 16536
      network {
        mbits = 1
        port "trc" {
        }
        port "actuator" {
        }
        port "debug" {
        }
      }
    }
  }
}
group "nginx" {
  task "nginx" {
    driver = "docker"

    config {
      image = "****/taboola/trc-nginx:latest"
      port_map {
        nginx_http = 80
        nginx_ssl = 443
      }
    }

    template {
      data = <<EOH
        upstream trc {
        least_conn;
        {{ range service "trc" -}}
          server {{.Address}}:{{.Port}}; #{{.Node}}
        {{- end }}
          keepalive 100;
        }
      EOH
      destination = "local/trc.conf"
      change_mode = "restart"
    }

    service {
      name = "trc-nginx-http"
      tags = [
        "trc",
        "nginx",
        "http",
        "frontend"]
      port = "nginx_http"
      check {
        type = "tcp"
        port = "nginx_http"
        interval = "10s"
        timeout = "2s"
      }
    }
    service {
      name = "trc-nginx-ssl"
      tags = [
        "trc",
        "nginx",
        "ssl",
        "frontend"]
      port = "nginx_ssl"
      check {
        type = "tcp"
        port = "nginx_ssl"
        interval = "10s"
        timeout = "2s"
      }
    }
    resources {
      cpu = 1000
      memory = 4000
      network {
        mbits = 1
        port "nginx_http" {
        }
        port "nginx_ssl" {
        }
      }
    }
  }
}

dadgar (Contributor) commented May 4, 2017

Hey, this is fixed in v0.5.6 by PR #2480! Sorry you ran into this!

dadgar closed this as completed May 4, 2017

Hellspam (Author) commented May 8, 2017

Thanks! After upgrading, the issue no longer appears.

github-actions bot commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2022