
Infinite template re-render loop when using {{ timestamp }} in two different templates of the same task #20618

Open
faryon93 opened this issue May 16, 2024 · 8 comments
Labels
hcc/jira stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/template type/bug

Comments


faryon93 commented May 16, 2024

Nomad version

Output from `nomad version`
Nomad v1.7.7
BuildDate 2024-04-16T19:26:43Z
Revision 0f34c85ee63f6472bd2db1e2487611f4b176c70c

Operating system and Environment details

$: cat /etc/debian_version 
12.5

$: uname -a
Linux nomad2 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

$: podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.8
Built:        Thu Jan  1 01:00:00 1970
OS/Arch:      linux/amd64

Issue

When I use the {{ timestamp "unix" }} function in two different template blocks of the same task, and one of the templates also uses functions that can trigger re-rendering (like {{ key }}, {{ service }}, etc.), any re-render sends the templates into an infinite re-render loop.

The problem does not exist when the task has only one template.

Side note: while the task is stuck in the render loop, a proper shutdown is not possible.

Reproduction steps

  1. Create the key foo/bar in Consul with some random content.
  2. Run the job file below.
  3. Wait until the task is running.
  4. Change the key foo/bar to another value.
  5. Observe the task's Recent Events table for recurring "Template re-rendered" messages.

Expected Result

The template should be re-rendered exactly once.

Actual Result

The template gets re-rendered in an infinite loop every few seconds.

(Screenshot Screenshot_20240516_192701: Recent Events table with repeated "Template re-rendered" events)

Job file (if appropriate)

job "fileserver" {
  group "http" {
    count = 1

    network {
      mode = "bridge"
    }

    task "nginx" {
      driver = "podman"
      config {
        image = "docker.io/library/nginx:latest"
      }

      template {
        data = <<EOF
# Timestamp: {{ timestamp "unix" }}
# K/V: {{ key "foo/bar"}}
        EOF
        destination = "/local/test.txt"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }


      template {
        data = <<EOF
# Timestamp: {{ timestamp "unix" }}
        EOF
        destination = "/local/test1.txt"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }
    }
  }
}

Nomad Server / Client logs

2024-05-16T19:24:46.088+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Received msg="Task received by client" failed=false
2024-05-16T19:24:51.192+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type="Task Setup" msg="Building Task Directory" failed=false
2024-05-16T19:24:51.298+0200 [INFO]  agent: (runner) creating new runner (dry: false, once: false)
2024-05-16T19:24:51.299+0200 [INFO]  agent: (runner) creating watcher
2024-05-16T19:24:51.299+0200 [INFO]  agent: (runner) starting
2024-05-16T19:24:51.347+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:24:51.419+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:24:51.754+0200 [INFO]  client.driver_mgr.nomad-driver-podman: Completely started container: driver=podman @module=podman container=4e213b233753ad26a521001cc646f92fb0e1ca796ee4a59d453fb568e33b37b9 ip=172.26.64.116 taskID=3a19ad33-e832-7205-167e-936dc7307a94/nginx/55e0c66a timestamp="2024-05-16T19:24:51.753+0200"
2024-05-16T19:24:51.757+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Started msg="Task started by client" failed=false
2024-05-16T19:25:25.847+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:25.895+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:26.863+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:29.775+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:30.951+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:35.858+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:35.941+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:38.577+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:40.997+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:43.301+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:45.995+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:49.458+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:51.051+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:52.674+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:56.052+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:56.056+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:26:01.109+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:26:04.697+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
2024-05-16T19:26:04.699+0200 [INFO]  client.driver_mgr.nomad-driver-podman: Stopping task: driver=podman signal="" taskID=3a19ad33-e832-7205-167e-936dc7307a94/nginx/55e0c66a @module=podman timestamp="2024-05-16T19:26:04.699+0200"
2024-05-16T19:26:04.715+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=04eea9f3-7518-cec6-4808-1dbcb9020c37 task=nginx type=Received msg="Task received by client" failed=false
2024-05-16T19:26:05.811+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Terminated msg="Exit Code: 0" failed=false
2024-05-16T19:26:05.836+0200 [WARN]  client.driver_mgr.nomad-driver-podman: Could not remove container: driver=podman container=4e213b233753ad26a521001cc646f92fb0e1ca796ee4a59d453fb568e33b37b9 error="cannot delete container, status code: 200" @module=podman timestamp="2024-05-16T19:26:05.835+0200"
2024-05-16T19:26:05.836+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killed msg="Task successfully killed" failed=false
2024-05-16T19:26:06.102+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killing
    msg=
    | Template failed to send signals [hangup]: 1 error occurred:
    | \t* Task not running
    |
    failed=true
    2024-05-16T19:26:06.108+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:26:09.841+0200 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx @module=logmon timestamp="2024-05-16T19:26:09.841+0200"
2024-05-16T19:26:09.848+0200 [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx plugin=/usr/bin/nomad id=1227172
2024-05-16T19:26:09.849+0200 [INFO]  agent: (runner) stopping
2024-05-16T19:26:09.849+0200 [INFO]  agent: (runner) received finish
2024-05-16T19:26:09.849+0200 [INFO]  client.gc: marking allocation for GC: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94
pkazmierczak (Contributor) commented

Hi @faryon93, thanks for reporting this. I was able to reproduce, and not only with podman but also with docker driver. Indeed there's something fishy happening with the taskrunner here, and we'll look into it.

@pkazmierczak pkazmierczak added the stage/accepted Confirmed, and intend to work on. No timeline committment though. label May 21, 2024
@tgross tgross added the hcc/jira label Aug 6, 2024
@tgross tgross moved this from Triaging to Needs Roadmapping in Nomad - Community Issues Triage Aug 6, 2024

matkinson-godaddy commented Aug 16, 2024

The title of this issue may need to be updated: it is not just timestamp that causes this. Anything that causes the source data to change, triggering a render, does — and only when the task has more than a single template.

Another Example:

job "template-test" {
  type = "service"

  group "test" {
    
    task "test" {
      driver = "docker"
      
      config {
        image = "bash"
        args = [
          "sleep",
          "infinity"
        ]
      }
      
      template {
        data = <<-EOH
          {{ with nomadVar "nomad/jobs/test" }}
          	{{ .test }}
          {{ end }}
        EOH
        
        destination = "local/test.txt"
        change_mode = "noop"
      }
      
      template {
        data = <<-EOH
        	test
        EOH
        
        destination = "local/test2.txt"
        change_mode = "noop"
      }
        
      resources {
         cpu = 17
         memory = 32
         memory_max = 128
      }
    }
  }
}

If we change the nomad/jobs/test variable, it starts to indefinitely re-render the templates.

We found this bug because it causes extremely high Nomad CPU usage when running a lot of containers. It may also be related to these issues:

hashicorp/consul-template#1427
hashicorp/consul-template#1140
#22431

A simple fix would be the ability to disable the quiescence timers in Nomad, but I have tried everything to pass null for the Wait times and nothing works. I think it's because of this code, which always assigns a default:

https://github.com/hashicorp/nomad/blob/v1.7.7/client/config/config.go#L422-L425

Looking into the consul-template code and docs, passing null for the Wait config disables the quiescence timers.
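Since the defaults cannot be nulled out, the quiescence window can at least be tuned through the client agent's template block. A minimal sketch of a Nomad client configuration (illustrative values only, not a fix — it spaces the re-renders out rather than stopping them):

```hcl
# Sketch: widen the template quiescence window on the Nomad client.
# This tunes the timers; per the config.go code linked above, it does
# not disable them.
client {
  template {
    wait {
      min = "15s"
      max = "60s"
    }
  }
}
```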

faryon93 (Author) commented Aug 18, 2024

It is anything that causes the source data to change triggering a rendering. This only happens if you have more than a single template.

I cannot confirm that.

I just modified my example job file to use key (Consul K/V store) in the first template, static text in the second template, and no timestamp function in either of the two templates. Changing the given key in Consul does not trigger the re-rendering loop described initially.

Even when using a nomadVar like you did, I cannot observe the re-rendering.

The problem when using timestamp still persists though.

In the meantime I upgraded my Nomad cluster to 1.8.2, so these tests were conducted with that version.

@pkazmierczak is there any news on this topic?

pkazmierczak (Contributor) commented

Hey @faryon93, sorry this is taking a while. The issue is on our board, but sadly I can't give you an estimate or commit to a timeline. We'll try to fix it soon.


matkinson-godaddy commented Aug 19, 2024

I cannot confirm that.

I tested this again on the latest version I can get, v1.8.3, running in local dev:

nomad agent -dev -bind 0.0.0.0 -network-interface='{{ GetDefaultInterfaces | attr "name" }}'

I created the nomad/jobs/test variable, then started the example job I created. After the job is running I changed the nomad/jobs/test variable and then it got into the infinite loop. You can see this in the DEBUG logs, they look like this:

    2024-08-19T08:38:19.084-0700 [DEBUG] agent: (runner) received template "13801fe879ea8193cae420f91c2b3cfe" from quiescence
    2024-08-19T08:38:19.084-0700 [DEBUG] agent: (runner) initiating run
    2024-08-19T08:38:19.084-0700 [DEBUG] agent: (runner) checking template 13801fe879ea8193cae420f91c2b3cfe
    2024-08-19T08:38:19.085-0700 [DEBUG] agent: (runner) rendering "(dynamic)" => "/private/var/folders/1k/vd17sn3917g1v70zvwvthbsr0000gq/T/NomadClient51851860/99d41d44-799f-ae28-0fdd-0724e66f6dfe/test/local/test.txt"
    2024-08-19T08:38:19.121-0700 [DEBUG] agent: (runner) checking template d8e8fca2dc0f896fd7cb4cb0031ba249
    2024-08-19T08:38:19.121-0700 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-08-19T08:38:19.121-0700 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-08-19T08:38:19.121-0700 [DEBUG] agent: (runner) watching 1 dependencies
    2024-08-19T08:38:19.121-0700 [DEBUG] agent: (runner) all templates rendered
    2024-08-19T08:38:19.121-0700 [DEBUG] agent: (runner) enabling global quiescence for "13801fe879ea8193cae420f91c2b3cfe"

@pkazmierczak when you do the test, can you make sure you are getting DEBUG logs?

tgross (Member) commented Dec 10, 2024

@mismithhisler see also #24638


valeriansaliou commented Dec 10, 2024

I've investigated and it definitely seems to originate from consul-template. I can share more logs here, as I've simulated both the case where rendering stabilizes (with 1 template) and the case where it never stabilizes (with 2 templates).

Those logs are generated with a local Nomad environment, all HCL files are provided here: #24638 (comment)

Working case: 1 template (this shows only once, a few seconds after updating the variable):

    2024-12-10T10:50:50.527-0300 [DEBUG] agent: (runner) received template "1b785af60a21ceb8d6f6ed075af2c32a" from quiescence
    2024-12-10T10:50:50.527-0300 [DEBUG] agent: (runner) initiating run
    2024-12-10T10:50:50.528-0300 [DEBUG] agent: (runner) checking template 1b785af60a21ceb8d6f6ed075af2c32a
    2024-12-10T10:50:50.529-0300 [DEBUG] agent: (runner) rendering "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient3633263354/6584aa10-0a4c-523b-f61f-82ea4878ab52/edge/local/first.txt"
    2024-12-10T10:50:50.641-0300 [DEBUG] agent: template-render sandbox "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient3633263354/6584aa10-0a4c-523b-f61f-82ea4878ab52" not available: operation not permitted
    2024-12-10T10:50:50.641-0300 [INFO]  agent: (runner) rendered "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient3633263354/6584aa10-0a4c-523b-f61f-82ea4878ab52/edge/local/first.txt"
    2024-12-10T10:50:50.641-0300 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-12-10T10:50:50.641-0300 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-12-10T10:50:50.641-0300 [DEBUG] agent: (runner) watching 1 dependencies
    2024-12-10T10:50:50.641-0300 [DEBUG] agent: (runner) all templates rendered
    2024-12-10T10:50:50.641-0300 [DEBUG] agent: (runner) enabling global quiescence for "1b785af60a21ceb8d6f6ed075af2c32a"

Bogus case: 2 templates (this is where the problem appears; the min wait was changed from 5s to 15s in the Nomad client config):

  1. Immediately after updating the variable:
    2024-12-10T11:54:18.526-0300 [DEBUG] http: request complete: method=PUT path="/v1/var/nomad%2Fjobs%2Fedge?cas=10" duration="961.916µs"
    2024-12-10T11:54:18.526-0300 [DEBUG] agent: (runner) receiving dependency nomad.var.block(nomad/jobs/[email protected])
    2024-12-10T11:54:18.526-0300 [DEBUG] agent: (runner) initiating run
    2024-12-10T11:54:18.526-0300 [DEBUG] agent: (runner) checking template 1b785af60a21ceb8d6f6ed075af2c32a
    2024-12-10T11:54:18.528-0300 [DEBUG] agent: (runner) checking template e0b5e4324c0de383b157d89b1e30e194
    2024-12-10T11:54:18.531-0300 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-12-10T11:54:18.531-0300 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-12-10T11:54:18.531-0300 [DEBUG] agent: (runner) watching 1 dependencies
    2024-12-10T11:54:18.531-0300 [DEBUG] agent: (runner) all templates rendered
  2. 15 seconds after updating the variable, this shows:
    2024-12-10T10:54:48.145-0300 [DEBUG] agent: (runner) received template "e0b5e4324c0de383b157d89b1e30e194" from quiescence
    2024-12-10T10:54:48.145-0300 [DEBUG] agent: (runner) initiating run
    2024-12-10T10:54:48.145-0300 [DEBUG] agent: (runner) checking template e0b5e4324c0de383b157d89b1e30e194
    2024-12-10T10:54:48.146-0300 [DEBUG] agent: (runner) rendering "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23/edge/local/second.txt"
    2024-12-10T10:54:48.232-0300 [DEBUG] agent: template-render sandbox "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23" not available: operation not permitted
    2024-12-10T10:54:48.232-0300 [DEBUG] agent: (runner) checking template 1b785af60a21ceb8d6f6ed075af2c32a
    2024-12-10T10:54:48.232-0300 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) watching 1 dependencies
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) all templates rendered
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) enabling global quiescence for "e0b5e4324c0de383b157d89b1e30e194"

    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) received template "1b785af60a21ceb8d6f6ed075af2c32a" from quiescence
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) initiating run
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) checking template e0b5e4324c0de383b157d89b1e30e194
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) checking template 1b785af60a21ceb8d6f6ed075af2c32a
    2024-12-10T10:54:48.233-0300 [DEBUG] agent: (runner) rendering "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23/edge/local/first.txt"
    2024-12-10T10:54:48.262-0300 [DEBUG] agent: template-render sandbox "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23" not available: operation not permitted
    2024-12-10T10:54:48.262-0300 [INFO]  agent: (runner) rendered "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23/edge/local/first.txt"
    2024-12-10T10:54:48.262-0300 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-12-10T10:54:48.262-0300 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-12-10T10:54:48.262-0300 [DEBUG] agent: (runner) watching 1 dependencies
    2024-12-10T10:54:48.262-0300 [DEBUG] agent: (runner) all templates rendered
    2024-12-10T10:54:48.262-0300 [DEBUG] agent: (runner) enabling global quiescence for "1b785af60a21ceb8d6f6ed075af2c32a"
  3. Then it gets flaky, switching every 15 seconds between re-rendering template 1 and template 2:
    2024-12-10T10:55:03.234-0300 [DEBUG] agent: (runner) received template "e0b5e4324c0de383b157d89b1e30e194" from quiescence
    2024-12-10T10:55:03.234-0300 [DEBUG] agent: (runner) initiating run
    2024-12-10T10:55:03.234-0300 [DEBUG] agent: (runner) checking template e0b5e4324c0de383b157d89b1e30e194
    2024-12-10T10:55:03.246-0300 [DEBUG] agent: (runner) rendering "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23/edge/local/second.txt"
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: template-render sandbox "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23" not available: operation not permitted
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: (runner) checking template 1b785af60a21ceb8d6f6ed075af2c32a
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: (runner) watching 1 dependencies
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: (runner) all templates rendered
    2024-12-10T10:55:03.343-0300 [DEBUG] agent: (runner) enabling global quiescence for "e0b5e4324c0de383b157d89b1e30e194"
  4. Then, 15 seconds later, it renders template 2:
    2024-12-10T10:55:18.345-0300 [DEBUG] agent: (runner) received template "1b785af60a21ceb8d6f6ed075af2c32a" from quiescence
    2024-12-10T10:55:18.345-0300 [DEBUG] agent: (runner) initiating run
    2024-12-10T10:55:18.345-0300 [DEBUG] agent: (runner) checking template e0b5e4324c0de383b157d89b1e30e194
    2024-12-10T10:55:18.347-0300 [DEBUG] agent: (runner) checking template 1b785af60a21ceb8d6f6ed075af2c32a
    2024-12-10T10:55:18.348-0300 [DEBUG] agent: (runner) rendering "(dynamic)" => "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23/edge/local/first.txt"
    2024-12-10T10:55:18.429-0300 [DEBUG] agent: template-render sandbox "/private/var/folders/l8/fcxz9sxx2rlctw_wmbmpx04r0000gn/T/NomadClient1787104242/fe048580-6c9a-dc10-0a82-cc84b3b3aa23" not available: operation not permitted
    2024-12-10T10:55:18.429-0300 [DEBUG] agent: (runner) diffing and updating dependencies
    2024-12-10T10:55:18.429-0300 [DEBUG] agent: (runner) nomad.var.block(nomad/jobs/[email protected]) is still needed
    2024-12-10T10:55:18.429-0300 [DEBUG] agent: (runner) watching 1 dependencies
    2024-12-10T10:55:18.429-0300 [DEBUG] agent: (runner) all templates rendered
    2024-12-10T10:55:18.429-0300 [DEBUG] agent: (runner) enabling global quiescence for "1b785af60a21ceb8d6f6ed075af2c32a"
  5. Then it alternates between the two previous states forever, every 15 seconds, rendering template 1 and then template 2 in circles.

The issue seems to originate from the fact that in step 1, the runner receives each template independently from quiescence. It therefore starts 2 independent quiescence timers, and each timer evaluates both templates, even though only 1 is dynamic and contains the variable that changed.
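The ping-pong in the logs above can be sketched as a small deterministic simulation (this is not real consul-template code; the timer model is an assumption reconstructed from the log pattern): each template has its own quiescence timer, and every run walks all templates and re-arms the other template's timer, so rendering never settles.

```go
package main

import "fmt"

// simulate models two templates whose quiescence timers re-arm each
// other: when template i's timer fires, the run renders it and, while
// walking all templates, re-enables quiescence for the other template.
// due[i] is the tick at which template i's timer fires; -1 means disarmed.
func simulate(steps int) []string {
	var events []string
	due := [2]int{0, 0} // both timers armed after the variable change
	for tick := 0; tick < steps; tick++ {
		for i := 0; i < 2; i++ {
			if due[i] == tick {
				events = append(events, fmt.Sprintf("render template %d", i+1))
				due[i] = -1
				// The run checks *all* templates and re-arms the other one,
				// so the loop never reaches a state with no pending timer.
				due[1-i] = tick + 1
			}
		}
	}
	return events
}

func main() {
	for _, e := range simulate(6) {
		fmt.Println(e)
	}
}
```

Every tick produces exactly one render, alternating between the two templates indefinitely — the same shape as the 15-second ping-pong in the DEBUG logs.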

mismithhisler (Member) commented

@valeriansaliou You are absolutely correct: this does originate from consul-template and has to do with the quiescence timers, which are set by default in Nomad clients. We've identified the origin of this issue in consul-template and are working on possible fixes.
