-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support script checks for task group services #6197
Conversation
d8cab24
to
46c6a97
Compare
4a1b943
to
bdfd436
Compare
// updateTTL updates the state to Consul, performing an expontential backoff | ||
// in the case where the check isn't registered in Consul to avoid a race between | ||
// service registration and the first check. | ||
func (s *scriptCheck) updateTTL(ctx context.Context, id, msg, state string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers:
Because we don't thread the update TTL through the same internal state proxy as we do all other Consul operations, it's possible for the update TTL to fire before it's registered. The window for this race is very small in the existing implementation, but this PR widens the window. By backing off here we work around the race condition. In future work we'll look into threading the update TTLs through the internal state proxy for Consul.
@@ -0,0 +1,161 @@ | |||
package taskrunner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers:
Much of the code in this file has been moved from command/agent/consul/script.go
} | ||
|
||
// closes over the script check and returns the taskletCallback for | ||
// when the script check executes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers:
Much of the code in this function has been moved from command/agent/consul/script.go
// associated with the task. If so, we'll create scriptCheck tasklets | ||
// for them. The group-level service and any check restart behaviors it | ||
// needs are entirely encapsulated within the group service hook which | ||
// watches Consul for status changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers:
This block is the only additional change we need to support Task Group services, so it's encapsulated in fe97b91
@schmichael @nickethier I've still got |
bdfd436
to
0c867af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great at first glance, but I didn't make it too far in. Will pick it back up in the morning.
scriptChecks[sc.id] = sc | ||
} | ||
} | ||
if len(scriptChecks) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Updates may add services with script checks, I don't think you can completely disable the hook here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me... but now I'm not sure I understand how we add other hooks on update in that case. Ex. we only create new service hooks at task_runner_hooks.go#L98-L107
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welp that looks like a bug
6e1caa9
to
3a92293
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good 🎉
de1c9d4
to
9aecd1e
Compare
18f90b4
to
de388c2
Compare
I've addressed the open comments and rebased this on master with 2 clean commits (the script refactor + the task group scripts). |
I'm not worried about the deploy-preview, as that's worked fine previously and looks to be a git pulling issue at their end. |
de388c2
to
0c9f61c
Compare
In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespaces, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in. To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation. Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task.
When tasks are checked for script checks, we walk back through their task group to see if there are script checks associated with the task. If so, we'll spin off script check tasklets for them. The group-level service and any restart behaviors it needs are entirely encapsulated within the group service hook.
0c9f61c
to
fb13c5b
Compare
fb0f506
to
124bbc8
Compare
124bbc8
to
363b923
Compare
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #6162
In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespace, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in.
To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation.
Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task.
This is still a very early draft PR for discussion of the overall approach.
TODO (at least!):
CheckRestart
command/agent/consul