Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support script checks for task group services #6197

Merged
merged 4 commits into from
Sep 3, 2019

Conversation

tgross
Copy link
Member

@tgross tgross commented Aug 23, 2019

Fixes #6162

In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespace, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in.

To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation.

Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task.


This is still a very early draft PR for discussion of the overall approach.

TODO (at least!):

  • rebase from master to pick up all our 0.10 library changes
  • get all tests working
  • get the configuration for group services wired up
  • verify behavior around CheckRestart
  • remove the old script check handling from command/agent/consul
  • better docstrings for the "tasklet" concept being introduced here
  • annotate this PR for reviewers

@tgross tgross force-pushed the f-split-out-script-checks branch 11 times, most recently from d8cab24 to 46c6a97 Compare August 28, 2019 21:02
@tgross tgross added this to the 0.10.0 milestone Aug 28, 2019
@tgross tgross changed the title client: move script checks into task runner support script checks for task group services Aug 28, 2019
@tgross tgross force-pushed the f-split-out-script-checks branch 3 times, most recently from 4a1b943 to bdfd436 Compare August 29, 2019 18:41
// updateTTL updates the state to Consul, performing an expontential backoff
// in the case where the check isn't registered in Consul to avoid a race between
// service registration and the first check.
func (s *scriptCheck) updateTTL(ctx context.Context, id, msg, state string) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers:
Because we don't thread the update TTL through the same internal state proxy as we do all other Consul operations, it's possible for the update TTL to fire before it's registered. The window for this race is very small in the existing implementation, but this PR widens the window. By backing off here we work around the race condition. In future work we'll look into threading the update TTLs through the internal state proxy for Consul.

@@ -0,0 +1,161 @@
package taskrunner
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers:
Much of the code in this file has been moved from command/agent/consul/script.go

}

// closes over the script check and returns the taskletCallback for
// when the script check executes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers:
Much of the code in this function has been moved from command/agent/consul/script.go

// associated with the task. If so, we'll create scriptCheck tasklets
// for them. The group-level service and any check restart behaviors it
// needs are entirely encapsulated within the group service hook which
// watches Consul for status changes.
Copy link
Member Author

@tgross tgross Aug 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers:
This block is the only additional change we need to support Task Group services, so it's encapsulated in fe97b91

@tgross tgross marked this pull request as ready for review August 29, 2019 18:54
@tgross
Copy link
Member Author

tgross commented Aug 29, 2019

@schmichael @nickethier I've still got TestAllocGarbageCollector_MakeRoomForAllocations_GC_All to fix but this should be in a reviewable shape at this point.

@tgross tgross force-pushed the f-split-out-script-checks branch from bdfd436 to 0c867af Compare August 29, 2019 19:08
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great at first glance, but I didn't make it too far in. Will pick it back up in the morning.

scriptChecks[sc.id] = sc
}
}
if len(scriptChecks) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Updates may add services with script checks, I don't think you can completely disable the hook here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me... but now I'm not sure I understand how we add other hooks on update in that case. Ex. we only create new service hooks at task_runner_hooks.go#L98-L107.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp that looks like a bug

client/allocrunner/taskrunner/script_check_hook.go Outdated Show resolved Hide resolved
@tgross tgross force-pushed the f-split-out-script-checks branch from 6e1caa9 to 3a92293 Compare August 30, 2019 13:10
Copy link
Contributor

@endocrimes endocrimes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good 🎉

@tgross tgross force-pushed the f-split-out-script-checks branch from de1c9d4 to 9aecd1e Compare August 30, 2019 15:41
@tgross tgross force-pushed the f-split-out-script-checks branch 2 times, most recently from 18f90b4 to de388c2 Compare August 30, 2019 18:17
@tgross
Copy link
Member Author

tgross commented Aug 30, 2019

I've addressed the open comments and rebased this on master with 2 clean commits (the script refactor + the task group scripts).

@tgross
Copy link
Member Author

tgross commented Aug 30, 2019

I'm not worried about the deploy-preview, as that's worked fine previously and looks to be a git pulling issue at their end.

In Nomad prior to Consul Connect, all Consul checks work the same
except for Script checks. Because the Task being checked is running in
its own container namespaces, the check is executed by Nomad in the
Task's context. If the Script check passes, Nomad uses the TTL check
feature of Consul to update the check status. This means in order to
run a Script check, we need to know what Task to execute it in.

To support Consul Connect, we need Group Services, and these need to
be registered in Consul along with their checks. We could push the
Service down into the Task, but this doesn't work if someone wants to
associate a service with a task's ports, but do script checks in
another task in the allocation.

Because Nomad is handling the Script check and not Consul anyways,
this moves the script check handling into the task runner so that the
task runner can own the script check's configuration and
lifecycle. This will allow us to pass the group service check
configuration down into a task without associating the service itself
with the task.
When tasks are checked for script checks, we walk back through their
task group to see if there are script checks associated with the
task. If so, we'll spin off script check tasklets for them. The
group-level service and any restart behaviors it needs are entirely
encapsulated within the group service hook.
@tgross tgross force-pushed the f-split-out-script-checks branch from 0c9f61c to fb13c5b Compare September 3, 2019 12:58
@tgross tgross force-pushed the f-split-out-script-checks branch from fb0f506 to 124bbc8 Compare September 3, 2019 18:17
@tgross tgross force-pushed the f-split-out-script-checks branch from 124bbc8 to 363b923 Compare September 3, 2019 18:19
@tgross tgross merged commit 40368d2 into master Sep 3, 2019
@tgross tgross deleted the f-split-out-script-checks branch September 3, 2019 19:09
@tgross tgross mentioned this pull request Sep 3, 2019
@github-actions
Copy link

github-actions bot commented Feb 1, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Register Checks for Group Services
4 participants