
scheduling on task driver capabilities via implicit constraints #10088

Open
tgross opened this issue Feb 24, 2021 · 0 comments

tgross commented Feb 24, 2021

Currently task drivers can't advertise their capabilities on a per-host basis to the server. This means that when we add a feature to a task driver, the server doesn't know which version of the task driver a given client is running, and an older version may not support a new feature request. Today this is handled by throwing an error on the client after the workload has been placed, which is a bad experience.

The job validation/registration RPCs support admission hooks that can alter a job as it's submitted. We currently use this to create an implied constraint for certain versions of Vault when the vault block is provided. This implied constraint is then handled by the server just like any user-provided constraint. This is potentially a nice avenue for us because it lets us add scheduler features without having to rework the scheduler internals.

We're currently working on a design for oversubscription (ref #606) and some of the scheduling work for that may experiment with this idea. Some considerations include:

Round-tripping and Visibility

  • The jobspec submitted via the HTTP API and the jobspec that's read back out are no longer the same. This requires a bunch of annoying special-casing in the Terraform provider so that we don't get into a plan-loop where the Nomad state is always considered "dirty" by Terraform.
  • We want scheduling decisions to be visible to the operator. Any implied constraints should be made visible in the nomad alloc status -verbose and nomad eval status outputs.
  • One option might be to have a separate structs.Job.ImplicitConstraints field that isn't returned on the job read API but is included in the alloc status and eval status APIs.
  • Another might be to update the HCL for constraints to optionally include a label. Labels for implicit constraints would be automatically prefixed with "nomad-", and API consumers like Terraform could then ignore them. For example:
# implicit constraint
constraint "nomad-os-signals" {
  attribute = "${attr.taskdriver.os_signals}"
  value     = "1"
}

# user-defined named constraint
constraint "preprod" {
  attribute = "${node.class}"
  value     = "preprod"
}

# user-defined unnamed constraint
constraint {
  operator  = "distinct_hosts"
  value     = "true"
}

Size

  • Node state store size: every structs.Node.Attribute we add increases the memory required by the server for every node. But the total increase seems negligible compared to the number of structs.Node and jobs: on a medium cluster of 100 nodes that ends up being roughly 1MB per 10k attributes, vs. the many thousands of jobs and allocations that could be on a cluster that size.
  • Job state store size: every structs.Job.Constraint we add increases the memory required by the server for every job. This is a greater concern than growing structs.Node, but individual structs.Job objects are already very large compared to a new constraint, so the per-job impact is small.