Currently task drivers can't advertise their capabilities on a per-host basis to the server. This means that if we add a feature to a task driver, the server doesn't know which version the task driver is running, and it's possible that an older version won't be able to support a new feature request. This gets handled by throwing an error on the client after the workload has been placed, and that's a bad experience.
The job validation/registration RPCs support admission hooks that can alter a job as it's submitted. We currently use this to create an implied constraint for certain versions of Vault when the vault block is provided. This implied constraint is then handled by the server just like any user-provided constraint. This is potentially a nice avenue for us because it lets us add scheduler features without having to rework the scheduler internals.
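As a rough illustration, here's a minimal sketch of what such a mutating admission hook could look like. The types, hook interface, and version values below are simplified stand-ins, not Nomad's actual hook API or struct fields:

```go
package admission

// Constraint and Job are pared-down stand-ins for structs.Constraint and
// structs.Job; the real types carry many more fields.
type Constraint struct {
	LTarget string
	RTarget string
	Operand string
}

type Job struct {
	Constraints []*Constraint
	UsesVault   bool // stand-in for "some task declares a vault block"
}

// vaultConstraintHook is a hypothetical mutating admission hook: when a job
// asks for Vault, it appends an implied constraint so the scheduler only
// places the job on nodes whose fingerprinted Vault version is new enough.
type vaultConstraintHook struct{}

func (vaultConstraintHook) Mutate(job *Job) (*Job, []error) {
	if !job.UsesVault {
		return job, nil
	}
	job.Constraints = append(job.Constraints, &Constraint{
		LTarget: "${attr.vault.version}",
		RTarget: ">= 0.6.1", // illustrative minimum version
		Operand: "semver",
	})
	return job, nil
}
```

By the time the scheduler sees the job, the injected constraint is just another constraint, which is why no scheduler changes are needed.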
We're currently working on a design for oversubscription (ref #606) and some of the scheduling work for that may experiment with this idea. Some considerations include:
Round-tripping and Visibility
The jobspec submitted via the HTTP API and the jobspec that's read back out are no longer the same. This requires a bunch of annoying special-casing in the Terraform provider so that we don't get into a plan loop where the Nomad state is always considered "dirty" by Terraform.
We want scheduling decisions to be visible to the operator. Any implied constraints should be made visible in the nomad alloc status -verbose and nomad eval status outputs.
One option might be to have a separate structs.Job.ImplicitConstraints field that isn't returned on the job read API but is included in the alloc status and eval status APIs.
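A sketch of that option, reusing the same pared-down stand-in types; the ImplicitConstraints field and its API behavior are hypothetical:

```go
package jobsketch

// Constraint is a pared-down stand-in for structs.Constraint.
type Constraint struct {
	LTarget, RTarget, Operand string
}

// Job sketches the "separate field" option; ImplicitConstraints is a
// hypothetical field, not an existing structs.Job member.
type Job struct {
	// Constraints holds only what the user submitted and is what the job
	// read API would return, so Terraform round-trips cleanly.
	Constraints []*Constraint

	// ImplicitConstraints holds constraints injected by admission hooks.
	// It would be omitted from the job read API but surfaced by the
	// alloc status and eval status endpoints.
	ImplicitConstraints []*Constraint
}

// allConstraints is what the scheduler would evaluate: user-provided plus
// implied constraints in a single slice.
func (j *Job) allConstraints() []*Constraint {
	out := make([]*Constraint, 0, len(j.Constraints)+len(j.ImplicitConstraints))
	out = append(out, j.Constraints...)
	return append(out, j.ImplicitConstraints...)
}
```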
Another might be to update the HCL for constraints to optionally include a label. The label for implicit constraints would be automatically prefixed with "nomad-", and API consumers like Terraform could then ignore them. For example:
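Here's a sketch of that option with stand-in types; the Label field, the "nomad-" prefix handling, the jobspec syntax in the comment, and the filtering helper are all hypothetical:

```go
package apisketch

import "strings"

// In the jobspec a labeled constraint might look like (hypothetical syntax):
//
//	constraint "nomad-vault-version" {
//	  attribute = "${attr.vault.version}"
//	  operator  = "semver"
//	  value     = ">= 0.6.1"
//	}

// Constraint is a stand-in for structs.Constraint with a hypothetical
// Label field added.
type Constraint struct {
	Label   string // empty for user constraints, "nomad-..." for implied ones
	LTarget string
	RTarget string
	Operand string
}

const impliedPrefix = "nomad-"

// userConstraints drops implied constraints so an API consumer such as the
// Terraform provider can diff only what the user actually wrote, avoiding
// the plan loop described above.
func userConstraints(cs []*Constraint) []*Constraint {
	out := make([]*Constraint, 0, len(cs))
	for _, c := range cs {
		if !strings.HasPrefix(c.Label, impliedPrefix) {
			out = append(out, c)
		}
	}
	return out
}
```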
Size
Node state store size: every structs.Node.Attribute we add increases the memory the server needs for every node. But the total increase relative to the number of structs.Node and jobs seems negligible: on a medium cluster of 100 nodes it works out to roughly 1MB per 10k attribute entries, versus the many thousands of jobs and allocations that could be on a cluster that size (see the back-of-envelope sketch below).
Job state store size: every structs.Job.Constraint we add increases the memory the server needs for every job. This is a greater concern than growing structs.Node, but individual structs.Job are already very large compared to a few extra constraints, so the per-job impact is small.
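A back-of-envelope sketch of both estimates; the ~100 bytes per attribute/constraint entry, the attribute count, and the job counts are assumptions, not measured numbers:

```go
package main

import "fmt"

func main() {
	const bytesPerEntry = 100 // assumed average attribute or constraint size

	// Node attributes: each new attribute is stored once per node.
	nodes := 100           // "medium cluster" from above
	newAttrsPerNode := 100 // assumed number of new attributes
	nodeEntries := nodes * newAttrsPerNode
	fmt.Printf("node attrs: %d entries ~ %.1f MB\n",
		nodeEntries, float64(nodeEntries*bytesPerEntry)/1e6)

	// Job constraints: each implied constraint is stored once per job.
	jobs := 5000       // assumed job count on a cluster that size
	impliedPerJob := 2 // assumed implied constraints per job
	jobEntries := jobs * impliedPerJob
	fmt.Printf("job constraints: %d entries ~ %.1f MB\n",
		jobEntries, float64(jobEntries*bytesPerEntry)/1e6)
}
```

Under these assumptions both cases land around 10k entries and roughly 1MB, which is where the figure above comes from.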