rolling upgrades / gradual rollout for batch type jobs #9745
Comments
Hi @anapsix! Thanks for opening this issue. It looks like you're running into a feature that's been requested a few different times in different ways. This is sort of like the "Schedule a job depending on another job" issue #545, and partly like the Airflow integration discussed in #419 (comment), which touches on one of the ideas we decided not to implement. You may be able to work around this in one of two ways:
For both of those workarounds, if you don't want to embed this in the task itself, you could do it as a prestart or poststop lifecycle task.
Thank you for the quick response, @tgross. I appreciate the workaround suggestion as well. Prestart and poststop could be very useful for this indeed.
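For illustration, here is a minimal sketch of what a prestart lifecycle task could look like; the task names and the acquire-lock.sh / deploy.sh script paths are placeholders, not something from this thread, and the gating logic is assumed to live in that script:

group "worker" {
  # Hypothetical prestart task that blocks until it is safe to proceed.
  task "wait-for-turn" {
    driver = "raw_exec"

    lifecycle {
      hook    = "prestart"
      sidecar = false
    }

    config {
      command = "/bin/sh"
      # acquire-lock.sh is a placeholder for whatever gating/locking script is used
      args    = ["-c", "/usr/local/bin/acquire-lock.sh"]
    }
  }

  task "worker" {
    driver = "raw_exec"

    config {
      command = "/usr/local/bin/deploy.sh"  # placeholder for the real deployment script
    }
  }
}

The prestart task must exit successfully before the main task starts, which is where any waiting or lock acquisition would go.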
Just in case it's helpful, we do have a
I don't think it's a "bad" idea so much as one we just haven't gotten onto the roadmap. Just as an experiment, I cooked up a build which removes the validation that creates the error, and ran the following job:

job "example" {
  datacenters = ["dc1"]
  type        = "batch"

  group "worker" {
    count = 10

    update {
      max_parallel = 2
      stagger      = "3s"
    }

    task "worker" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "/bin/sh"
        args    = ["-c", "echo 'this looks like work'; sleep 10"]
      }

      resources {
        cpu    = 128
        memory = 64
      }
    }
  }
}

If we run that job, we immediately get 10 allocations, all at once.
If we change something in the job and run it again, we get 10 new allocations, because there are no running allocations to replace in the deployment!

In any case, I suspect what you really want here is a staggered rollout even for the first deployment, and not just on updates. That's why one of the other approaches, like a dispatch job, might be a reasonable workaround for you for now.
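As a rough sketch of the dispatch-job idea (the job name, meta key, and script path here are made up, and whether meta interpolation inside a constraint fits a given setup is worth verifying), a parameterized batch job could be dispatched once per node:

job "deploy" {
  datacenters = ["dc1"]
  type        = "batch"

  # Makes this a dispatch job: it only runs when explicitly dispatched.
  parameterized {
    meta_required = ["target_node"]  # hypothetical meta key
  }

  group "deploy" {
    # Pin each dispatched instance to the node named in the dispatch metadata.
    constraint {
      attribute = "${node.unique.name}"
      value     = "${NOMAD_META_target_node}"
    }

    task "deploy" {
      driver = "raw_exec"

      config {
        command = "/usr/local/bin/deploy.sh"  # placeholder deployment script
      }
    }
  }
}

The caller would then dispatch it one node at a time, e.g. nomad job dispatch -meta target_node=node-01 deploy, pausing or checking health between dispatches to get the staggering.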
Oh, thanks for trying that!
In my case, I'd generate a unique job id (e.g. "deploy-MyApp-297209") each time a new application version is deployed, so every time the deployment job runs it would be unique, and all allocations would be executed at once, as your experiment has shown. I'm going to consider generating a single job per node, or using the locking mechanism we've talked about, and will play with "sysbatch" jobs when those become available.
I'd steer away from trying to bend service jobs into this role. If a service task completes, Nomad is going to try to restart/reschedule it. Anything you can do to work around that is likely to end up being more complicated than some of the other approaches you're considering.
Going to mark this as a feature request and get it into the roadmapping discussion.
Nomad version
Issue
It appears there is no way to use the "update" stanza on a "batch" type job to stagger allocations; it is only supported on "service" and "system" type jobs.
In other words, rolling upgrades seem to be unsupported for "batch" type jobs.
Unfortunately, "service" type job isn't flexible enough for my use case, and using "system" job feels wrong.
Executed script could be updated to use a locking mechanism, making it wait until the lock is available before proceeding. But that feels like over engineering the process.
Is there some other Nomad way to achieve a gradual rollout / execution of "batch" job?
Use case: using raw_exec to execute a script performing a complex deployment on hosts gradually, ensuring such execution will not cause an outage by making the application unavailable on all targeted hosts at once.

/cc @tgross
Reproduction steps
Create the job file below as batch-test.nomad and run nomad job validate batch-test.nomad.
Job file
Nomad Client logs