Use soft docker memory limit instead of hard one #2771
Conversation
Force-pushed from 8673994 to 7410464
@dadgar any chance this will hit 0.6?
@dadgar If you want this in, I would suggest making this a client config which the operator can control and not something which lives in the job config. I think allowing users to control such a thing in a multi-tenant environment could be bad from a QoS perspective for the tenants.
client/driver/docker.go (outdated)

    @@ -840,7 +844,15 @@ func (d *DockerDriver) createContainerConfig(ctx *ExecContext, task *structs.Task
     		config.WorkingDir = driverConfig.WorkDir
     	}

    -	memLimit := int64(task.Resources.MemoryMB) * 1024 * 1024
    +	memLimit := int64(0)
    +	memReservation := int64(0)

Review comment: `var memLimit, memReservation int64`
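For context on what those two variables feed into, here is a minimal sketch, assuming the driver keeps using fsouza/go-dockerclient, of how a hard versus soft limit maps onto the container's HostConfig. The helper name and the softLimit flag are illustrative, not the PR's actual code.

```go
package driver

import (
	docker "github.com/fsouza/go-dockerclient"
)

// buildMemoryHostConfig is a hypothetical helper: given the task's memory in MB
// and a flag selecting soft enforcement, it fills either the hard limit
// (HostConfig.Memory) or the soft limit (HostConfig.MemoryReservation).
// Docker treats MemoryReservation as a best-effort target reclaimed toward
// under host memory pressure, while Memory is a hard cap.
func buildMemoryHostConfig(memoryMB int, softLimit bool) *docker.HostConfig {
	var memLimit, memReservation int64
	bytes := int64(memoryMB) * 1024 * 1024
	if softLimit {
		memReservation = bytes // soft limit only; the container may exceed it
	} else {
		memLimit = bytes // current behavior: hard cap, OOM-killed above it
	}
	return &docker.HostConfig{
		Memory:            memLimit,
		MemoryReservation: memReservation,
	}
}
```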
client/driver/docker_test.go (outdated)

    	waitForExist(t, client, handle.(*DockerHandle))

    	_, err := client.InspectContainer(handle.(*DockerHandle).ContainerID())

Review comment: Probably want to test that the MemReservation is set properly in this test. And test the other case too.
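A check along the lines of that suggestion might look like the sketch below. It reuses the waitForExist helper and DockerHandle type visible in the snippet above; the wrapper function itself and how the soft limit gets enabled are assumptions, not the PR's actual test.

```go
package driver

import (
	"testing"

	docker "github.com/fsouza/go-dockerclient"
)

// assertSoftMemoryLimit is a hypothetical test helper: it inspects the running
// container and checks that the soft limit (MemoryReservation) was applied and
// the hard limit (Memory) was left unset. waitForExist and DockerHandle come
// from the existing driver test code shown above.
func assertSoftMemoryLimit(t *testing.T, client *docker.Client, handle DriverHandle, memoryMB int) {
	waitForExist(t, client, handle.(*DockerHandle))

	container, err := client.InspectContainer(handle.(*DockerHandle).ContainerID())
	if err != nil {
		t.Fatalf("InspectContainer failed: %v", err)
	}

	want := int64(memoryMB) * 1024 * 1024
	if got := container.HostConfig.MemoryReservation; got != want {
		t.Errorf("MemoryReservation = %d, want %d", got, want)
	}
	if got := container.HostConfig.Memory; got != 0 {
		t.Errorf("hard Memory limit = %d, want 0 when only the soft limit is set", got)
	}
}
```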
Left a few comments but probably worth waiting for @dadgar to weigh in.
Force-pushed from 7410464 to 9abdb01
Thanks for the review @diptanu, I updated the variable definition and added a test that checks the Docker config. I discussed with our architect whether this should be an agent configuration or a per-job flag, and we agreed that Nomad is a great scheduler and helps us use all the features the drivers provide. But you are right: we can add a flag that permits this option at the client level, like it's done with exec in Consul. Let's wait to hear what @dadgar says, and I hope it'll land in 0.6.
@burdandrei Can you explain your use case for this a bit more? With the current implementation of this PR, a single allocation with this set would be able to starve the memory of the host and affect the QoS of every other allocation on the host.
Ok, @dadgar. While working on my pull request I checked the behavior of the OOM killer when using soft memory limits. I can see @OferE is just using the exec driver to do the same, but the Docker driver fits this perfectly using the soft memory option. I do agree that the flag name can be questioned, but I think exposing this option of the Docker driver could be a really great feature for batch processing. Cheers!
Our use case is similar to the one described in #606. Add a client config:

    client {
      options {
        "docker.allowed_run_opts": "memory,memory-reservation"
      }
    }

This will give operators control over multi-tenant environments. Then, in the job:

    task {
      resources {
        memory = 500
      }
      config {
        run_opt = [
          "memory=1g",
          "memory-reservation=200m"
        ]
      }
    }

In this example Nomad would use the memory requirement specified in the resources stanza for task placement (same as it does now) and set a hard memory limit on the Docker container (memory=500m); this is actually the current implementation. All provided options in […]. This would make the Docker driver way more flexible for our workloads. What do you think @dadgar @diptanu @burdandrei? We can still go with memory limits only, but I would do something like suggested above (client config + task config). I would definitely make both soft and hard Docker memory limits configurable. @dadgar I understand your concerns around QoS. As @burdandrei said, we can use constraints and place such jobs on a specific subset of our nodes, which wouldn't affect others. I am OK with relying on the Docker OOM killer to kill some containers, since they will be restarted anyway. I can help you work on this.
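To make the allowlist idea concrete, a driver-side filter could look roughly like the sketch below. The option names mirror the comment above, but the function and how it would be wired into the Docker driver are assumptions, not an existing implementation.

```go
package driver

import (
	"fmt"
	"strings"
)

// filterRunOpts is a hypothetical helper for the proposal above: it keeps only
// the "key=value" run options whose key appears in the operator-controlled
// allowlist (e.g. the value of a "docker.allowed_run_opts" client option such
// as "memory,memory-reservation").
func filterRunOpts(runOpts []string, allowed string) ([]string, error) {
	allowSet := make(map[string]bool)
	for _, k := range strings.Split(allowed, ",") {
		allowSet[strings.TrimSpace(k)] = true
	}

	var out []string
	for _, opt := range runOpts {
		parts := strings.SplitN(opt, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed run option %q, expected key=value", opt)
		}
		if !allowSet[parts[0]] {
			return nil, fmt.Errorf("run option %q not permitted by the client's allowlist", parts[0])
		}
		out = append(out, opt)
	}
	return out, nil
}
```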
@jzvelc, I understand what stands behind your idea, but I think the real problem is that Nomad should maintain resource allocation and not let jobs kill the cluster. So maybe the best way to use it is to rename it to […].
I would still like to have control over both hard and soft memory limits (something like ECS's memory and memoryReservation parameters).
The thing is that ECS only runs Docker containers, and that's why you get the memory and memoryReservation options, but in Nomad resources are a separate stanza. Let's hear from @dadgar, as the official representative =)
Don't know if it's worth something, but I do think @burdandrei is right and soft limits should be an option. BTW, not just for memory but for all other resources. I am using raw-exec, but this is a hack (though I am really happy with it).
Hey @burdandrei,

Unfortunately I will be closing this. This isn't without consideration, because I do understand there is value here, but it has implications that I am not comfortable with.

QoS: By enabling this feature you are bypassing all memory enforcement. While this may not seem bad at first, it has an effect on every allocation running on the host. We allow this via the […].

Operator Confidence: By enabling this feature there is less determinism in the system, and that can cause much confusion. Running a job with this set at a high count may cause some allocations to fail while others run successfully, just because of what they are collocated with. This is a highly undesirable side effect.

Architecture: In the example you gave there is an architectural problem. You are essentially running a batch job as a long-lived service but want over-subscription semantics. Over-subscription is a feature that we will add in the long term. This will allow batch jobs to use resources that are unused but potentially allocated by the service jobs on a host. You are better off dispatching jobs on an as-needed basis to process work from the queue. This way you can configure them for their peak load and, when they finish, their resources are freed up. We have written a blog post on this pattern: https://www.hashicorp.com/blog/replacing-queues-with-nomad-dispatch/
Thanks for your answer @dadgar. We thought about using job dispatch, but with legacy projects, where worker startup takes more than 30 seconds and the work itself takes 200ms, workers with queues do a great job.
@dadgar @burdandrei My requirement is more of a […]. In my scenario, I know that 3 containers of a specific low overhead run fine on one machine, yet I have to keep calculating memory and CPU divided by 3 (for each job). I have a scenario where I isolate certain types of workloads on machines by setting the […]; Nomad would just restart jobs on the bigger machine. Regards,
@dadgar is this still the case, or is there an option for jobs to burst over their allocated resources? Perhaps something an operator can specify at server level?
@dadgar This is super important for our use-case.
@zonnie probably we'll be able to add this as a pluggable driver; that should become a feature soon 🤞
@burdandrei - can you give me any lead on how to implement this type of thing?
@schmichael, @nickethier after our talk at HashiConf, do you want me to revive this one, or open a new PR?
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
As discussed in a couple of open issues (like this one), there is sometimes a need to run Docker containers without a hard memory limit. The `mem_limit_disable` option will run the Docker container with a soft memory limit as described: Docker will not limit memory allocation for the container, but will honor the value to decide whom to kill if host memory is exhausted.
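For illustration, here is a rough sketch of how such a task option could be decoded from the job's docker config block. The struct, the helper, and the use of mapstructure are assumptions about the driver's plumbing, not the code in this PR.

```go
package driver

import (
	"github.com/mitchellh/mapstructure"
)

// memoryOptions sketches where a task-level "mem_limit_disable" flag could
// live; the field name comes from this PR's description, but the struct shape
// and tag are hypothetical.
type memoryOptions struct {
	MemLimitDisable bool `mapstructure:"mem_limit_disable"`
}

// parseMemoryOptions decodes the relevant key from the raw task config map,
// e.g. config { mem_limit_disable = true } in the job file.
func parseMemoryOptions(raw map[string]interface{}) (memoryOptions, error) {
	var opts memoryOptions
	// WeakDecode tolerates "true" arriving as a string, which HCL can produce.
	err := mapstructure.WeakDecode(raw, &opts)
	return opts, err
}
```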