VPA daemonset recommendations per-pod based on node metadata #5928
Comments
Hi, thanks for opening the issue; we have the same need too. At the moment of writing, I think that before starting to address the issue Avoiding the use of scheduling setting On the other side, extending the default scheduler to handle binding a pod to a node could be a possible solution, but I have some concerns about this approach. WDYT?
Thanks for the comment @fbalicchia. I'm not an expert in this space, so I can't really speak to your suggestions. I don't know when the affinity of a pod is determined, but I know if I inspect the pods of a daemonset they have an explicit affinity for the node on which they are intended to run. If that information is available, it might be usable for this case.
This is one of the things I was thinking of supporting once we have in-place update support (#4016). I'd rather do something that supports multiple similar use cases (where we have one workload with instances that have somewhat different resource requirements) and also covers daemonsets than build a dedicated feature for daemonsets.
I think that's a good goal @jbartosik. I'm having trouble thinking of how to generalize this to all deployments. Are you giving up on the idea of prediction and just deferring the decision until runtime? One of the valuable elements of this suggestion is that you would know beforehand how big a specific pod is likely to be based on some external, measurable factor.
I don't have a specific proposal yet, just some ideas. Like I wrote, I think this is something to take a look at after we have support for in-place updates. We need a way to detect pods that have unusual resource usage for their deployment. Waiting for the actual usage data to come in is one way we could detect that. Another is using different metrics (similar to how you proposed using node size here).
Why is this a blocker? If a pod is resized and then rescheduled to a different node, it seems like it just needs to respect any existing affinities.
My guess is that it's something about nodes that makes resource usage of different daemonset pods different (size of node, number of pods, amount of logging happening...). So if we don't know on which node a pod will live, then we don't know how many resources it will need.
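One way to read the "detect pods that have unusual resource usage" idea from this thread is a simple comparison against the workload median. The sketch below is only an illustration of that idea, not anything the VPA does today; the function name `unusual_pods`, the `factor` threshold, and the input shape (pod name mapped to observed memory in bytes) are all assumptions made up for the example.

```python
from statistics import median


def unusual_pods(usage_by_pod, factor=2.0):
    """Return names of pods whose usage is far from the workload median.

    usage_by_pod: dict mapping pod name -> observed memory usage in bytes
                  (hypothetical input shape, chosen for this example).
    factor:       a pod is flagged when its usage is more than `factor`
                  times above, or below 1/`factor` of, the median.
    """
    if not usage_by_pod:
        return []
    mid = median(usage_by_pod.values())
    if mid <= 0:
        # Without a positive baseline there is nothing to compare against.
        return []
    return sorted(
        name
        for name, used in usage_by_pod.items()
        if used > mid * factor or used < mid / factor
    )
```

For a daemonset, a pod on a very large node would typically show up in this list, which is the signal that a single recommendation for the whole workload does not fit it.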
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Which component are you using?:
vertical pod autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Some daemonsets are comprised of pods which have variable resource needs depending on the node they run on, and by their nature they cannot horizontally scale out of this problem.
Consider the case where a kube cluster is running a cluster autoscaler that provisions all manner of different node types based on cheapest-available capacity (e.g., karpenter using AWS spot).
In the case of dramatically variable node sizes, a pod that's a member of the datadog agent daemonset will require more resources to handle an instance with more pods on it when compared to a member of that same daemonset running on a tiny instance with only a few pods.
Describe the solution you'd like.:
I would like the VPA to (optionally) provide recommendations along an extra dimension for DaemonSets, such as ENI max pods for the host, and size DS pods individually based on this dimension. The recommender might suggest a memory configuration of any given pod based on historical memory_consumed/node_max_pods instead of a single memory value across the daemonset.
Describe any alternative solutions you've considered.:
The alternative is to overprovision the daemonset by a large margin on small instances, or to limit cluster node variability.
Additional context.:
Running on AWS EKS 1.24 with Karpenter.
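To make the requested behavior concrete, here is a minimal sketch of sizing a daemonset pod from the historical memory_consumed/node_max_pods ratio. It is not the VPA recommender's algorithm: the `Sample` type, the function names, and the choice of a nearest-rank 90th percentile over the ratios are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    memory_bytes: int   # memory consumed by one daemonset pod (historical observation)
    node_max_pods: int  # max pods of the node that pod ran on (e.g. ENI max pods)


def memory_per_max_pod(samples, percentile=0.9):
    """Nearest-rank percentile of memory_consumed / node_max_pods over all samples."""
    ratios = sorted(
        s.memory_bytes / s.node_max_pods for s in samples if s.node_max_pods > 0
    )
    if not ratios:
        raise ValueError("no usable samples")
    rank = max(1, math.ceil(percentile * len(ratios)))
    return ratios[rank - 1]


def recommend_memory(samples, target_node_max_pods, percentile=0.9):
    """Scale the per-max-pod ratio by the target node's max pods, in bytes."""
    return math.ceil(memory_per_max_pod(samples, percentile) * target_node_max_pods)
```

With this shape, a pod landing on a node with a small max-pods value gets a proportionally smaller memory recommendation than a pod of the same daemonset on a large node, instead of both receiving one daemonset-wide value.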