Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support in-place Pod vertical scaling in VPA #4016

Open
noBlubb opened this issue Apr 15, 2021 · 40 comments · May be fixed by #7673 or #6652
Open

Support in-place Pod vertical scaling in VPA #4016

noBlubb opened this issue Apr 15, 2021 · 40 comments · May be fixed by #7673 or #6652
Assignees
Labels
area/vertical-pod-autoscaler kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@noBlubb
Copy link

noBlubb commented Apr 15, 2021

Hey everyone,

as I gather the VPA currently cannot update pods without recreating them:

Once restart free ("in-place") update of pod requests is available
from README

and neither can the GKE vertical scaler:

Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod
from https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler#vertical_pod_autoscaling_in_auto_mode

Unfortunately, I was unable to learn the specific limitation from this (other than the mere absence of any such feature?) nor timeline for this to appear in VPA or how to contribute on this if possible. Could you please outline what is missing in VPA for this to be implemented?

Best regards,
Raffael

@morganchristiansson
Copy link

Would be nice with more details on the status feature. I would guess it's limitation in Kubernetes or from a lower level like containerd or kernel?

@bskiba
Copy link
Member

bskiba commented Apr 28, 2021

At this moment this is a Kubernetes limitation (kernel and container runtime already supports resizing containers). There is work needed in scheduler, kubelet, core API so a pretty cross-cutting problem. Also a lot of systems assumed pod sized are immutable for a long time so there is need to untangle those as well.

There is ongoing work in Kubernetes to provide in-place pod resizes (Example: kubernetes/enhancements#1883). Once that work completes VPA will be able to take advantage of that.

@k8s-triage-robot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 26, 2021
@Jeffwan
Copy link
Contributor

Jeffwan commented Aug 27, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 27, 2021
@jmo-qap
Copy link

jmo-qap commented Sep 15, 2021

kubernetes/kubernetes#102884

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2022
@jbartosik
Copy link
Collaborator

/remove-lifecycle rotten

@jbartosik
Copy link
Collaborator

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 29, 2022
@jbartosik
Copy link
Collaborator

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 29, 2022
@jbartosik
Copy link
Collaborator

/remove-lifecycle stale

Support for in-place updates didn't make it into K8s 1.25 but it aiming for 1.26.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2022
@voelzmo
Copy link
Contributor

voelzmo commented Dec 2, 2022

/remove-lifecycle stale
Feature didn't make it in 1.26, but now targeted for 1.27 ;)

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2022
@frivoire
Copy link

frivoire commented Jan 5, 2023

This issue seems to be a duplicate of: #5046
Shouldn't we close one of those 2 issues ?

@SergeyKanzhelev
Copy link
Member

I don't think the VPA should look at the ResizePolicy field in PodSpec.containers at all.

API currently is limited and not supporting the notion of "apply changes if possible without restart and not apply otherwise". Which may impact PDB. I don't know how autoscaler deals with PDB today, but if there will be higher frequency autoscaling with InPlace update hoping for non disruptive change, this will not work. In other words, we either need a new API to resize ONLY without the restart or treat a resize as a disruption affecting PDB.

@voelzmo
Copy link
Contributor

voelzmo commented Jul 26, 2023

@SergeyKanzhelev thanks for joining the discussion!

I don't know how [vertical pod] autoscaler deals with PDB today

Today, VPA uses the eviction API, which respects PDB.

we either need a new API to resize ONLY without the restart or treat a resize as a disruption affecting PDB.

I'm not sure which component the "we" part in this sentence is, but in general, I tend to agree with the need for an API that respects PDB. If kubelet needs to restart the Pod for applying a resource change, this should count towards PDB. However, I think this shouldn't be a concern that VPA has to deal with. Similarly to eviction, VPA should just be using an API that respects PDB if we consider this relevant for the restart case as well.

Regarding my statement from above

I don't think the VPA should look at the ResizePolicy field in PodSpec.containers at all.

This is no longer correct, as @jbartosik opted for a more informed approach in the enhancement proposal. Currently, VPA implements some constraints to ensure resource updates don't happen too frequently (for example, by requiring a mimimum absolute/relative change for Pods which have been running for shorter than 12 hours). The proposal contains the idea to change these constraints if a Container has ResizePolicy: NotRequired.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2024
@jbartosik
Copy link
Collaborator

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2024
@jkyros jkyros linked a pull request Mar 25, 2024 that will close this issue
@nikimanoledaki
Copy link
Contributor

Hi folks, could someone share a summary of what is blocking this feature please? +1 that this would be really useful to reduce workload evictions. Thank you!

@voelzmo
Copy link
Contributor

voelzmo commented Sep 30, 2024

I think the summary is: the kubernetes feature for in-place resource updates is in alpha stage and there are still many things to be done before it will be promoted to beta status. See kubernetes/enhancements#4704 for a summary and the ongoing discussion.
As for beta, many things will fundamentally change e.g. what the API for this feature is (they're e.g. talking about an introduction of a /resize subresource), I don't think we can start working on this from the VPA side before the feature reaches beta state in k/k.

@adrianmoisey
Copy link
Member

Also note that a work-in-progress PR does exist: #6652

@sftim
Copy link
Contributor

sftim commented Oct 11, 2024

Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.

@adrianmoisey
Copy link
Member

Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.

If someone did want to help, where can they go to get involved?

@kennangaibel
Copy link

Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.

@sftim I would also be interested in helping

@SergeyKanzhelev
Copy link
Member

The proposal contains the idea to change these constraints if a Container has ResizePolicy: NotRequired.

To re-iterate on #4016 (comment), there is no API exposed that would mean "no restart resize". Checking for ResizePolicy may only give information when container if DEFINITELY WILL BE restarted, but if the policy is not this, container MAY be restarted on resize.

If the API "no restart resize" is required, it will be a great feedback for the KEP.

We cannot just assume there will be no restart. VPA will need to have some sort of logic of respecting PDB and detecting restarts as well as a way for users to block VPA for a specfic Pod or Container.

@aglikson-netapp
Copy link

@SergeyKanzhelev in which cases would you expect the container to restart even though ResizePolicy=NotRequired?

@voelzmo
Copy link
Contributor

voelzmo commented Dec 19, 2024

@SergeyKanzhelev gentle ping about the question of Container restarts despite ResizePolicy=NotRequired. Can elaborate (or point me to some docs) for this case and if/how this case adheres to PDB? Thanks!

@sivarama-p-raju
Copy link

@SergeyKanzhelev

We are currently on Kubernetes version 1.29 (on-premises kubeadm installations).

With the featuregate "InPlacePodVerticalScaling" enabled on the API servers and kubelet, in-place resource updates can be done by patching the pods using "kubectl patch". I tested the same with the below patch command and I see that the container did not restart and the updated resources could be seen on describing the pod:

kubectl patch pod <pod> -n <namespace> --patch '{
  "spec": {
    "containers": [
      {
        "name": "<container>",
        "resources": {
          "requests": {
            "cpu": "80m",
            "memory": "120Mi"
          },
          "limits": {
            "cpu": "200m",
            "memory": "128Mi"
          }
        }
      }
    ]
  }
}'
pod/<pod> patched

Just curious to know if VPA can do something like this instead of evicting the pod. Could you please let me know your thoughts on the same ?

@sftim
Copy link
Contributor

sftim commented Jan 6, 2025

This issue requests that support @sivarama-p-raju

The behavior you're asking for won't be available until a volunteer, or group of volunteers, does the work to implement this feature. You can upvote this issue or even contribute time towards the effort.

You can tell it's not yet done because the issue says “Open“ in green at the top.

@maxcao13 maxcao13 linked a pull request Jan 7, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/vertical-pod-autoscaler kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects