
During restore, pods will spend a significant amount of time in "PodScheduled=false, reason: Unschedulable" #8584

Open
akalenyu opened this issue Jan 6, 2025 · 17 comments



akalenyu commented Jan 6, 2025

What steps did you take and what happened:

  1. Create a running kubevirt VM
  2. Backup the VM
  3. Delete the VM
  4. Restore the VM
    (This is probably reproducible with a Pod/PVC combo as well)
      message: '0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims.
        preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.'
      reason: Unschedulable
      status: "False"
      type: PodScheduled
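
(For reference, the condition above can be pulled straight off the restored pod with something like the following; pod name and namespace are placeholders:)

kubectl get pod <restored-pod> -n <namespace> \
  -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")]}'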

What did you expect to happen:
None, or maybe a brief amount of time spent in Unschedulable.

The following information will help us better understand what's going on:
I believe that velero's restore mechanics differ from standard dynamic provisioning of a pod/PVC (totally understandable),
but specifically the part where the PVC has spec.volumeName set while the volume stays unbound for a long period of time
triggers the pod's Unschedulable condition for quite a bit of time (leaving users/components relying on this API quite confused):
https://github.com/kubernetes/kubernetes/blob/4114a9b4e45a4df96f0383d87b2649640a6ffbf1/pkg/scheduler/framework/plugins/volumebinding/binder.go#L791-L794
https://github.com/kubernetes/kubernetes/blob/4114a9b4e45a4df96f0383d87b2649640a6ffbf1/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go#L363-L369
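
A minimal sketch of the restored-PVC state being described (names and storage class are made up, not the exact object Velero creates): spec.volumeName is already set, but the referenced PV is not yet available, so the volume binding plugin linked above keeps reporting the pod as Unschedulable until the data movement completes.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc              # hypothetical name
  namespace: demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: standard      # placeholder
  volumeName: pvc-0123-restored   # pre-bound, but the PV is not ready yet, so the PVC stays unbound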

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

I am not sure how restore is achieved, so my suggestion may not make much sense, but
if the volumeName were only added to the PVC once the volume is ready, maybe that would avoid the issue.

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"

kaovilai commented Jan 6, 2025

Did you use this plugin: https://github.com/kubevirt/kubevirt-velero-plugin?


akalenyu commented Jan 6, 2025

Yup ofc, this is not specific to kubevirt, but it's how we discovered it 😃


kaovilai commented Jan 6, 2025

You need to clarify how you installed velero and how the backup is created, since velero has several ways to back up/restore volume data.


akalenyu commented Jan 6, 2025

This actually came from an internal bug in OpenShift so I assume OADP. Does that help?

blackpiglet commented:

I believe that velero's restore mechanics differ from standard dynamic provisioning of a pod/PVC (totally understandable),
but specifically the part where the PVC has spec.volumeName set while the volume stays unbound for a long period of time
triggers the pod's Unschedulable condition for quite a bit of time (leaving users/components relying on this API quite confused):

Based on your description, I assume you are using the CSI snapshot data mover to back up and restore the volume data.
Then we'd like to know the unscheduled time for each volume. It doesn't have to be exact, but we need a sense of how long "significant" is in your scenario.

We also need some more information:

  • The amount of data in the volumes.
  • How many volumes are mounted in each restored pod? For example, does one pod mount one volume, or multiple volumes?


kaovilai commented Jan 7, 2025

This actually came from an internal bug in OpenShift so I assume OADP. Does that help?

That does not help narrow it down, no. OADP exposes all the ways upstream Velero can handle volume backups, so the same question remains unanswered.

If you want to reference a bug, feel free to leave a link. Or, if you need to file a bug with OpenShift, you can open a customer case at https://access.redhat.com/support/cases/#/case/new


duyanyan commented Jan 7, 2025

We create the backup/restore with the following YAML:

---
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-demo
  namespace: openshift-adp
spec:
  includedNamespaces:
  - demo
  snapshotMoveData: true
  storageLocation: dpa-1
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-demo
  namespace: openshift-adp
spec:
  backupName: backup-demo
  includedNamespaces:
  - demo
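
For completeness, a sketch of how these would be applied and then tracked (file names are placeholders; -n is the OADP/Velero install namespace):

kubectl apply -f backup-demo.yaml                        # the Backup above
velero backup describe backup-demo -n openshift-adp
kubectl apply -f restore-demo.yaml                       # the Restore above
velero restore describe restore-demo -n openshift-adp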

blackpiglet added the "Needs info" (Waiting for information) label on Jan 7, 2025
Lyndon-Li commented:

triggers the pod's Unschedulable condition for quite a bit of time

For data mover, this means the data mover restores for the pod's volumes have not all completed. The pod is purposefully blocked from scheduling; otherwise, the restored data may be corrupted. So this is the expected behavior.
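
(Assuming the CSI snapshot data mover path, the per-volume restore progress should be visible on the data mover CRs while the pod is blocked, e.g. roughly:)

kubectl -n openshift-adp get datadownloads.velero.io   # one DataDownload per restored volume; namespace assumed to be the Velero/OADP install namespace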

(leaving users/components relying on this API quite confused)

Could you describe in more detail how this behavior affects the user experience?


duyanyan commented Jan 8, 2025

I think the VM stays in ErrorUnschedulable for about 8 minutes, which is confusing to users:

fedora-s   0s    WaitingForVolumeBinding   False
fedora-s   0s    WaitingForVolumeBinding   False
fedora-s   1s    ErrorUnschedulable        False
fedora-s   7m33s   Starting                  False
fedora-s   7m51s   Starting                  False
fedora-s   7m53s   Running                   False
fedora-s   7m53s   Running                   True
fedora-s   8m14s   Running                   True  
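
(Output like the above would come from watching the VM status, e.g. something like:)

kubectl get vm fedora-s -n demo -w   # namespace assumed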


akalenyu commented Jan 8, 2025

For data mover, this means the data mover restores for the pod's volumes have not all completed. The pod is purposefully blocked from scheduling; otherwise, the restored data may be corrupted. So this is the expected behavior.

I don't expect that the pod starts without the restore being completed; this issue is about reporting the state better
during that time. 8 minutes of Unschedulable is something that could be avoided using some orchestration from the velero side, no?

Lyndon-Li commented:

Unschedulable is something that could be avoided using some orchestration from the velero side, no?

Unfortunately, the pod's Unschedulable is set by k8s scheduler, not by Velero.


akalenyu commented Jan 8, 2025

Unschedulable is something that could be avoided using some orchestration from the velero side, no?

Unfortunately, the pod's Unschedulable is set by k8s scheduler, not by Velero.

Yes, but velero is orchestrating a somewhat unusual path that triggers it, and thus it can also avoid it
(regular dynamic provisioning with populators won't hit this).
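
(For context on the "populators" comparison: a dynamically provisioned PVC driven by a volume populator is declared roughly like this, with no volumeName pre-set; the populator group/kind/name below are only illustrative:)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: populated-pvc
  namespace: demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  dataSourceRef:                  # a volume populator fills the volume; no spec.volumeName is pre-set
    apiGroup: cdi.kubevirt.io     # illustrative populator (CDI)
    kind: VolumeImportSource
    name: fedora-image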

Lyndon-Li commented:

Let me think about it. At present, Velero only creates the backupPod/restorePod; it doesn't manipulate the status of the backupPod/restorePod. Everything else is handled by the scheduler.

Lyndon-Li self-assigned this on Jan 8, 2025

akalenyu commented Jan 8, 2025

I had a suggestion in the issue description:

Anything else you would like to add:
I am not sure how restore is achieved, so my suggestion may not make much sense, but
if the volumeName were only added to the PVC once the volume is ready, maybe that would avoid the issue.

Lyndon-Li commented:

If volumeName is left empty, standard dynamic provisioning will happen and eventually result in a new, empty volume.


msfrucht commented Jan 8, 2025

@akalenyu

Can I attempt to clarify what is going on?

Velero is restoring the VirtualMachine and related kubevirt objects and the PVC at the same time.

Because both are created while the PVC is still being restored, the VirtualMachine shows the ErrorUnschedulable state?

Once the PVC finishes restoring, the VirtualMachine goes into the Running state?


akalenyu commented Jan 9, 2025

@akalenyu

Can I attempt to clarify what is going on?

Velero is restoring the VirtualMachine and related kubevirt objects and the PVC at the same time.

Because both are created while the PVC is still being restored, the VirtualMachine shows the ErrorUnschedulable state?

Once the PVC finishes restoring, the VirtualMachine goes into the Running state?

Of course. But during that time (8+ minutes or so) the VM pod is "Unschedulable".
You can reproduce this by backing up and restoring a pod/PVC combo
(since a kubevirt virtual machine is precisely that).
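
A minimal pod/PVC combo for such a reproduction might look like this (names, namespace, and image are placeholders; the storage class is whatever the cluster defaults to):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc
  namespace: demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: repro-pod
  namespace: demo
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal   # any long-running image works
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: repro-pvc

Back it up with snapshotMoveData: true, delete the namespace, restore, and watch the pod's PodScheduled condition while the data mover runs.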
