During restore, pods will spend a significant amount of time in "PodScheduled=false, reason: Unschedulable" #8584
Comments
https://github.com/kubevirt/kubevirt-velero-plugin, did you use this plugin?
Yup, of course. This is not specific to kubevirt, but it's how we discovered it 😃
You need to clarify how you installed Velero and how the backup was created, since Velero has several ways to back up/restore volume data.
This actually came from an internal bug in OpenShift, so I assume OADP. Does that help?
Based on your description, I assume you are using the CSI snapshot data mover to back up and restore the volume data. We also need some more information.
That does not help narrow it down, no. OADP exposes all the ways upstream Velero can handle volume backups, so the same question remains unanswered. If you want to reference a bug, feel free to leave a link. Or if you need to file a bug with OpenShift, you can open a customer case at https://access.redhat.com/support/cases/#/case/new
We create the backup/restore via YAML:
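For illustration only (not the reporter's actual manifests), a minimal Backup/Restore pair using Velero's CSI snapshot data movement might look like the following; all names and namespaces are placeholders, and spec.snapshotMoveData assumes the built-in data mover is in use:

```yaml
# Illustrative only: a Backup/Restore pair using Velero's CSI snapshot data movement.
# All names and namespaces are placeholders.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: vm-backup
  namespace: velero
spec:
  includedNamespaces:
    - my-vm-namespace       # placeholder namespace holding the VM and its PVCs
  snapshotMoveData: true    # enable the CSI snapshot data mover
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: vm-restore
  namespace: velero
spec:
  backupName: vm-backup     # restore from the backup above
```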
For the data mover, this means the data mover restores for the pod's volumes have not all completed. The pod is purposefully blocked from scheduling; otherwise, the restored data could be corrupted. So this is the expected behavior. Could you describe in more detail how this behavior affects the user experience?
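For reference, while the data movement is still running, the blocked pod only reports the generic scheduling condition from the issue title. Trimmed from the pod's status, it looks roughly like this (shape only; the exact message and timestamps vary per cluster):

```yaml
# Condition on the restored pod while its volume data is still being moved
# (as seen in `kubectl get pod <name> -o yaml`; message and timestamps omitted)
status:
  conditions:
    - type: PodScheduled
      status: "False"
      reason: Unschedulable
```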
I think the VM stays in ErrorUnschedulable for about 8 minutes, which is confusing for users.
I don't expect the pod to start before the restore is completed; this issue is about reporting the state better.
Unfortunately, the pod's …
Yes, but Velero is orchestrating a somewhat unusual path which triggers it, and so it can also avoid it.
Let me think about it. At present, Velero only creates the backupPod/restorePod; it doesn't manipulate the status of the backupPod/restorePod, and everything else is handled by the scheduler.
I had a suggestion in the issue description.
If …
Can I attempt to clarify what is going on? Velero is restoring the VirtualMachine and related kubevirt objects and the PVC at the same time. Because both objects are created while the PVC is still being restored, the VirtualMachine shows the ErrorUnschedulable state? And once the PVC finishes restoring, the VirtualMachine goes into the Running state?
Of course. But during that time (over 8 minutes or so) the VM pod is "Unschedulable".
What steps did you take and what happened:
(This is probably reproducible with a Pod/PVC combo as well)
What did you expect to happen:
None, or maybe a brief amount of time spent in Unschedulable.
The following information will help us better understand what's going on:
I believe that Velero's restore mechanics differ from those of a standard dynamically provisioned pod/PVC (totally understandable), but specifically the part where the PVC has spec.volumeName set while the volume stays unbound for a long period of time is going to trigger the pod's Unschedulable condition for quite a bit of time (leaving users/components relying on this API quite confused):
https://github.com/kubernetes/kubernetes/blob/4114a9b4e45a4df96f0383d87b2649640a6ffbf1/pkg/scheduler/framework/plugins/volumebinding/binder.go#L791-L794
https://github.com/kubernetes/kubernetes/blob/4114a9b4e45a4df96f0383d87b2649640a6ffbf1/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go#L363-L369
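For illustration, this is roughly the PVC shape being described, assuming the restore pre-sets the PV name before the volume data has been fully moved (all names below are placeholders):

```yaml
# A restored PVC with spec.volumeName already set while the referenced PV is not yet
# bound/available; the scheduler's volumebinding plugin then keeps the consuming pod
# in PodScheduled=False, reason: Unschedulable until binding completes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-vm-disk          # placeholder
  namespace: restored-namespace   # placeholder
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  volumeName: pvc-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee   # pre-set PV name
status:
  phase: Pending
```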
If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
I am not sure how the restore is implemented, so my suggestion may not make much sense, but if the volumeName were only added to the PVC once the volume is ready, maybe that would avoid the issue (see the sketch below).
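Purely as a sketch of that suggestion (not Velero's actual restore flow, and ignoring details such as binding-mode and dynamic-provisioning interactions), the restored PVC would initially carry no pre-set PV name:

```yaml
# Sketch of the suggestion only: the PVC is restored without spec.volumeName, and the
# PV name would be filled in only once the restored volume's data movement has completed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-vm-disk          # placeholder
  namespace: restored-namespace   # placeholder
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  # volumeName intentionally left unset until the volume is ready
```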
Environment:
Velero version (use velero version):
Velero features (use velero client config get features):
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
Vote on this issue!
This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.