Pre-backup hook fails due to missing compute container in unstable virt-launcher pods #319

sshende-catalogicsoftware · 2025-01-13T11:59:04Z

What happened:
Velero backups of KubeVirt VMs fail with the following error:

Error executing hook, Type: pre, resource: pods, name: virt-launcher-140-sample-pool1-88529b17-4qqnx-m5qn7, namespace: default, message: unable to upgrade connection: container not found ("compute")

The current implementation of the kubevirt-velero-plugin does not verify that the virt-launcher pod is stable before initiating the backup. As a result, backups fail when the VM's virt-launcher pod is in an unstable state or in the middle of a pod transition.

What you expected to happen

- The plugin should verify the state of the virt-launcher pod before initiating a VM/VMI backup.
- If the pod is not in a stable state (Running phase with all containers ready), the backup should be skipped with appropriate logging.
- This would prevent backup failures and provide clearer feedback about why certain VMs were not backed up.

How to reproduce it

1. Deploy a KubeVirt VM on your cluster.
2. Trigger an event that causes the virt-launcher pod to restart or enter an unstable state:
   - Cause an OOMKill
   - Trigger a rolling update
3. Attempt to create a Velero backup during this transition period.
4. Observe the backup failure with the "container not found ("compute")" error.

Additional context

Root cause analysis reveals that this error occurs because:

1. The backup process attempts to execute pre-backup hooks on the virt-launcher pod.
2. During pod transitions or unstable states, the required 'compute' container may not be available.
3. The current plugin implementation doesn't validate pod stability before backup.
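
To make the second point concrete, here is a minimal sketch of how one could inspect the state of the 'compute' container from a pod's status. It assumes `pod` is the parsed JSON of the pod object (for example, the output of `kubectl get pod <name> -o json`); the helper name `container_state` is hypothetical and is not part of the plugin.

```python
def container_state(pod: dict, name: str) -> str:
    """Return 'running', 'waiting', 'terminated', or 'missing' for a container.

    `pod` is assumed to be a pod object parsed from JSON; the function reads
    only status.containerStatuses[].name and status.containerStatuses[].state.
    """
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for status in statuses:
        if status.get("name") != name:
            continue
        state = status.get("state") or {}
        # A container state object carries exactly one of these keys.
        for key in ("running", "waiting", "terminated"):
            if key in state:
                return key
        return "missing"
    # No status entry at all, e.g. the pod has not started its containers yet.
    return "missing"
```

A pre-backup hook that execs into 'compute' can only be expected to succeed when this returns 'running'; any other result corresponds to the transition window described above.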

The proposed solution involves:

1. Adding pod state validation in both the VM and VMI backup item action plugins
2. Checking for:
   - Pod existence
   - Pod Running state
   - All containers being ready
3. Skipping backup with appropriate logging when validation fails
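
The three checks and the skip-with-logging behaviour could look roughly like the sketch below. It is written in Python for brevity and operates on a pod object parsed from JSON; it is an illustration of the proposed logic only, not the plugin's code. The function names `launcher_pod_problem` and `should_back_up` and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("launcher-pod-check")  # hypothetical logger name


def launcher_pod_problem(pod):
    """Return None if the pod looks stable, otherwise a short reason string."""
    # Check 1: pod existence.
    if pod is None:
        return "virt-launcher pod not found"
    status = pod.get("status") or {}
    # Check 2: pod is in the Running phase.
    phase = status.get("phase")
    if phase != "Running":
        return f"pod phase is {phase!r}, expected 'Running'"
    # Check 3: every container reports ready.
    statuses = status.get("containerStatuses") or []
    if not statuses:
        return "pod reports no container statuses"
    not_ready = sorted(str(s.get("name")) for s in statuses if not s.get("ready"))
    if not_ready:
        return "containers not ready: " + ", ".join(not_ready)
    return None


def should_back_up(vm_name, pod):
    """Log the reason and return False when the launcher pod is not stable."""
    problem = launcher_pod_problem(pod)
    if problem:
        log.warning("skipping backup of %s: %s", vm_name, problem)
        return False
    return True
```

Returning the reason as a string keeps the log message specific about why a VM was skipped. Note that a stopped VM normally has no virt-launcher pod at all, so whether a missing pod should cause a skip is a design decision this sketch does not settle.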

Environment

KubeVirt version: v1.4.0
Kubernetes version: 1.27
Velero version: 1.14.1
kubevirt-velero-plugin version: v0.7.0

Impact

This enhancement would:

- Improve backup reliability
- Provide clearer feedback about skipped backups
- Prevent failed backup attempts for VMs in transition states

sshende-catalogicsoftware added the kind/bug label Jan 13, 2025
