Pre-backup hook fails due to missing compute container in unstable virt-launcher pods #319

Open
sshende-catalogicsoftware opened this issue Jan 13, 2025 · 0 comments

What happened:
During Velero backup operations of KubeVirt VMs, backups are failing with the following error:

Error executing hook, Type: pre, resource: pods, name: virt-launcher-140-sample-pool1-88529b17-4qqnx-m5qn7, namespace: default, message: unable to upgrade connection: container not found ("compute")

The current implementation of the kubevirt-velero-plugin does not verify that the virt-launcher pod is stable before initiating the backup process. As a result, backups fail when the VM's virt-launcher pod is unstable or in the middle of a transition, such as a restart.

What you expected to happen

  • The plugin should verify the state of the virt-launcher pod before initiating a VM/VMI backup
  • If the pod is not in a stable state (Running phase with all containers ready), the backup should be skipped with appropriate logging
  • This would prevent backup failures and provide clearer feedback about why certain VMs were not backed up

How to reproduce it

  1. Deploy a KubeVirt VM on your cluster
  2. Trigger an event that causes the virt-launcher pod to restart or enter an unstable state:
    • Cause an OOMKill
    • Trigger a rolling update
  3. Attempt to create a Velero backup during this transition period
  4. Observe the backup failure with the error: container not found ("compute")

Additional context

Root cause analysis reveals that this error occurs because:

  1. The backup process attempts to execute pre-backup hooks on the virt-launcher pod
  2. During pod transitions or unstable states, the required 'compute' container may not be available
  3. The current plugin implementation doesn't validate pod stability before backup

The proposed solution involves:

  1. Adding pod state validation in both VM and VMI backup item action plugins
  2. Checking for:
    • Pod existence
    • Pod Running state
    • All containers being ready
  3. Skipping backup with appropriate logging when validation fails
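
A minimal sketch of what such a validation could look like is shown below. It is not the plugin's actual code: the `Pod`, `ContainerStatus`, and `PodPhase` types are simplified local stand-ins for the corresponding `k8s.io/api/core/v1` types, and the helper name `unstableReason` is hypothetical. The explicit check for a `compute` container status is an assumption based on the error message above; the proposal itself only lists pod existence, Running state, and container readiness.

```go
package main

import "fmt"

// PodPhase, ContainerStatus and Pod are simplified, hypothetical stand-ins
// for the k8s.io/api/core/v1 types, kept local so the sketch compiles
// without external dependencies.
type PodPhase string

const (
	PodPending PodPhase = "Pending"
	PodRunning PodPhase = "Running"
)

type ContainerStatus struct {
	Name  string
	Ready bool
}

type Pod struct {
	Name              string
	Phase             PodPhase
	ContainerStatuses []ContainerStatus
}

// unstableReason returns an empty string when the virt-launcher pod looks
// safe to run pre-backup hooks against, and a human-readable reason
// otherwise. A nil pod models "pod does not exist".
func unstableReason(pod *Pod) string {
	if pod == nil {
		return "virt-launcher pod not found"
	}
	if pod.Phase != PodRunning {
		return fmt.Sprintf("pod %s is in phase %q, not Running", pod.Name, pod.Phase)
	}
	computeSeen := false
	for _, cs := range pod.ContainerStatuses {
		if cs.Name == "compute" {
			computeSeen = true
		}
		if !cs.Ready {
			return fmt.Sprintf("container %q in pod %s is not ready", cs.Name, pod.Name)
		}
	}
	// Assumption: the hook targets the "compute" container, so its status
	// must be present even if every listed container is ready.
	if !computeSeen {
		return fmt.Sprintf("pod %s has no %q container status", pod.Name, "compute")
	}
	return ""
}

func main() {
	pods := []*Pod{
		nil,
		{Name: "virt-launcher-a", Phase: PodPending},
		{Name: "virt-launcher-b", Phase: PodRunning,
			ContainerStatuses: []ContainerStatus{{Name: "compute", Ready: false}}},
		{Name: "virt-launcher-c", Phase: PodRunning,
			ContainerStatuses: []ContainerStatus{{Name: "compute", Ready: true}}},
	}
	for _, p := range pods {
		if reason := unstableReason(p); reason != "" {
			// In the plugin this would be a log line followed by skipping the item.
			fmt.Println("skipping backup:", reason)
			continue
		}
		fmt.Println("pod is stable, proceeding with backup:", p.Name)
	}
}
```

Returning a reason string rather than a bare boolean keeps the "appropriate logging" requirement simple: the caller can log the reason verbatim when it skips the VM or VMI.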

Environment

  • KubeVirt version: v1.4.0
  • Kubernetes version: 1.27
  • Velero version: 1.14.1
  • kubevirt-velero-plugin version: v0.7.0

Impact

This enhancement would:

  1. Improve backup reliability
  2. Provide clearer feedback about skipped backups
  3. Prevent failed backup attempts for VMs in transition states