Restic restore stuck InProcess while restoring PV with "volumeBindingMode: WaitForFirstConsumer" storage class #2971

Open
invidian opened this issue Sep 24, 2020 · 3 comments
Labels
Bug Needs investigation Restic - GA needed for restic integration to be considered GA Restic Relates to the restic integration Reviewed Q2 2021

Comments

@invidian
Contributor

What steps did you take and what happened:

  1. Create a storage class with volumeBindingMode: WaitForFirstConsumer.
  2. Create a sample workload with a PV that uses this storage class.
  3. Write some data to it.
  4. Take a backup.
  5. Remove the namespace containing the sample workload.
  6. Restore the backup.

What's happening then:

  • The Restore object is stuck in the InProgress phase indefinitely (or for a very long time).
  • The workload starts, but no init container is injected, so no data is restored.
  • The Velero logs show that the restic restore action has run.
  • A PodVolumeRestore object is created, but its status is never updated.
  • The restic logs show nothing at the info log level.

What did you expect to happen:
Data in the PV should be restored from the backup.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

As a workaround, one can create a copy of the storage class in use that sets volumeBindingMode: Immediate, and then use the https://velero.io/docs/v1.5/restore-reference/#changing-pvpvc-storage-classes feature to restore the PVs/PVCs onto that copy instead.

Volumes are being provisioned using https://github.com/hetznercloud/csi-driver.

Restoring volumes with volumeBindingMode: Immediate works well.
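The workaround above can be sketched as a small script that emits the two objects it needs: a copy of the storage class with volumeBindingMode: Immediate, and the plugin ConfigMap that the linked Velero restore-reference page describes for changing storage classes. This is a minimal sketch under assumptions: the original class name `hcloud-volumes`, the provisioner `csi.hetzner.cloud`, the copy's name `hcloud-volumes-immediate`, and the `velero` namespace are all assumed for illustration, and the helper functions are hypothetical. Output is JSON only because that needs nothing beyond the standard library; `kubectl apply -f` accepts JSON manifests.

```python
import copy
import json


def immediate_copy(storage_class: dict, new_name: str) -> dict:
    """Hypothetical helper: copy a StorageClass manifest, switching it to Immediate binding."""
    sc = copy.deepcopy(storage_class)  # keep provisioner, parameters, reclaimPolicy, etc.
    # Replace metadata entirely: drops uid/resourceVersion and any annotations,
    # including a default-class annotation the original might carry.
    sc["metadata"] = {"name": new_name}
    sc["volumeBindingMode"] = "Immediate"
    return sc


def change_storage_class_config(old: str, new: str, namespace: str = "velero") -> dict:
    """Hypothetical helper: ConfigMap telling Velero to restore `old`-class PVs/PVCs as `new`."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "change-storage-class-config",
            "namespace": namespace,  # assumed: the namespace Velero runs in
            "labels": {
                "velero.io/plugin-config": "",
                "velero.io/change-storage-class": "RestoreItemAction",
            },
        },
        "data": {old: new},  # key = old storage class, value = new storage class
    }


if __name__ == "__main__":
    # Assumed original class, modelled loosely on the hcloud CSI driver's default.
    original = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "hcloud-volumes"},
        "provisioner": "csi.hetzner.cloud",
        "volumeBindingMode": "WaitForFirstConsumer",
    }
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            immediate_copy(original, "hcloud-volumes-immediate"),
            change_storage_class_config("hcloud-volumes", "hcloud-volumes-immediate"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it before `velero restore create` would, under these assumptions, make the restored PVCs use the Immediate copy. Replacing the metadata wholesale is deliberate: it keeps the copy from also being marked as the cluster's default storage class.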

Environment:

  • Velero version (use velero version): Tried 1.4.2 and 1.5.1 with the same result
  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"archive", BuildDate:"2020-09-18T18:46:38Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", BuildDate:"2020-09-16T13:32:58Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: Flexkube v0.4.3
  • Cloud provider or hardware configuration: hcloud
  • OS (e.g. from /etc/os-release): Flatcar stable

Vote on this issue!

This is an invitation to the Velero community to vote on issues; the project's top-voted issues are listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@nrb nrb changed the title Restore stuck InProcess while restoring PV with "volumeBindingMode: WaitForFirstConsumer" storage class Restic restore stuck InProcess while restoring PV with "volumeBindingMode: WaitForFirstConsumer" storage class Sep 24, 2020
@nrb
Contributor

nrb commented Sep 24, 2020

Thanks for this report. I think I see the issue, and it's an order-of-operations problem: Velero recreates the PV, PVC, and Pod (in that order), but with the WaitForFirstConsumer binding mode this isn't sufficient, because the volume isn't provisioned and bound until a Pod that uses the PVC is scheduled.

I'm going to log this as a high-priority bug because it's not a unique use case, but I don't have an answer for it at the moment.
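To make the ordering point concrete, here is a toy model (not Velero code) of how a dynamically provisioned PVC's phase depends on the binding mode when objects are created claim-first, then Pod. It encodes only the documented Kubernetes rule that Immediate binds as soon as the claim exists while WaitForFirstConsumer waits for a consuming Pod to be scheduled. Everything else is an assumption: the function names are hypothetical, provisioning is treated as instantaneous, the Pod is assumed to be scheduled as soon as it is created, and the PV step nrb mentions is left out.

```python
def pvc_phase(binding_mode: str, consumer_scheduled: bool) -> str:
    """Simplified phase of a dynamically provisioned PVC (toy model)."""
    if binding_mode == "Immediate":
        # Provisioning and binding start as soon as the PVC exists.
        return "Bound"
    if binding_mode == "WaitForFirstConsumer":
        # Provisioning is delayed until a Pod that uses the PVC is scheduled.
        return "Bound" if consumer_scheduled else "Pending"
    raise ValueError(f"unknown volumeBindingMode: {binding_mode}")


def replay_restore(binding_mode: str) -> list:
    """Create objects in restore order; record the PVC phase after each step."""
    history = []
    pod_scheduled = False
    for kind in ("PersistentVolumeClaim", "Pod"):
        if kind == "Pod":
            pod_scheduled = True  # assumption: the scheduler places the Pod immediately
        history.append((kind, pvc_phase(binding_mode, pod_scheduled)))
    return history


if __name__ == "__main__":
    for mode in ("Immediate", "WaitForFirstConsumer"):
        print(mode, replay_restore(mode))
```

Running it prints `Immediate [('PersistentVolumeClaim', 'Bound'), ('Pod', 'Bound')]` followed by `WaitForFirstConsumer [('PersistentVolumeClaim', 'Pending'), ('Pod', 'Bound')]`: under WaitForFirstConsumer the claim is still unbound at the point where the Immediate case already has a volume, which is the ordering difference described in the comment above.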

@nrb nrb added Bug Restic Relates to the restic integration Restic - GA needed for restic integration to be considered GA labels Sep 24, 2020
@nrb nrb added this to the v1.6.0 milestone Sep 24, 2020
@dsu-igeek
Contributor

We've been looking at this as well for other use cases. A long-term solution would be to use the proposed Data Populators (kubernetes/enhancements#1495), but this will require changes in how Restic is handled.

@Elias-elastisys

Are there any updates on this? This bug is still present in version 1.13.1.
