Make snapshot timeout configurable #5048

Closed
drewboswell opened this issue Jun 24, 2022 · 4 comments · Fixed by #5104
Labels
1.10-candidate The label used for 1.10 planning discussion. Area/CSI Related to Container Storage Interface support

Comments

@drewboswell

Describe the problem/challenge you have
Scheduled Velero backups of large volumes fail if the snapshot takes more than 10 minutes. This breaks all automation and requires manual intervention.

time="2022-06-24T16:06:37Z" level=info msg="Waiting for CSI driver to reconcile volumesnapshot xxxx-xxxxx/xxxx-xxxxx-wcpls. Retrying in 5s" backup=velero/xxxxx-xxxxx-xxxxxx-20220624160420 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:169" pluginName=velero-plugin-for-csi
time="2022-06-24T16:16:43Z" level=error msg="fail to wait VolumeSnapshot change to Ready: timed out waiting for the condition" backup=velero/xxxxx-xxxxxx-xxxxx-20220624160420 logSource="pkg/controller/backup_controller.go:643"

https://github.com/vmware-tanzu/velero-plugin-for-csi/blob/701adfefb17a57435f9c84030d0674c9e50b6105/internal/util/util.go#L157-L158

Describe the solution you'd like
Make the timeout configurable.

Environment:

  • Velero version
Client:
	Version: v1.9.0
	Git commit: 6021f148c4d7721285e815a3e1af761262bff029
Server:
	Version: v1.9.0
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8", GitCommit:"a12b886b1da059e0190c54d09c5eab5219dd7acf", GitTreeState:"clean", BuildDate:"2022-06-17T22:27:29Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: Helm, v2.30.0
  • Cloud provider or hardware configuration: AWS EKS
  • OS : official docker image

@qiuming-best qiuming-best added the Area/CSI Related to Container Storage Interface support label Jun 27, 2022
@blackpiglet
Contributor

blackpiglet commented Jun 27, 2022

@drewboswell
How much data is stored in the volume? In your environment, how long does a snapshot take to complete?

I haven't run into this issue yet, and I'm not familiar with how AWS storage snapshots work. I had assumed that snapshot time has little to do with the amount of data, because snapshotting appears to be instant on most cloud providers. I'm not sure whether data transfer time is also included in it.

@blackpiglet
Contributor

I found this about EBS snapshotting.
https://aws.amazon.com/premiumsupport/knowledge-center/ebs-snapshot-ec2-ami-creation-slow/?nc1=h_ls
It looks like the time to transfer the data to S3 is also included in the snapshot time, so I agree with making this parameter configurable.

@drewboswell
Author

drewboswell commented Jul 2, 2022

> @drewboswell How much data is stored in the volume? In your environment, how long does a snapshot take to complete?
>
> I haven't run into this issue yet, and I'm not familiar with how AWS storage snapshots work. I had assumed that snapshot time has little to do with the amount of data, because snapshotting appears to be instant on most cloud providers. I'm not sure whether data transfer time is also included in it.

I'm snapshotting volumes of 500 GB to 3 TB. Given the rate of data change, one snapshot a day currently takes 10-30 minutes. In these or more extreme cases, it makes sense to externalize this timeout so it can be tuned.

@drewboswell
Author

The worst thing about this is that the backup times out and ends up partially failed, but the AWS volume snapshot actually completes. This means the snapshot was made, but the reference to it is unusable from Velero.
