-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
external-snapshotter constantly retrying CreateSnapshot calls on erro… #651
Conversation
/skip I don't know this code. |
…r w/o backoff Signed-off-by: zhucan <[email protected]>
/retest |
/retest-required |
@zhucan Can you change the release note to the following?
|
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: xing-yang, zhucan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@xing-yang can we take a release of the external snapshotter with this fix included? |
I think this change has the unintended effect of not updating if the CSI driver returns success. I don't think that an update which removes the AnnVolumeSnapshotBeingCreated annotation in indicative of a failure. It can be removed even when the CSI driver CreateSnapshot does not return a failure: https://github.com/kubernetes-csi/external-snapshotter/blob/master/pkg/sidecar-controller/snapshot_controller.go#L352 This PR is currently breaking some tests for the PD CSI driver (which @amacaskill is planning to bring to testgrid). The effect is that the CSI controller does not get resynced after the AnnVolumeSnapshotBeingCreated annotation is removed, which makes the snapshotter update at the next resync period (default 15 minutes). This time period is longer than how long the test waits (5 minutes). It could also affect a production controller, if using the default resync period. |
@pwschuurman Do you see a 15 minutes delay for a VolumeSnapshot to be created successfully? Does this happen all the time? I wonder why CI didn't catch this. |
@xing-yang For PD CSI driver, volume snapshot typically takes <1 minute for a snapshot to be ready. However with this change, a call to syncContent won't happen until the next controller resync interval, which is 15 minutes by default. I chatted with @mattcary, and we think a short term fix would be to reintroduce the call to |
@pwschuurman That makes sense. Do you want to submit a fix? |
@xing-yang Yes, I can send out a PR to fix. |
…r w/o backoff
Signed-off-by: zhucan [email protected]
What type of PR is this?
/kind bug
What this PR does / why we need it:
external-snapshotter constantly retrying CreateSnapshot calls on error w/o backoff
Which issue(s) this PR fixes:
Fixes #533
Special notes for your reviewer:
@xing-yang
Does this PR introduce a user-facing change?: