-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make in-progress backup/restore as failed when doing the reconcile #4833
Conversation
Can someone explain to me why we can't recover here? |
If I have to guess (without looking at the code), the plugin system only report success/failures of backupitemaction and velero does not have an in-progress queue saved, only a counter. A pod restart results in lost of in-progress queue and would requires a start from scratch which is equivalent to recreating another backup. |
@shawn-hurley I believe this came up on a prior discussion and as I understood it, there's a long-term fix needed -- Velero needs to be able to continue a backup or restore in-progress. That will be a significant refactoring. In the meantime, this PR provides a short-term fix/workaround -- this status update provides a more accurate status reporting on the status quo -- when the pod is bounced while a backup or restore is in progress, Velero will never finish it. It is, effectively, failed -- this PR makes that explicit, so that when a user looks at the current set of backups or restores, they won't see more than one in-progress item when one of them is not in progress (and never will be). |
@kaovilai It's not so much the plugin system as the controller itself hasn't been designed to resume a previously-started backup or restore. The whole backup/restore processing will need to be refactored to support this. Plugin actions only occur at a specific point prior to backing up or restoring a single item, so I don't think that's nearly as relevant as being able to pick up a backup/restore where we were -- a way of identifying backed-up or restored items on restart would be part of this. Actually, since we don't persist the backup to object storage until the end, for backups, adding resuming behavior may well be a matter of having to just restart the backup. For restores, we could probably check the backup/restore annotations on any existing resources prior to attempting to restore -- but since we already have logic around that (i.e. not restoring what's there), even for restores, "resuming" may just mean "starting over". That's probably something that needs its own design document to be drafted and reviewed apart from (and after) this PR is merged. |
@shawn-hurley @kaovilai Just like @sseago explained, this is a short-term fix. |
@reasonerjt The upload progress PR introduces a new phase |
This is better than keeping the backup/restore in "inprogress" state after an accidental restart of the velero process. I'll approve. |
2cf8f54
to
5b9b57a
Compare
@sseago @reasonerjt I have rebased the PR to fix the conflict, please take a look again |
5b9b57a
to
3ee0acd
Compare
Make in-progress backup/restore as failed when doing the reconcile to avoid hanging in in-progress status Signed-off-by: Wenkai Yin(尹文开) <[email protected]>
3ee0acd
to
dfc8656
Compare
@sseago I made a tiny change from your last review: set the |
Make in-progress backup/restore as failed when doing the reconcile to avoid hanging in in-progress status
Signed-off-by: Wenkai Yin(尹文开) [email protected]
Thank you for contributing to Velero!
Please add a summary of your change
Does your change fix a particular issue?
Fixes #(issue)
Please indicate you've done the following:
/kind changelog-not-required
as a comment on this pull request.site/content/docs/main
.