Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expired backup that fail to delete should be easy to select. #4728

Closed
kaovilai opened this issue Mar 8, 2022 · 7 comments
Closed

Expired backup that fail to delete should be easy to select. #4728

kaovilai opened this issue Mar 8, 2022 · 7 comments
Assignees
Labels

Comments

@kaovilai
Copy link
Member

kaovilai commented Mar 8, 2022

Describe the problem/challenge you have
Allow user to use label to filter failed garbage collected backups
Either as a new default behavior, or as a new feature flag, there should be an option for Velero to garbage collect TTL expired backups even if it failed validation for reasons such as invalid BSL.

Similar to #4483 but instead of ObjectStorageSync, it's the behavior of garbage collection controller

Describe the solution you'd like
Add label to backups when garbage collector errors out
New default behavior or new feature flag

Anything else you would like to add:
It would help with hanging failed validation Backups from "dirtying" the logs since we don't care if it failed validation if it expired anyway.

Environment:

  • Velero version (use velero version):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@reasonerjt
Copy link
Contributor

Making it a default may cause issues that the backup CR is removed but the actual data remained in the object storage, and the user will not know it from velero.
Similarly, if this is exposed as a configuration, the user may set the configuration without fulling aware of the situation he/she may get into in the future.

Therefore, I hope we can control it based on the error type, i.e. we only force remove the backup when certain errors happen.

@kaovilai
Copy link
Member Author

kaovilai commented Mar 9, 2022

Makes sense.

The request I received was specifically around missing bsl error.

Maybe a list of ignoredErrors

@kaovilai
Copy link
Member Author

This is the concern raised from @birsanv for context.

The keyword is the user must have had enough time to investigate and debug during the time the resource was available.

The thinking for the above seems to be that a failed backup may not be in bucket at all and the failure is stored on the cluster so you can check what the error is.

This makes sense for the argument that FailedValidation backups should not be automatically synced so that the user can still investigate the issue.
I think that an expired backup in FailedValidation state is different. This backup is set for deletion by the user after a certain duration and this object has just expired.The object is not removed as soon as in not syncing, it is removed when the time is due. So the user must have had enough time to investigate and debug during the time the resource was available. If we never delete expired backups in FailedValidation state then the user is going to have to deal with left over resources that have to be deleted manually

I have this invalid backup - the storage location was set incorrectly
status:
expiration: '2022-03-04T16:52:08Z'. <<<<<<
formatVersion: 1.1.0
phase: FailedValidation <<<
validationErrors:

an existing backup storage location wasn't specified at backup creation
time and the server default 'default' doesn't exist. Please address this
issue (see velero backup-location -h for options) and create a new
backup. Error: BackupStorageLocation.velero.io "default" not found
This is what the log says
time="2022-03-04T19:20:08Z" level=info msg="Backup has expired" backup=openshift-adp/acm-validation-policy-schedule-20220303174043 controller=gc expiration="2022-03-04 16:51:52 +0000 UTC" logSource="pkg/controller/gc_controller.go:135"
time="2022-03-04T19:20:08Z" level=warning msg="Backup cannot be garbage-collected because backup storage location default does not exist" backup=openshift-adp/acm-validation-policy-schedule-20220303174043 controller=gc expiration="2022-03-04 16:51:52 +0000 UTC" logSource="pkg/controller/gc_controller.go:143"

@kaovilai
Copy link
Member Author

Our internal discussion have another idea to resolve by labelling instead. Enable user to use labelselectors to delete expired backups.

Then it's user's action, not our code.

Maybe allow them to

kubectl get backup -l velero.io/deleteFailReason=velero.io/deleteFailReason="BSLNotFound" -oname | kubectl delete backup

@birsanv
Copy link

birsanv commented Mar 21, 2022

that would work for me #4728 (comment)

@kaovilai kaovilai changed the title TTL: Expired Backup should be garbage collected even if they fail validation Expired backup that fail to delete should be easy to select. Mar 22, 2022
@stale
Copy link

stale bot commented Jun 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Jun 4, 2022
@kaovilai
Copy link
Member Author

kaovilai commented Jun 6, 2022

fixed with #4757

@kaovilai kaovilai closed this as completed Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants