-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a BSL controller to handle validation + update BSL status phase #2490
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole this looks good! The main question I have right now is around resync periods.
Signed-off-by: Carlisia <[email protected]>
Signed-off-by: Carlisia <[email protected]>
Signed-off-by: Carlisia <[email protected]>
Signed-off-by: Carlisia <[email protected]>
I added:
Much in line with what's in the I also added this, for the state between creation and validation:
However, although I have verified this new phase was included in the CRD, it does not mark it as the default. Not sure if I'm doing this wrong: After this PR is merged, I'm going to add these columns ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to look at the body of the controller in a little more detail, but so far so good!
We also need to think about how other controllers/packages should incorporate this information. For example, AFAIK, even if a BSL is marked "Unavailable", the backup controller will still try to process backups targeting that location. Those backups will theoretically fail somewhere along the line, at least at the upload step, but ideally we'd check up-front if the BSL is valid or not, and fail validation if it's invalid.
This brings up another nuance, which is - if we add a validation check in the backup controller to ensure the BSL is "Available" - should we rely on the CR field, or should we actually try to connect to the object store (e.g. via the backupStore.IsValid()
method)? Relying on the CR field is easier/more efficient, but it could theoretically be out-of-date.
newBackupStore: persistence.NewObjectBackupStore, | ||
} | ||
|
||
c.resyncFunc = c.run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in addition to the periodic validation via this resync func, we should also add OnAdd/OnUpdate event handlers to the backup storage location informer so that whenever a BSL is created or updated, it immediately triggers a validation. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea.
Would we need both the handlers and the BSL validation though?
I'm thinking add the validation to the handlers (create/update), and keep the BSL looping and checking the CR field just so it does the proper periodic logging. Would this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This brings up another nuance, which is - if we add a validation check in the backup controller to ensure the BSL is "Available" - should we rely on the CR field, or should we actually try to connect to the object store (e.g. via the backupStore.IsValid() method)? Relying on the CR field is easier/more efficient, but it could theoretically be out-of-date.
If we have the handler validating right on update
then it will never be out-of-date. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to do the validation both when a BSL is added/updated, and periodically (via the resync func). That way, we immediately get an updated phase when the BSL is created or changed, but we are also regularly checking to ensure we can still connect to the object store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could still be out-of-date: the minio instance goes down, the S3 credentials are expired, the bucket is deleted, etc. None of those would involve a change to the BSL, but could still mean that it can't be connected to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point!
Ok recapping, I will:
This question pertains to item # 3. My thinking below, taking into account @nrb's comment:
I think doing the validation on create/update + periodically could be good enough and checking for the BSL status only (as opposed to the store is valid) at backup creation is the way to go. Since there are potential costs associated with it, we should aim for less connections to the storage rather than more. Another factor is that each BSL can have its validation period decreased for more frequent checks: https://github.com/vmware-tanzu/velero/pull/2490/files#diff-0528e5cf4cf25ec346aa95f0bd1b2fd4R79 so the user has this level of control. WDYT? We could go with this and add it to the backup creation later if we find out we need it. Updated from the original |
That all sounds reasonable to me! Just a couple clarifications/related comments:
|
Oh ok, I was putting the handlers somewhere else, haha. Great heads up all around. I will comb the controllers. Thanks! |
pkg/cmd/cli/backuplocation/create.go
Outdated
Bucket string | ||
Prefix string | ||
BackupSyncPeriod time.Duration | ||
StoreValidationPeriod time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would validationFrequency
be a better name for this field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, yes indeed!
pkg/cmd/cli/backuplocation/create.go
Outdated
@@ -107,6 +109,10 @@ func (o *CreateOptions) Validate(c *cobra.Command, args []string, f client.Facto | |||
return errors.New("--backup-sync-period must be non-negative") | |||
} | |||
|
|||
if o.StoreValidationPeriod < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we set this to default and continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. What if the user for some reason expects that value to take, and we give them another value. Could this be possible?
newPluginManager func(logrus.FieldLogger) clientmgmt.Manager, | ||
logger logrus.FieldLogger, | ||
) Interface { | ||
if defaultStoreValidationPeriod <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if defaultStoreValidationPeriod == 0
doesn't that mean validation is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes so, this is confusing because of two flags with the same name. Mainly I set this up following the backup-sync-controller. We can remove this or change the condition here.
This right here is a flag on the server. It is set so that IF
location.Spec.StoreValidationPeriod
further below is not set, we'll use this value. And so, if there wasn't a value passed thru the server either, it is set to 1min
.
We could change this to match the BSL flag, in other words, if <=0
then don't ever validate. But I don't think we want to never validate. And I know that's not what you are saying either. Since we do have to validate at some interval, we might as well make it configurable thru the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addendum: *we don't want to never validate a BSL >unless< the user specifically configures it that way.
for _, location := range locations { | ||
locationName := location.Name | ||
log = c.logger.WithField("backupLocation", locationName) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is location.Spec.StoreValidationPeriod == nil
also a condition where we should skip validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Only a specific value of zero is meant to skip validation. Any other value will set the frequency to that value (obvs), and nil
means they didn't set it, which then the frequency will be set to the server value, either one set by the user, or the default. Pls be sure to see my comment up above.
Signed-off-by: Carlisia <[email protected]>
At any rate.. relying on I'm going to add a method to return all BSLs that are in each phase, or maybe just the count for each, so this logging logic will be much cleaner and the rules clearer. It might be something we'll want to add to the server status anyway, the listing of each existing BSL, grouped by status/phase. |
Signed-off-by: Carlisia <[email protected]>
@carlisia the kubebuilder convo reminded me that i do have a couple recommendations on how to structure this that (a) I think would make a potential migration to kubebuilder easier, and (b) would make testing easier, regardless of whether we move to kubebuilder or not. I'm not going to have time to write detailed notes until next week though, but here's a quick summary: The main idea would be to keep the core "business logic" out of the controller package/struct, and extract it to its own package with a clearly defined interface. the controller should just handle responding to informer events, putting things on a work queue, taking things off the work queue, and then calling into the "business logic" package to actually handle the processing. This would be similar to (a) the server status request controller; and (b) the work I was doing in #2317. Having all the business logic in the controller package and structs IMHO is not the pattern we want to be following going fwd. |
Yes, I started thinking this very thing when I was thinking about the webhooks and, regardless, you're right that the structure needs to have the logic not in the controller. Glad for the nudge. |
Moved it to Draft because likely will be layering this work on top of #2561. |
Closing this PR. I'll be redoing this in a new PR using the kubebuilder/controller-runtime framekwork as per #2597. |
available
(valid) orunavailable
(invalid), and logs accordingly.Signed-off-by: Carlisia [email protected]