-
Notifications
You must be signed in to change notification settings - Fork 46
Split backups by (resource owner, service) at CLI layer #1609
Conversation
Allow getting multiple backups by ID in a single call.
Needs extra support from selectors package to properly materialize the cartesian product of all resource owners and the options in the scope that contains the resource owners.
src/cli/backup/exchange.go
Outdated
// owners and whatever options may have been in the scope containing the | ||
// resource owners. | ||
for _, scope := range sel.DiscreteScopes(users) { | ||
for _, user := range scope.Get(selectors.ExchangeUser) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the plan for iterating over the resource owner here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this was mostly me attempting to get something working with what we already have. The issue I ran into with this patch is that there's no way to extract the other fields from the scope to make a selector. We avoid this problem in GC because we just iterate on the users and pass the full scope around. Here though we need to extract the user and then make a new selector that contains all the information in the current scope except with the users slice replaced by the user we have in the current iteration.
I think long-term it would be better to have some selector function that would return either scopes (to be given to Include
to make a new Selector
) or Selector
s that have the cartesian product of (user, other options)
for each scope. This could then replace both the call to DiscreteScopes
and iterating over users in a scope.
I don't feel I know enough about selectors to know how to implement something that would return the cartesian product in a clean manner though so if you could help out there I'd really appreciate it. Also, while I've said "users" here, it's really going to be resource owners because this same pattern will also be used in other services
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think long-term it would be better to have...
Yup. This complication is where I got the idea to slice the resourceOwners out of the scopes and elevate them to the selector itself. If we imagine a selector structure like:
{
ResourceOwners []string // probably filter, but for ease of mind
Includes []scope // eg: [{mailFolder, *}, {contactFolder, contactID}, ...]
Excludes []scope // similar to ^
Filters []scope // ^
}
then the cartesian product can be handled as, "for each resource owner (the backup boundary), this selection of scopes (the data boundary within the backup)". Which is a statement that more closely matches the design intent of corso usage than our current system.
To get the product, splitting selectors could be handled as sel.SplitByOwners(users) => []selectors
(one selector for each user). Or, maybe even more simply, sel.DiscreteOwners(users) => []string
(returns users if *, or the selected set of users) where the resource owner list/individual id gets passed to callers alongside the selector. Kinda like how GC does it down the line with scopes.
return the cartesian product in a clean manner
If we don't want to pull the above lift, I think the next best option is to add a func similar to discrete scopes that returns a slice of scopes (one per resource owner), rather than a single scope. Ex: SplitDiscreteScopes(user) => []scope
. It keeps the world messier when we think about what it means for a selector to contain multiple overlapping/conflicting scopes relative to their generated backups (eg: we'd split operations along scopes, which is more like a per-category tuple, and a single selector can list multiple category scopes with the same user meaning multiple backups for that user in a single selection). But it'll get us from A to B in a short time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This complication is where I got the idea to slice the resourceOwners out of the scopes and elevate them to the selector itself.
A bit more implementation specific, but in the interest of removing footguns and sticking to the new assumption that each backup will represent only a single (resource owner, service) pair I would highly recommend making whatever houses this information a different type than a selector.Selector
so that Backup cannot be called on something that has multiple resource owners
where the resource owner list/individual id gets passed to callers alongside the selector. Kinda like how GC does it down the line with scopes.
I'm not really a fan of separating out this information. GC may split it out, but I think it would require a fair amount of busy work to pull that information out through the whole stack and I don't really think it would get us anything extra. I'd prefer to stick to some thing like
- make some "super selector" that contains parameters and a set of resource owners
- somehow generate cartesian product of (resource owner, parameter) as a
[]selector.Selector
- call backup for each
Selector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a different type
type Boundary struct {
Service enum
ResourceOwner string
Includes []scope
Excludes []scope
Filters []scope
}
=>
sel := selectors.NewExchangeBackup(users)
sel.Includes(Mail([]string{"inbox"}, Any))
for _, b := range sel.ResourceBounds() {
repo.NewBackup(b)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks reasonable as long as sel
in the example you gave is of type Boundary
.
Is this something you can implement? I'm still not confident I can navigate the selectors package to add behavior easily 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as sel in the example you gave is of type Boundary
The sel
in example contains the set of resource owners
. The boundary
exhibits a single resource owner. Names can be swapped as wanted, important part is that a repo.NewBackup
would require the struct that is specific to an owner.
Is this something you can implement?
Sure can. I'll make a ticket for it. However, I don't want to block you on this work. What will be helpful in the meantime is for you to continue with the tools at hand (basically, keep on with this PRs design, discreteScopes and all). It'll be ugly for a minute, and then we'll tidy it up again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continue with the tools at hand
I think I'm kind of stuck at the moment with the current approach. I don't know of a way to easily convert a scope with many resource owners to a selector for a single resource owner. I don't know how to get the other properties in the scope (excluding resource owners). If I could do that then I could do
- make selector
- get discrete scopes
- for each resource owner in each scope:
- get other scope properties
- make new selector for resource owner
- run backup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ticket in question: #1617
I'm kind of stuck at the moment
I'd suggest the following:
- Introduce a
scope.Clone()
method. Basic "for k, v in map1; map2[k] = v". - Export the
set(cat, val, ops)
scope method. - Use the clone+set to build new single-scope selectors.
For the time being, that'll cause a backup like --user u1 --data email,contacts
to produce two backups, even though it could have only done one. Unhappiness like that aside, would that unblock you?
Split backups for remaining services based on resource owner.
## Description Helper function to fetch backup models by ID. Also renames `Backups` to `BackupsByTag` ## Type of change - [x] 🌻 Feature - [ ] 🐛 Bugfix - [ ] 🗺️ Documentation - [ ] 🤖 Test - [ ] 💻 CI/Deployment - [ ] 🐹 Trivial/Minor ## Issue(s) * #1505 separated from * #1609 ## Test Plan <!-- How will this be tested prior to merging.--> - [ ] 💪 Manual - [x] ⚡ Unit test - [ ] 💚 E2E
Output from local run with changes:
|
@ryanfkeepers can you take another look at this PR and see if it's ready to merge?
I'm not sure if the above should be addressed in this PR or different ones that can be further down the road. Any preference? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One bug and a couple typos. But you're approved, since they're simple fixes.
I'm not sure if the above should be addressed in this PR or different ones
Definitely different PRs. Resource retrieval, graph connection verification, and input validation all have investigation to complete that extends beyond the scope of backup splits/incrementals.
Aviator status
This PR was merged using Aviator. |
Kudos, SonarCloud Quality Gate passed!
|
Description
Sketch of how to split backups by (resource owner, service) for exchange. Needs more support from selectors package to be complete.
Type of change
Issue(s)
Test Plan