Skip to content
This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Split backups by (resource owner, service) at CLI layer #1609

Merged
merged 9 commits into from
Nov 30, 2022
60 changes: 51 additions & 9 deletions src/cli/backup/exchange.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package backup
import (
"context"

"github.com/hashicorp/go-multierror"
"github.com/pkg/errors"
"github.com/spf13/cobra"
"github.com/spf13/pflag"
Expand All @@ -19,6 +20,7 @@ import (
"github.com/alcionai/corso/src/pkg/path"
"github.com/alcionai/corso/src/pkg/repository"
"github.com/alcionai/corso/src/pkg/selectors"
"github.com/alcionai/corso/src/pkg/services/m365"
"github.com/alcionai/corso/src/pkg/store"
)

Expand Down Expand Up @@ -270,27 +272,67 @@ func createExchangeCmd(cmd *cobra.Command, args []string) error {

sel := exchangeBackupCreateSelectors(user, exchangeData)

bo, err := r.NewBackup(ctx, sel)
// TODO(ashmrtn): Only call in the future if "*" or some other wildcard was
// given for resource owners.
users, err := m365.UserIDs(ctx, acct)
ryanfkeepers marked this conversation as resolved.
Show resolved Hide resolved
if err != nil {
return Only(ctx, errors.Wrap(err, "Failed to initialize Exchange backup"))
return Only(ctx, errors.Wrap(err, "Failed to retrieve M365 users"))
}

err = bo.Run(ctx)
if err != nil {
return Only(ctx, errors.Wrap(err, "Failed to run Exchange backup"))
var (
errs *multierror.Error
bIDs []model.StableID
)

// TODO(ashmrtn): Update to range over the cartesian product of resource
// owners and whatever options may have been in the scope containing the
// resource owners.
for _, scope := range sel.DiscreteScopes(users) {
ryanfkeepers marked this conversation as resolved.
Show resolved Hide resolved
for _, user := range scope.Get(selectors.ExchangeUser) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for iterating over the resource owner here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was mostly me attempting to get something working with what we already have. The issue I ran into with this patch is that there's no way to extract the other fields from the scope to make a selector. We avoid this problem in GC because we just iterate on the users and pass the full scope around. Here though we need to extract the user and then make a new selector that contains all the information in the current scope except with the users slice replaced by the user we have in the current iteration.

I think long-term it would be better to have some selector function that would return either scopes (to be given to Include to make a new Selector) or Selectors that have the cartesian product of (user, other options) for each scope. This could then replace both the call to DiscreteScopes and iterating over users in a scope.

I don't feel I know enough about selectors to know how to implement something that would return the cartesian product in a clean manner though so if you could help out there I'd really appreciate it. Also, while I've said "users" here, it's really going to be resource owners because this same pattern will also be used in other services

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think long-term it would be better to have...

Yup. This complication is where I got the idea to slice the resourceOwners out of the scopes and elevate them to the selector itself. If we imagine a selector structure like:

{
  ResourceOwners   []string     // probably filter, but for ease of mind
  Includes                 []scope     // eg: [{mailFolder, *}, {contactFolder, contactID}, ...]
  Excludes                []scope     // similar to ^
  Filters                     []scope     // ^
}

then the cartesian product can be handled as, "for each resource owner (the backup boundary), this selection of scopes (the data boundary within the backup)". Which is a statement that more closely matches the design intent of corso usage than our current system.

To get the product, splitting selectors could be handled as sel.SplitByOwners(users) => []selectors (one selector for each user). Or, maybe even more simply, sel.DiscreteOwners(users) => []string (returns users if *, or the selected set of users) where the resource owner list/individual id gets passed to callers alongside the selector. Kinda like how GC does it down the line with scopes.

return the cartesian product in a clean manner

If we don't want to pull the above lift, I think the next best option is to add a func similar to discrete scopes that returns a slice of scopes (one per resource owner), rather than a single scope. Ex: SplitDiscreteScopes(user) => []scope. It keeps the world messier when we think about what it means for a selector to contain multiple overlapping/conflicting scopes relative to their generated backups (eg: we'd split operations along scopes, which is more like a per-category tuple, and a single selector can list multiple category scopes with the same user meaning multiple backups for that user in a single selection). But it'll get us from A to B in a short time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This complication is where I got the idea to slice the resourceOwners out of the scopes and elevate them to the selector itself.

A bit more implementation specific, but in the interest of removing footguns and sticking to the new assumption that each backup will represent only a single (resource owner, service) pair I would highly recommend making whatever houses this information a different type than a selector.Selector so that Backup cannot be called on something that has multiple resource owners

where the resource owner list/individual id gets passed to callers alongside the selector. Kinda like how GC does it down the line with scopes.

I'm not really a fan of separating out this information. GC may split it out, but I think it would require a fair amount of busy work to pull that information out through the whole stack and I don't really think it would get us anything extra. I'd prefer to stick to some thing like

  1. make some "super selector" that contains parameters and a set of resource owners
  2. somehow generate cartesian product of (resource owner, parameter) as a []selector.Selector
  3. call backup for each Selector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a different type

type Boundary struct {
  Service enum
  ResourceOwner string
  Includes []scope
  Excludes []scope
  Filters []scope
}

=>

sel := selectors.NewExchangeBackup(users)
sel.Includes(Mail([]string{"inbox"}, Any))
for _, b := range sel.ResourceBounds() {
  repo.NewBackup(b)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks reasonable as long as sel in the example you gave is of type Boundary.

Is this something you can implement? I'm still not confident I can navigate the selectors package to add behavior easily πŸ˜…

Copy link
Contributor

@ryanfkeepers ryanfkeepers Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as sel in the example you gave is of type Boundary

The sel in example contains the set of resource owners. The boundary exhibits a single resource owner. Names can be swapped as wanted, important part is that a repo.NewBackup would require the struct that is specific to an owner.

Is this something you can implement?

Sure can. I'll make a ticket for it. However, I don't want to block you on this work. What will be helpful in the meantime is for you to continue with the tools at hand (basically, keep on with this PRs design, discreteScopes and all). It'll be ugly for a minute, and then we'll tidy it up again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue with the tools at hand

I think I'm kind of stuck at the moment with the current approach. I don't know of a way to easily convert a scope with many resource owners to a selector for a single resource owner. I don't know how to get the other properties in the scope (excluding resource owners). If I could do that then I could do

  1. make selector
  2. get discrete scopes
  3. for each resource owner in each scope:
  4. get other scope properties
  5. make new selector for resource owner
  6. run backup

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket in question: #1617

I'm kind of stuck at the moment

I'd suggest the following:

  1. Introduce a scope.Clone() method. Basic "for k, v in map1; map2[k] = v".
  2. Export the set(cat, val, ops) scope method.
  3. Use the clone+set to build new single-scope selectors.

For the time being, that'll cause a backup like --user u1 --data email,contacts to produce two backups, even though it could have only done one. Unhappiness like that aside, would that unblock you?

opSel := selectors.NewExchangeBackup()
opSel.Include([]selectors.ExchangeScope{scope})

bo, err := r.NewBackup(ctx, opSel.Selector)
if err != nil {
errs = multierror.Append(errs, errors.Wrapf(
err,
"Failed to initialize Exchange backup for user %s",
scope.Get(selectors.ExchangeUser),
))

continue
}

err = bo.Run(ctx)
if err != nil {
errs = multierror.Append(errs, errors.Wrapf(
err,
"Failed to run Exchange backup for user %s",
scope.Get(selectors.ExchangeUser),
))

continue
}

bIDs = append(bIDs, bo.Results.BackupID)
}
}

bu, err := r.Backup(ctx, bo.Results.BackupID)
bups, err := r.BackupsByID(ctx, bIDs)
if err != nil {
return Only(ctx, errors.Wrap(err, "Unable to retrieve backup results from storage"))
}

bu.Print(ctx)
backup.PrintAll(ctx, bups)

if e := errs.ErrorOrNil(); e != nil {
return Only(ctx, e)
}

return nil
}

func exchangeBackupCreateSelectors(userIDs, data []string) selectors.Selector {
func exchangeBackupCreateSelectors(userIDs, data []string) *selectors.ExchangeBackup {
sel := selectors.NewExchangeBackup()

if len(data) == 0 {
Expand All @@ -310,7 +352,7 @@ func exchangeBackupCreateSelectors(userIDs, data []string) selectors.Selector {
}
}

return sel.Selector
return sel
}

func validateExchangeBackupCreateFlags(userIDs, data []string) error {
Expand Down
4 changes: 4 additions & 0 deletions src/cli/utils/testdata/opts.go
Original file line number Diff line number Diff line change
Expand Up @@ -408,6 +408,10 @@ func (MockBackupGetter) Backup(
return nil, errors.New("unexpected call to mock")
}

func (MockBackupGetter) BackupsByID(context.Context, []model.StableID) ([]*backup.Backup, error) {
return nil, errors.New("unexpected call to mock")
}

func (MockBackupGetter) Backups(context.Context, ...store.FilterOption) ([]*backup.Backup, error) {
return nil, errors.New("unexpected call to mock")
}
Expand Down
23 changes: 23 additions & 0 deletions src/pkg/repository/repository.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (
"time"

"github.com/google/uuid"
"github.com/hashicorp/go-multierror"

"github.com/alcionai/corso/src/internal/events"
"github.com/alcionai/corso/src/internal/kopia"
Expand All @@ -28,6 +29,7 @@ var ErrorRepoAlreadyExists = errors.New("a repository was already initialized wi
// repository.
type BackupGetter interface {
Backup(ctx context.Context, id model.StableID) (*backup.Backup, error)
BackupsByID(ctx context.Context, ids []model.StableID) ([]*backup.Backup, error)
Backups(ctx context.Context, fs ...store.FilterOption) ([]*backup.Backup, error)
BackupDetails(
ctx context.Context,
Expand Down Expand Up @@ -238,6 +240,27 @@ func (r repository) Backup(ctx context.Context, id model.StableID) (*backup.Back
return sw.GetBackup(ctx, id)
}

// BackupsByID lists backups by ID. Returns as many backups as possible with
// errors for the backups it was unable to retrieve.
func (r repository) BackupsByID(ctx context.Context, ids []model.StableID) ([]*backup.Backup, error) {
ryanfkeepers marked this conversation as resolved.
Show resolved Hide resolved
var (
errs *multierror.Error
bups []*backup.Backup
sw = store.NewKopiaStore(r.modelStore)
)

for _, id := range ids {
b, err := sw.GetBackup(ctx, id)
if err != nil {
errs = multierror.Append(errs, err)
}

bups = append(bups, b)
}

return bups, errs.ErrorOrNil()
}

// backups lists backups in a repository
func (r repository) Backups(ctx context.Context, fs ...store.FilterOption) ([]*backup.Backup, error) {
sw := store.NewKopiaStore(r.modelStore)
Expand Down