Skip to content
This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Split backups by (resource owner, service) at CLI layer #1609

Merged
merged 9 commits into from
Nov 30, 2022

Conversation

ashmrtn
Copy link
Contributor

@ashmrtn ashmrtn commented Nov 28, 2022

Description

Sketch of how to split backups by (resource owner, service) for exchange. Needs more support from selectors package to be complete.

Type of change

  • 🌻 Feature
  • 🐛 Bugfix
  • 🗺️ Documentation
  • 🤖 Test
  • 💻 CI/Deployment
  • 🐹 Trivial/Minor

Issue(s)

Test Plan

  • 💪 Manual
  • ⚡ Unit test
  • 💚 E2E

Allow getting multiple backups by ID in a single call.
Needs extra support from selectors package to properly materialize the
cartesian product of all resource owners and the options in the scope
that contains the resource owners.
@ashmrtn ashmrtn added the enhancement New feature or request label Nov 28, 2022
@ashmrtn ashmrtn requested a review from ryanfkeepers November 28, 2022 17:29
@ashmrtn ashmrtn temporarily deployed to Testing November 28, 2022 17:29 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 28, 2022 17:29 Inactive
src/cli/backup/exchange.go Show resolved Hide resolved
src/cli/backup/exchange.go Show resolved Hide resolved
// owners and whatever options may have been in the scope containing the
// resource owners.
for _, scope := range sel.DiscreteScopes(users) {
for _, user := range scope.Get(selectors.ExchangeUser) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for iterating over the resource owner here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was mostly me attempting to get something working with what we already have. The issue I ran into with this patch is that there's no way to extract the other fields from the scope to make a selector. We avoid this problem in GC because we just iterate on the users and pass the full scope around. Here though we need to extract the user and then make a new selector that contains all the information in the current scope except with the users slice replaced by the user we have in the current iteration.

I think long-term it would be better to have some selector function that would return either scopes (to be given to Include to make a new Selector) or Selectors that have the cartesian product of (user, other options) for each scope. This could then replace both the call to DiscreteScopes and iterating over users in a scope.

I don't feel I know enough about selectors to know how to implement something that would return the cartesian product in a clean manner though so if you could help out there I'd really appreciate it. Also, while I've said "users" here, it's really going to be resource owners because this same pattern will also be used in other services

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think long-term it would be better to have...

Yup. This complication is where I got the idea to slice the resourceOwners out of the scopes and elevate them to the selector itself. If we imagine a selector structure like:

{
  ResourceOwners   []string     // probably filter, but for ease of mind
  Includes                 []scope     // eg: [{mailFolder, *}, {contactFolder, contactID}, ...]
  Excludes                []scope     // similar to ^
  Filters                     []scope     // ^
}

then the cartesian product can be handled as, "for each resource owner (the backup boundary), this selection of scopes (the data boundary within the backup)". Which is a statement that more closely matches the design intent of corso usage than our current system.

To get the product, splitting selectors could be handled as sel.SplitByOwners(users) => []selectors (one selector for each user). Or, maybe even more simply, sel.DiscreteOwners(users) => []string (returns users if *, or the selected set of users) where the resource owner list/individual id gets passed to callers alongside the selector. Kinda like how GC does it down the line with scopes.

return the cartesian product in a clean manner

If we don't want to pull the above lift, I think the next best option is to add a func similar to discrete scopes that returns a slice of scopes (one per resource owner), rather than a single scope. Ex: SplitDiscreteScopes(user) => []scope. It keeps the world messier when we think about what it means for a selector to contain multiple overlapping/conflicting scopes relative to their generated backups (eg: we'd split operations along scopes, which is more like a per-category tuple, and a single selector can list multiple category scopes with the same user meaning multiple backups for that user in a single selection). But it'll get us from A to B in a short time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This complication is where I got the idea to slice the resourceOwners out of the scopes and elevate them to the selector itself.

A bit more implementation specific, but in the interest of removing footguns and sticking to the new assumption that each backup will represent only a single (resource owner, service) pair I would highly recommend making whatever houses this information a different type than a selector.Selector so that Backup cannot be called on something that has multiple resource owners

where the resource owner list/individual id gets passed to callers alongside the selector. Kinda like how GC does it down the line with scopes.

I'm not really a fan of separating out this information. GC may split it out, but I think it would require a fair amount of busy work to pull that information out through the whole stack and I don't really think it would get us anything extra. I'd prefer to stick to some thing like

  1. make some "super selector" that contains parameters and a set of resource owners
  2. somehow generate cartesian product of (resource owner, parameter) as a []selector.Selector
  3. call backup for each Selector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a different type

type Boundary struct {
  Service enum
  ResourceOwner string
  Includes []scope
  Excludes []scope
  Filters []scope
}

=>

sel := selectors.NewExchangeBackup(users)
sel.Includes(Mail([]string{"inbox"}, Any))
for _, b := range sel.ResourceBounds() {
  repo.NewBackup(b)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks reasonable as long as sel in the example you gave is of type Boundary.

Is this something you can implement? I'm still not confident I can navigate the selectors package to add behavior easily 😅

Copy link
Contributor

@ryanfkeepers ryanfkeepers Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as sel in the example you gave is of type Boundary

The sel in example contains the set of resource owners. The boundary exhibits a single resource owner. Names can be swapped as wanted, important part is that a repo.NewBackup would require the struct that is specific to an owner.

Is this something you can implement?

Sure can. I'll make a ticket for it. However, I don't want to block you on this work. What will be helpful in the meantime is for you to continue with the tools at hand (basically, keep on with this PRs design, discreteScopes and all). It'll be ugly for a minute, and then we'll tidy it up again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue with the tools at hand

I think I'm kind of stuck at the moment with the current approach. I don't know of a way to easily convert a scope with many resource owners to a selector for a single resource owner. I don't know how to get the other properties in the scope (excluding resource owners). If I could do that then I could do

  1. make selector
  2. get discrete scopes
  3. for each resource owner in each scope:
  4. get other scope properties
  5. make new selector for resource owner
  6. run backup

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket in question: #1617

I'm kind of stuck at the moment

I'd suggest the following:

  1. Introduce a scope.Clone() method. Basic "for k, v in map1; map2[k] = v".
  2. Export the set(cat, val, ops) scope method.
  3. Use the clone+set to build new single-scope selectors.

For the time being, that'll cause a backup like --user u1 --data email,contacts to produce two backups, even though it could have only done one. Unhappiness like that aside, would that unblock you?

src/pkg/repository/repository.go Outdated Show resolved Hide resolved
@ashmrtn ashmrtn mentioned this pull request Nov 29, 2022
9 tasks
@ashmrtn ashmrtn temporarily deployed to Testing November 29, 2022 23:31 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 29, 2022 23:32 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 29, 2022 23:32 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 29, 2022 23:32 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 29, 2022 23:32 Inactive
aviator-app bot pushed a commit that referenced this pull request Nov 30, 2022
## Description

Helper function to fetch backup models by ID. Also renames `Backups` to `BackupsByTag`

## Type of change

- [x] 🌻 Feature
- [ ] 🐛 Bugfix
- [ ] 🗺️ Documentation
- [ ] 🤖 Test
- [ ] 💻 CI/Deployment
- [ ] 🐹 Trivial/Minor

## Issue(s)

* #1505

separated from
* #1609 

## Test Plan

<!-- How will this be tested prior to merging.-->
- [ ] 💪 Manual
- [x] ⚡ Unit test
- [ ] 💚 E2E
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 18:27 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 18:28 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 18:28 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 18:28 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 18:28 Inactive
@ashmrtn
Copy link
Contributor Author

ashmrtn commented Nov 30, 2022

Output from local run with changes:

go run ./corso.go backup create exchange --user user1@***,user2@***
Connecting to repository:      9s done 
Connecting to M365:      0s done 
Discovering items to backup:      1s done 
∙ contacts - user1@***:      1s done 
Backing up data:      1s done 
Connecting to M365:      0s done 
Discovering items to backup:      1s done 
∙ contacts - user2@***:      0s done 
Backing up data:      1s done 
Connecting to M365:      0s done 
Discovering items to backup:      6s done 
∙ email - user1@***:      6s done 
Backing up data:     16s done 
Connecting to M365:      0s done 
Discovering items to backup:     12s done 
∙ email - user2@***:     12s done 
Backing up data:     45s done 
Connecting to M365:      0s done 
Discovering items to backup:      9s done 
∙ events - user1@***:      9s done 
Backing up data:     28s done 
Connecting to M365:      0s done 
Discovering items to backup:     12s done 
∙ events - user2@***:     12s done 
Backing up data:     38s done 
  Started At            ID                                    Status                Selectors                      
  2022-11-30T18:36:07Z  ead345ab-7d39-45fe-a335-9aa1200de15c  Completed (0 errors)  user1@***
  2022-11-30T18:36:12Z  fd083e84-c60a-498d-b5a4-e3ff6ab30c40  Completed (0 errors)  user2@***   
  2022-11-30T18:36:17Z  1cf29cf9-c182-47df-9977-f88e0a56a103  Completed (0 errors)  user1@***
  2022-11-30T18:36:43Z  6fc7a90e-552c-4011-99aa-d00c9d1ca6c5  Completed (0 errors)  user2@***   
  2022-11-30T18:37:45Z  37a21b95-aa64-438c-aa7a-e936282b1940  Completed (0 errors)  user1@***
  2022-11-30T18:38:26Z  ffd38772-849a-496f-9994-78cae6e82354  Completed (0 errors)  user2@***

@ashmrtn
Copy link
Contributor Author

ashmrtn commented Nov 30, 2022

@ryanfkeepers can you take another look at this PR and see if it's ready to merge?
Things it doesn't address:

  • fetching all sites/users multiple times for a backup. It will currently get them 1 + N times where N is the number of sites/users * categories being backed up
  • verifying inputs outside of GraphConnector. I was unsure where to put this so that the check would still work if someone was using Corso as an SDK. Putting it at the CLI layer means SDK users won't get those error messages

I'm not sure if the above should be addressed in this PR or different ones that can be further down the road. Any preference?

Copy link
Contributor

@ryanfkeepers ryanfkeepers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One bug and a couple typos. But you're approved, since they're simple fixes.

I'm not sure if the above should be addressed in this PR or different ones

Definitely different PRs. Resource retrieval, graph connection verification, and input validation all have investigation to complete that extends beyond the scope of backup splits/incrementals.

src/cli/backup/sharepoint.go Outdated Show resolved Hide resolved
src/cli/backup/onedrive.go Outdated Show resolved Hide resolved
src/cli/backup/onedrive.go Outdated Show resolved Hide resolved
src/cli/backup/sharepoint.go Outdated Show resolved Hide resolved
src/cli/backup/sharepoint.go Outdated Show resolved Hide resolved
@ashmrtn ashmrtn self-assigned this Nov 30, 2022
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 19:19 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 19:19 Inactive
@aviator-app
Copy link
Contributor

aviator-app bot commented Nov 30, 2022

Aviator status

Aviator will automatically update this comment as the status of the PR changes.

This PR was merged using Aviator.

@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 19:20 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 19:20 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 19:20 Inactive
@ashmrtn ashmrtn changed the title WIP: split backups by (resource owner, service) at CLI layer Split backups by (resource owner, service) at CLI layer Nov 30, 2022
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 20:53 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 20:53 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 20:54 Inactive
@ashmrtn ashmrtn temporarily deployed to Testing November 30, 2022 20:54 Inactive
@aviator-app aviator-app bot temporarily deployed to Testing November 30, 2022 21:18 Inactive
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@aviator-app aviator-app bot temporarily deployed to Testing November 30, 2022 21:18 Inactive
@aviator-app aviator-app bot temporarily deployed to Testing November 30, 2022 21:19 Inactive
@aviator-app aviator-app bot temporarily deployed to Testing November 30, 2022 21:19 Inactive
@aviator-app aviator-app bot temporarily deployed to Testing November 30, 2022 21:19 Inactive
@aviator-app aviator-app bot merged commit 27d0980 into main Nov 30, 2022
@aviator-app aviator-app bot deleted the 1505-split-backups-2 branch November 30, 2022 21:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request mergequeue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split CLI backups of multiple (resource owner, service) to separate backups
2 participants