Upload Progress Monitoring and Item Snapshotter basic support #4467

Closed
26 changes: 26 additions & 0 deletions changelogs/unreleased/4467-dsmithuchida
@@ -0,0 +1,26 @@
Added Upload Progress Monitoring and ItemSnapshotters. This enables monitoring
of snapshots that need additional processing such as an upload to an object
store after the snapshot is taken. Backups are now split into two phases with
the new uploading phase happening after the main part of the backup is complete.
Once a backup has entered the uploading phase, another backup can be executed.
On restart of the server, any backups that are in an Uploading phase will
continue to be monitored.
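
As a reading aid, the two-phase lifecycle described above can be sketched as a small state function. This is a minimal sketch and not the PR's controller code: the phase names come from the CRD enum in this PR, while `nextPhase`, `isUploading`, and the mapping of `Uploading` to `Completed` and `UploadingPartialFailure` to `PartiallyFailed` are assumptions made for illustration.

```go
package main

import "fmt"

// BackupPhase mirrors the phase names listed in the Backup CRD enum.
type BackupPhase string

const (
	PhaseInProgress              BackupPhase = "InProgress"
	PhaseUploading               BackupPhase = "Uploading"
	PhaseUploadingPartialFailure BackupPhase = "UploadingPartialFailure"
	PhaseCompleted               BackupPhase = "Completed"
	PhasePartiallyFailed         BackupPhase = "PartiallyFailed"
)

// isUploading reports whether a backup is in one of the uploading phases.
// Per the changelog, another backup can be executed once this is true.
func isUploading(p BackupPhase) bool {
	return p == PhaseUploading || p == PhaseUploadingPartialFailure
}

// nextPhase is a hypothetical helper. ASSUMPTION: once all uploads are done,
// Uploading ends in Completed and UploadingPartialFailure ends in
// PartiallyFailed. Any other phase is returned unchanged.
func nextPhase(current BackupPhase, uploadsDone bool) BackupPhase {
	if !uploadsDone {
		return current
	}
	switch current {
	case PhaseUploading:
		return PhaseCompleted
	case PhaseUploadingPartialFailure:
		return PhasePartiallyFailed
	default:
		return current
	}
}

func main() {
	p := PhaseUploading
	fmt.Println("uploading:", isUploading(p))
	fmt.Println("after uploads finish:", nextPhase(p, true))
}
```

Because the uploading phases are stored on the Backup object, a restarted server can presumably find such backups by phase and resume monitoring them.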

Added metrics for Item Snapshots in a backup

item_snapshot_attempt_total - number of item snapshots attempted in a backup
item_snapshot_success_total - number of successful item snapshots in a backup
item_snapshot_failure_total - number of failed item snapshots in a backup

Success requires that both the snapshot and the upload succeeded. These stats
are set when the backup moves to a terminal phase.
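
The success rule above (both the snapshot and the upload must succeed) can be illustrated with a small tally. This is a sketch only: `snapshotResult`, `counts`, and `tally` are hypothetical names, and it assumes that every attempted item snapshot counts as either a success or a failure once the backup reaches a terminal phase.

```go
package main

import "fmt"

// snapshotResult is a hypothetical record of one item snapshot's outcome.
type snapshotResult struct {
	SnapshotOK bool // the snapshot itself was taken successfully
	UploadOK   bool // the follow-up upload finished successfully
}

// counts holds the three values the metrics would report.
type counts struct {
	Attempted int
	Succeeded int
	Failed    int
}

// tally counts a snapshot as successful only when both the snapshot and the
// upload succeeded. ASSUMPTION: everything else is counted as a failure.
func tally(results []snapshotResult) counts {
	c := counts{Attempted: len(results)}
	for _, r := range results {
		if r.SnapshotOK && r.UploadOK {
			c.Succeeded++
		} else {
			c.Failed++
		}
	}
	return c
}

func main() {
	c := tally([]snapshotResult{
		{SnapshotOK: true, UploadOK: true},
		{SnapshotOK: true, UploadOK: false},
		{SnapshotOK: false, UploadOK: false},
	})
	// prints "attempted=3 success=1 failure=2"
	fmt.Printf("attempted=%d success=%d failure=%d\n", c.Attempted, c.Succeeded, c.Failed)
}
```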

Added upload-progress-check-interval flag to control how often snapshots are checked for upload progress - defaults to 1 minute
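
A duration flag with a one-minute default can be sketched as follows. This uses Go's standard `flag` package as a stand-in and is not how the PR registers the flag on the server command; `parseInterval` and the flag set name are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// parseInterval parses the check interval from args using a stand-alone
// FlagSet. ASSUMPTION: the standard library flag package stands in for
// whatever flag library the server command actually uses.
func parseInterval(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	interval := fs.Duration(
		"upload-progress-check-interval",
		time.Minute, // default of 1 minute, as stated in the changelog
		"how often snapshots are checked for upload progress",
	)
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *interval, nil
}

func main() {
	d, err := parseInterval([]string{"--upload-progress-check-interval=30s"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("checking upload progress every", d)
}
```

Values use Go duration syntax, so `30s`, `2m`, and `1h` are all valid inputs in this sketch.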

This feature is currently enabled with the EnableUploadProgress flag. It
also requires ItemSnapshot plugins.

Fixes #3756
Fixes #3757
Fixes #3759

10 changes: 10 additions & 0 deletions config/crd/v1/bases/velero.io_backups.yaml
@@ -435,12 +435,22 @@ spec:
description: FormatVersion is the backup format version, including
major, minor, and patch version.
type: string
itemSnapshotsAttempted:
description: ItemSnapshotsAttempted is the total number of attempted
item snapshots for this backup.
type: integer
itemSnapshotsCompleted:
description: ItemSnapshotsCompleted is the total number of successfully
completed item snapshots for this backup.
type: integer
phase:
description: Phase is the current state of the Backup.
enum:
- New
- FailedValidation
- InProgress
- Uploading
- UploadingPartialFailure
- Completed
- PartiallyFailed
- Failed
2 changes: 1 addition & 1 deletion config/crd/v1/crds/crds.go

Large diffs are not rendered by default.

Binary file modified design/UploadFSM.png
152 changes: 67 additions & 85 deletions design/upload-progress.md

Large diffs are not rendered by default.

74 changes: 56 additions & 18 deletions internal/delete/delete_item_action_handler.go
@@ -17,8 +17,15 @@ limitations under the License.
package delete

import (
"context"
"encoding/json"
"io"

"github.com/vmware-tanzu/velero/pkg/features"
"github.com/vmware-tanzu/velero/pkg/plugin/clientmgmt"
isv1 "github.com/vmware-tanzu/velero/pkg/plugin/velero/item_snapshotter/v1"
"github.com/vmware-tanzu/velero/pkg/volume"

"github.com/vmware-tanzu/velero/pkg/plugin/framework"

"github.com/pkg/errors"
@@ -34,28 +41,24 @@ import (
"github.com/vmware-tanzu/velero/pkg/util/filesystem"
)

// Context provides the necessary environment to run DeleteItemAction plugins
// Context provides the necessary environment to run DeleteItemAction and ItemSnapshotter plugins
type Context struct {
Backup *velerov1api.Backup
BackupReader io.Reader
Actions []velero.DeleteItemAction
Filesystem filesystem.Interface
Log logrus.FieldLogger
DiscoveryHelper discovery.Helper
resolvedActions []framework.DeleteItemResolvedAction
Backup *velerov1api.Backup
BackupReader io.Reader
Filesystem filesystem.Interface
Log logrus.FieldLogger
DiscoveryHelper discovery.Helper
DeleteItemResolvedActions []framework.DeleteItemResolvedAction
ItemSnapshots []*volume.ItemSnapshot
ItemSnapshotters []isv1.ItemSnapshotter
}

func InvokeDeleteActions(ctx *Context) error {
var err error
resolver := framework.NewDeleteItemActionResolver(ctx.Actions)
ctx.resolvedActions, err = resolver.ResolveActions(ctx.DiscoveryHelper)
// No actions installed and no error means we don't have to continue;
// No actions installed and no item snapshots means we don't have to continue;
// just do the backup deletion without worrying about plugins.
if len(ctx.resolvedActions) == 0 && err == nil {
ctx.Log.Debug("No delete item actions present, proceeding with rest of backup deletion process")
if len(ctx.DeleteItemResolvedActions) == 0 && len(ctx.ItemSnapshots) == 0 {
ctx.Log.Debug("No delete item actions or item snapshots present, proceeding with rest of backup deletion process")
return nil
} else if err != nil {
return errors.Wrapf(err, "error resolving actions")
}

// get items out of backup tarball into a temp directory
@@ -121,19 +124,54 @@ func InvokeDeleteActions(ctx *Context) error {
// Since we want to keep looping even on errors, log them instead of just returning.
if err != nil {
itemLog.WithError(err).Error("plugin error")

}
}
}
}
}

if features.IsEnabled(velerov1api.UploadProgressFeatureFlag) {
ctx.Log.Info("Deleting item snapshots")
// Handle item snapshots
for _, snapshot := range ctx.ItemSnapshots {
rid := velero.ResourceIdentifier{}
json.Unmarshal([]byte(snapshot.Spec.ResourceIdentifier), &rid)
itemLogger := ctx.Log.WithFields(logrus.Fields{
"namespace": rid.Namespace,
"resource": rid.Resource,
"item": rid.Name,
"snapshotID": snapshot.Status.ProviderSnapshotID,
})
itemLogger.Info("Deleting item snapshot")

itemSnapshotter := clientmgmt.ItemSnapshotterForSnapshot(snapshot, ctx.ItemSnapshotters)

itemPath := archive.GetItemFilePath(dir, rid.Resource, rid.Namespace, rid.Name)

// obj is the Unstructured item from the backup
obj, err := archive.Unmarshal(ctx.Filesystem, itemPath)
if err != nil {
itemLogger.WithError(err).Errorf("Could not unmarshal item: %v", rid)
continue
}
dsi := isv1.DeleteSnapshotInput{
SnapshotID: snapshot.Status.ProviderSnapshotID,
ItemFromBackup: obj,
SnapshotMetadata: snapshot.Status.Metadata,
Params: nil, // TBD
}
if err := itemSnapshotter.DeleteSnapshot(context.TODO(), &dsi); err != nil {
itemLogger.WithError(err).Error("Error deleting snapshot")
}
}
}
return nil
}
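
Review note: the item snapshot loop above discards the error returned by `json.Unmarshal`, so a malformed `ResourceIdentifier` would yield an empty identifier and an item path built from empty fields. One possible way to surface the error is sketched below. `resourceID` is a simplified, hypothetical stand-in for `velero.ResourceIdentifier`, and `decodeResourceID` is not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceID is a simplified, hypothetical stand-in for
// velero.ResourceIdentifier, used only to keep this sketch self-contained.
type resourceID struct {
	Group     string
	Resource  string
	Namespace string
	Name      string
}

// decodeResourceID parses the JSON-encoded identifier and returns an error
// instead of silently continuing with a zero-value identifier.
func decodeResourceID(raw string) (resourceID, error) {
	var rid resourceID
	if err := json.Unmarshal([]byte(raw), &rid); err != nil {
		return resourceID{}, fmt.Errorf("decoding resource identifier: %w", err)
	}
	return rid, nil
}

func main() {
	inputs := []string{
		`{"Group":"","Resource":"persistentvolumeclaims","Namespace":"ns1","Name":"pvc-1"}`,
		`not json`,
	}
	for _, raw := range inputs {
		rid, err := decodeResourceID(raw)
		if err != nil {
			// In the loop above, this is where the snapshot would be
			// logged and skipped with continue.
			fmt.Println("skipping snapshot:", err)
			continue
		}
		fmt.Printf("deleting snapshot for %s/%s/%s\n", rid.Resource, rid.Namespace, rid.Name)
	}
}
```

Logging and skipping matches the handling already used for the `archive.Unmarshal` failure in the same loop.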

// getApplicableActions takes resolved DeleteItemActions and filters them for a given group/resource and namespace.
func (ctx *Context) getApplicableActions(groupResource schema.GroupResource, namespace string) []framework.DeleteItemResolvedAction {
var actions []framework.DeleteItemResolvedAction
for _, action := range ctx.resolvedActions {
for _, action := range ctx.DeleteItemResolvedActions {
if action.ShouldUse(groupResource, namespace, nil, ctx.Log) {
actions = append(actions, action)
}
20 changes: 13 additions & 7 deletions internal/delete/delete_item_action_handler_test.go
@@ -22,6 +22,8 @@ import (
"sort"
"testing"

"github.com/vmware-tanzu/velero/pkg/plugin/framework"

"github.com/sirupsen/logrus"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
@@ -165,16 +167,20 @@ func TestInvokeDeleteItemActionsRunForCorrectItems(t *testing.T) {
actions = append(actions, action)
}

deleteItemActionResolver := framework.NewDeleteItemActionResolver(actions)
deleteItemResolvedActions, err := deleteItemActionResolver.ResolveActions(h.discoveryHelper)
require.NoError(t, err)

c := &Context{
Backup: tc.backup,
BackupReader: tc.tarball,
Filesystem: fs,
DiscoveryHelper: h.discoveryHelper,
Actions: actions,
Log: log,
Backup: tc.backup,
BackupReader: tc.tarball,
Filesystem: fs,
DiscoveryHelper: h.discoveryHelper,
DeleteItemResolvedActions: deleteItemResolvedActions,
Log: log,
}

err := InvokeDeleteActions(c)
err = InvokeDeleteActions(c)
require.NoError(t, err)

// Compare the plugins against the ids that we wanted.
12 changes: 11 additions & 1 deletion pkg/apis/velero/v1/backup.go
@@ -207,7 +207,7 @@ const (

// BackupPhase is a string representation of the lifecycle phase
// of a Velero backup.
// +kubebuilder:validation:Enum=New;FailedValidation;InProgress;Completed;PartiallyFailed;Failed;Deleting
// +kubebuilder:validation:Enum=New;FailedValidation;InProgress;Uploading;UploadingPartialFailure;Completed;PartiallyFailed;Failed;Deleting
type BackupPhase string

const (
@@ -305,6 +305,16 @@ type BackupStatus struct {
// +optional
FailureReason string `json:"failureReason,omitempty"`

// ItemSnapshotsAttempted is the total number of attempted
// item snapshots for this backup.
// +optional
ItemSnapshotsAttempted int `json:"itemSnapshotsAttempted,omitempty"`

// ItemSnapshotsCompleted is the total number of successfully
// completed item snapshots for this backup.
// +optional
ItemSnapshotsCompleted int `json:"itemSnapshotsCompleted,omitempty"`

// Warnings is a count of all warning messages that were generated during
// execution of the backup. The actual warnings are in the backup's log
// file in object storage.
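
Review note: because both new `BackupStatus` fields are plain `int` values tagged `omitempty`, a count of zero is dropped from the serialized status, so a reader cannot distinguish "zero item snapshots" from "field not set". The sketch below demonstrates this with a hypothetical `backupStatus` struct that copies only the two new fields and their JSON tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupStatus is a hypothetical cut-down copy of BackupStatus containing
// only the two fields added in this PR, with the same JSON tags.
type backupStatus struct {
	ItemSnapshotsAttempted int `json:"itemSnapshotsAttempted,omitempty"`
	ItemSnapshotsCompleted int `json:"itemSnapshotsCompleted,omitempty"`
}

// encode marshals the status to a JSON string.
func encode(s backupStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Zero counts are omitted entirely; prints "{}".
	fmt.Println(encode(backupStatus{}))
	// Only the non-zero field appears; prints {"itemSnapshotsAttempted":3}.
	fmt.Println(encode(backupStatus{ItemSnapshotsAttempted: 3}))
}
```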