Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(snapshot): add support for snapshot recovery from S3 #1839

Merged

Conversation

andydunstall
Copy link
Contributor

@andydunstall andydunstall commented Sep 10, 2023

Fixes https://github.com/dragonflydb/controlplane/issues/227.

Adds support for loading snapshots from S3. The Helio S3 client still needs some work to make it more reliable, though this at least gets loading snapshots working.

AwsS3SnapshotStorage::OpenReadFile is the same as AwsS3SnapshotStorage::OpenWriteFile, so the main changes here are adding LoadFile and LoadFiles implementations for S3

To match the correct snapshot, unlike FileSnapshotStorage::LoadFile we can't use globs, so this instead uses a regex for the timestamp and shard number.

(Note looked at combining LoadFile and LoadFiles to avoid repeated calls to S3, but since the DEBUG command allows explicitly specifying the load path its easier to keep separate. Could cache but doesn't seem worth it for 1 extra call)

Also refactors logging and error handling so both LoadPath and LoadPaths return a GenericError then log the error in one place, rather than logging spread out among arbitrary functions which is hard to follow and easy to miss or forget to update. (Its a bit more verbose though I think its cleaner - can revert if prefered)

Successful load:

I20230911 09:23:18.520285 289591 snapshot_storage.cc:225] Load snapshot: Searching for snapshot in S3 path: s3://dev-andy-dfcloud-backups-us-east-1/backupa/
I20230911 09:23:19.355569 289590 server_family.cc:522] Loading s3://dev-andy-dfcloud-backups-us-east-1/backupa/dump-2023-09-11T09:23:16-summary.dfs
I20230911 09:34:00.932328 290444 snapshot_storage.cc:104] Load snapshot: Searching for snapshot in directory: "/home/andydunstall/projects/dragonfly/dragonfly/build-dbg"
I20230911 09:34:00.932595 290444 server_family.cc:522] Loading /home/andydunstall/projects/dragonfly/dragonfly/build-dbg/dump-2023-09-11T09:33:59-summary.dfs

Not found:

I20230911 09:23:57.472137 289655 snapshot_storage.cc:225] Load snapshot: Searching for snapshot in S3 path: s3://dev-andy-dfcloud-backups-us-east-1/backupb/
W20230911 09:23:57.864356 289654 server_family.cc:452] Load snapshot: No snapshot found

Error:

I20230911 09:24:22.830351 289682 snapshot_storage.cc:225] Load snapshot: Searching for snapshot in S3 path: s3://dev-andy-dfcloud-backups-us-east-1/backupb/
E20230911 09:24:23.364082 289681 server_family.cc:454] Failed to load snapshot: Connection refused: Couldn't list objects in S3 bucket

Tasks

  • Rebase latest DF and Helio S3 fixes
  • Test with different S3 paths and RDB
  • Improve logging and error handling when failing to load snapshots

@andydunstall andydunstall force-pushed the feature/add-load-snapshot-from-s3 branch from b84c68f to f3c02c6 Compare September 11, 2023 06:45
@andydunstall andydunstall marked this pull request as ready for review September 11, 2023 08:37
@royjacobson
Copy link
Contributor

Looks good. Do we have any testing strategy for this?

@andydunstall
Copy link
Contributor Author

andydunstall commented Sep 11, 2023

Do we have any testing strategy for this?

I haven't added tests as part of this PR though added a ticket to add before we make S3 snapshots widely available or use in control plane (https://github.com/dragonflydb/controlplane/issues/250)

Comment on lines 45 to 46
virtual std::string LoadPath(const std::string_view& dir, const std::string_view& dbfilename) = 0;
virtual io::Result<std::string, GenericError> LoadPath(const std::string_view& dir,
const std::string_view& dbfilename) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small side note, its okay to pass string_views by value 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and please also use them instead of const string& below where its possible

Copy link
Contributor Author

@andydunstall andydunstall Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, out of interest why is string_view prefered to const std::string&? (doesn't that risk unintentionally modifying the underlying string?)

Copy link
Collaborator

@romange romange Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string_view as an argument in function is usually better because it accepts both strings and const char* without doing any allocations. It's like a decorator around an immutable string passed to a function. When you use const string& it accepts strings of course but if you pass "foobar" it will create a temporary string object during the call

Comment on lines 511 to 517
auto paths_result = snapshot_storage_->LoadPaths(load_path);
if (!paths_result) {
LOG(ERROR) << "Failed to load snapshot: " << paths_result.error().Format();

Promise<std::error_code> ec_promise;
ec_promise.set_value(paths_result.error());
return ec_promise.get_future();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a generic error, right? Unfortunately it can be the case, that if created just with a string, its error_code will be a default-value (not set), so the result of Load will actually return that no error happened. You do so for example with "Invalid S3 Path", so I assume its a possible bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - looks like the Load() returned error is currently ignored - though could update that to return a GenericError too?

dranikpg
dranikpg previously approved these changes Sep 13, 2023
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd combine LoadPath & LoadPaths, but remember you commented its simpler to keep them split 🙂

@andydunstall
Copy link
Contributor Author

LGTM. I'd combine LoadPath & LoadPaths, but remember you commented its simpler to keep them split 🙂

Thanks for reviewing - yep kept separate just as debug command sets load path explicitly

Had to rebase - mind re-approving if you get a sec :)

@andydunstall andydunstall merged commit 02664ee into dragonflydb:main Sep 13, 2023
@andydunstall andydunstall deleted the feature/add-load-snapshot-from-s3 branch September 13, 2023 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants