Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Gracefully handles errors when archiving snapshots #34856

Merged
merged 1 commit into from
Jan 19, 2024

Conversation

brooksprumo
Copy link
Contributor

@brooksprumo brooksprumo commented Jan 19, 2024

Problem

When archiving a snapshot, inside SnapshotPackagerService, we call archive_snapshot_package(). This is not allowed to fail, so we use .expect(). However, the archiving can fail for many non-programmer-related reasons. For example, if the snapshot archive partition gets full, this will fail. And instead of a user-friendly error, we'll see a panic with information tailored for developers. Additionally, this panic prevents graceful shutdown, as the exit flag is not used.

Summary of Changes

Check the result from archive_snapshot_package(). If there's an error, log it, and then shutdown gracefully.

@brooksprumo brooksprumo self-assigned this Jan 19, 2024
@brooksprumo brooksprumo marked this pull request as ready for review January 19, 2024 15:04
@brooksprumo brooksprumo requested a review from steviez January 19, 2024 15:04
Copy link

codecov bot commented Jan 19, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (9d13244) 81.7% compared to head (dcc7c20) 81.7%.
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #34856   +/-   ##
=======================================
  Coverage    81.7%    81.7%           
=======================================
  Files         826      826           
  Lines      223114   223119    +5     
=======================================
+ Hits       182335   182419   +84     
+ Misses      40779    40700   -79     

Copy link
Contributor

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - this kind of stuff is definitely helpful for parsing logs. I know some of the services you have added have pretty uniform starting/stopping/error logs, was thinking about extending that level of logging to other services.

Just the one nit, feel free to ignore, but I can give ya a quick second review if you decide to change it

.expect("failed to archive snapshot package");
);
if let Err(err) = result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could do the if statement directly on the function call:

                        if let Err(err) = snapshot_utils::archive_snapshot_package(
                            &snapshot_package,
                            &snapshot_config.full_snapshot_archives_dir,
                            &snapshot_config.incremental_snapshot_archives_dir,
                            snapshot_config.maximum_full_snapshot_archives_to_retain,
                            snapshot_config.maximum_incremental_snapshot_archives_to_retain,
                        ) {
                            error!("Stopping SnapshotPackagerService! Fatal error while archiving snapshot package: {err}");
                            exit.store(true, Ordering::Relaxed);
                            break;                           
                        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! I actually prefer not this. My brain likes to see the call to archive_snapshot_package() separate; because I find it a bit harder to read when the part of the if before the braces ({}) is longer.

Clippy even has a lint for this: https://rust-lang.github.io/rust-clippy/master/index.html#/blocks_in_conditions, and I fixed a few of them, like here: #34630.

I do agree that this is all subjective, and others prefer the terse/inline-with-the-conditional version instead.

@brooksprumo brooksprumo merged commit 2b0b5ae into solana-labs:master Jan 19, 2024
35 checks passed
@brooksprumo brooksprumo deleted the snap/errors/sps branch January 19, 2024 16:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants