snapshot restore-from-archive streaming and filtering #13658
Conversation
LGTM!
This changeset implements two improvements to restoring FSM snapshots from archives:

- The `RestoreFromArchive` helper decompresses the snapshot archive to a temporary file before reading it into the FSM. For large snapshots this performs a lot of disk IO. Stream-decompress the snapshot as we read it, without first writing to a temporary file (a sketch of this pipeline follows the list).
- Add filters to the `RestoreFromArchive` helper. The operator can pass these as `-filter` arguments to `nomad operator snapshot state` (and other commands in the future) to include only desired data when reading the snapshot.
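The streaming change is essentially an io-pipeline change: wrap the archive reader in a decompressor and hand the decompressed stream straight to the restore path, instead of round-tripping through a temporary file. A minimal sketch of that shape, using only the standard library and assuming a gzip-compressed archive (`restoreFromArchive` and `restoreSnapshot` are illustrative names, not Nomad's actual API):

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// restoreSnapshot stands in for handing a reader to the FSM restore path.
// Here it just drains the stream; a real restore would decode objects from it.
func restoreSnapshot(r io.Reader) error {
	_, err := io.Copy(io.Discard, r)
	return err
}

// restoreFromArchive opens the snapshot archive and decompresses it on the fly:
// no temporary file on disk, no second pass over the data.
func restoreFromArchive(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f) // assumes a gzip-compressed archive
	if err != nil {
		return err
	}
	defer gz.Close()

	return restoreSnapshot(gz)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: restore <snapshot archive>")
		os.Exit(1)
	}
	if err := restoreFromArchive(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```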
Deferred for this PR: the `nomad operator snapshot state` command still has to load everything that's been filtered into the FSM before writing it out to a large JSON blob. We should provide a tool that streams the decoded objects directly to an encoder without loading into the FSM, so that we can emit NDJSON, write out to a sqlite DB, etc.
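A rough sketch of that deferred idea, writing each decoded object to an NDJSON encoder as it is read; `decodeNext` is a hypothetical placeholder for whatever yields objects from the snapshot stream:

```go
package main

import (
	"encoding/json"
	"io"
	"os"
)

// decodeNext is a hypothetical stand-in for decoding the next object from the
// decompressed snapshot stream; it returns io.EOF once the stream is exhausted.
func decodeNext(r io.Reader) (any, error) {
	// A real implementation would decode the next FSM object
	// (job, node, allocation, ...) from the snapshot's wire format.
	return nil, io.EOF
}

// streamNDJSON writes each decoded object as one JSON document per line,
// so only the current object is ever held in memory.
func streamNDJSON(r io.Reader, w io.Writer) error {
	enc := json.NewEncoder(w) // Encode appends a newline after each value, i.e. NDJSON
	for {
		obj, err := decodeNext(r)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := enc.Encode(obj); err != nil {
			return err
		}
	}
}

func main() {
	if err := streamNDJSON(os.Stdin, os.Stdout); err != nil {
		os.Exit(1)
	}
}
```

Because nothing is accumulated between iterations, memory use is bounded by the largest single object rather than by the whole snapshot.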
Example: Starting with a 439MB snapshot (~13GiB uncompressed), I want to filter for all objects associated with 3 different jobs and 3 different nodes:
Previously this would write ~13GiB to disk, read 14GiB from disk, and saturate 1 core for over an hour before running out of memory on my machine (16GiB) and crashing.
With this change, the command reads ~450MiB from disk, writes only the 197MiB JSON blob to disk, and uses about 150% CPU, with memory usage peaking around 330MB.
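Those numbers are consistent with the filters being applied while the snapshot is read: objects that don't match are dropped before they ever reach the FSM, so memory tracks the filtered result rather than the full ~13GiB of state. A minimal sketch of that shape, with all types and names hypothetical (the real command evaluates the operator-supplied `-filter` expressions rather than a hard-coded predicate):

```go
package main

import (
	"fmt"
	"strings"
)

// object is a hypothetical decoded snapshot entry; the real snapshot contains
// typed FSM objects (jobs, nodes, allocations, ...).
type object struct {
	Kind  string
	JobID string
	Node  string
}

// keep reports whether an object belongs to one of the requested jobs or nodes.
// It stands in for evaluating the operator-supplied -filter expressions.
func keep(o object, jobs, nodes []string) bool {
	for _, j := range jobs {
		if strings.EqualFold(o.JobID, j) {
			return true
		}
	}
	for _, n := range nodes {
		if strings.EqualFold(o.Node, n) {
			return true
		}
	}
	return false
}

// restoreFiltered hands only matching objects to apply; in a streaming restore
// this check happens as each object is read, so filtered-out data never
// reaches the FSM at all.
func restoreFiltered(decoded []object, jobs, nodes []string, apply func(object)) {
	for _, o := range decoded {
		if keep(o, jobs, nodes) {
			apply(o)
		}
	}
}

func main() {
	objs := []object{
		{Kind: "Job", JobID: "example"},
		{Kind: "Job", JobID: "other"},
	}
	restoreFiltered(objs, []string{"example"}, nil, func(o object) {
		fmt.Printf("keeping %s %s\n", o.Kind, o.JobID) // stand-in for loading into the FSM
	})
}
```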